DECLINED - [FEATURE] Provide API access to My Finds PQ


W8TTS

Recommended Posts

First, I ask that you please read this pinned topic and try to follow the conventions outlined there. It will help us maintain this forum and better respond to topics.

 

Placing the request for the data in the "My Finds" PQ into an API operation separate from the requests for other PQs does not address the main reasons that the "My Finds" PQ is not available. It would still bring mobile devices to their knees, for example.

 

Could you explain the need for API access to the "My Finds" PQ? If I could understand your reasoning for it better I might be able to argue your points to the dev team. As it is, it seems like the needed functionality is better addressed by other API calls that are not as hard on the server.

Link to post

Could you explain the need for API access to the "My Finds" PQ?

I run a PQ and I can download the results via an API call. I run a MyFinds PQ and I can only download it by going to the PQ page and downloading it. Could you explain how this is easier on the servers? It seems to me that when I run the MyFinds PQ, whether I download it manually or via an API call, the load of generating the PQ is the same. Is the API download that much higher of a load than the manual download? It seems to me, unless I totally misunderstand, that the rules for running a PQ remain in place and the MyFinds is still limited to once every three days. So I am not understanding the server load issue. I would understand the issue much better if Almogul could generate and download a MyFinds several times a day via the API. I don't understand why downloading the MyFinds PQ is restricted to the manual process only. You're not regenerating those things every time I request a download, are you? If so, I understand the issue and think you need to look at caching a file from the most recent run.

 

As for downloading a MyFinds PQ to a phone, if the turkey wants to do that, let them do it. I suspect the first time it is done will be the last time it is done. Of course, if they have fewer than 100 finds they might not fully understand until they get several thousand. The logical extension of "we are protecting the mobile devices" would be that you can't download it at all, because someone might access it over a 9600 baud dial-up line.

Edited by jholly
Link to post

Could you explain the need for API access to the "My Finds" PQ?

I run a PQ and I can download the results via an API call. I run a MyFinds PQ and I can only download it by going to the PQ page and downloading it. Could you explain how this is easier on the servers? It seems to me that when I run the MyFinds PQ, whether I download it manually or via an API call, the load of generating the PQ is the same. Is the API download that much higher of a load than the manual download? It seems to me, unless I totally misunderstand, that the rules for running a PQ remain in place and the MyFinds is still limited to once every three days. So I am not understanding the server load issue. I would understand the issue much better if Almogul could generate and download a MyFinds several times a day via the API. I don't understand why downloading the MyFinds PQ is restricted to the manual process only. You're not regenerating those things every time I request a download, are you? If so, I understand the issue and think you need to look at caching a file from the most recent run.

 

As for downloading a MyFinds PQ to a phone, if the turkey wants to do that, let them do it. I suspect the first time it is done will be the last time it is done. Of course, if they have fewer than 100 finds they might not fully understand until they get several thousand. The logical extension of "we are protecting the mobile devices" would be that you can't download it at all, because someone might access it over a 9600 baud dial-up line.

Ah, but the MyFinds PQ is limited to run every three days.

Link to post

And an API could be set up to be run only every three days.

 

As usual it was declined, as was my first request on the old forum, because of downloading to a smartphone. As others have said, they would only do it once. If they have a few finds, no problem; but hit a few thousand and they will see it's a problem and probably not do it again.

 

Having two different ways to do PQs is counterproductive. If not a separate API, then include the MyFinds PQ in the "Download pocket queries" API. Yes, I know someone will download it to a phone, and they will quickly learn not to do that.
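Enforcing the "only every three days" rule server-side, as suggested above, is straightforward. A rough Python sketch of the idea (the function and the in-memory store are invented for illustration; this is not Groundspeak's actual code, which would keep the timestamp in its database):

```python
import time

THREE_DAYS = 3 * 24 * 60 * 60  # the existing My Finds limit, in seconds

# Hypothetical store of each user's last My Finds run time.
last_run = {}

def request_my_finds(user_id, now=None):
    """Allow a My Finds PQ run only once every three days.

    Returns (allowed, seconds_until_next_allowed_run).
    """
    now = time.time() if now is None else now
    previous = last_run.get(user_id)
    if previous is not None and now - previous < THREE_DAYS:
        # Too soon: deny and report how long the caller must wait.
        return False, THREE_DAYS - (now - previous)
    last_run[user_id] = now
    return True, 0
```

The same check could guard an API endpoint just as easily as the website button, which is the point being made here.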

Link to post

Looking at the number of players listed by cacherstats.com and the number of player accounts that Groundspeak claims leads me to believe just about all users have fewer than 200 finds. This would imply that the majority of users would have a MyFinds PQ of fewer than 200 caches. And yet these same users can generate a PQ of 1000 waypoints. It seems to me that normal PQs can exceed the MyFinds by a factor of five. So what are you doing to protect these smartphone users from creating and downloading a 1000-cache PQ that is bigger than their MyFinds PQ? There is something missing in the explanations of why we cannot download a MyFinds via the API.

Link to post

Could you explain the need for API access to the "My Finds" PQ?

I run a PQ and I can download the results via an API call. I run a MyFinds PQ and I can only download it by going to the PQ page and downloading it. Could you explain how this is easier on the servers? It seems to me that when I run the MyFinds PQ, whether I download it manually or via an API call, the load of generating the PQ is the same. Is the API download that much higher of a load than the manual download? It seems to me, unless I totally misunderstand, that the rules for running a PQ remain in place and the MyFinds is still limited to once every three days. So I am not understanding the server load issue. I would understand the issue much better if Almogul could generate and download a MyFinds several times a day via the API. I don't understand why downloading the MyFinds PQ is restricted to the manual process only. You're not regenerating those things every time I request a download, are you? If so, I understand the issue and think you need to look at caching a file from the most recent run.

 

As for downloading a MyFinds PQ to a phone, if the turkey wants to do that, let them do it. I suspect the first time it is done will be the last time it is done. Of course, if they have fewer than 100 finds they might not fully understand until they get several thousand. The logical extension of "we are protecting the mobile devices" would be that you can't download it at all, because someone might access it over a 9600 baud dial-up line.

Ah, but the MyFinds PQ is limited to run every three days.

I said that.

Link to post
Could you explain the need for API access to the "My Finds" PQ? If I could understand your reasoning for it better I might be able to argue your points to the dev team. As it is, it seems like the needed functionality is better addressed by other API calls that are not as hard on the server.

 

Think outside the "mobile device" box. Pages like http://www.mygeocachingprofile.com/ could be set up to automatically request and pull the PQ for each user, instead of requiring the user to manually click that button, download the PQ and upload it again each and every time.

Link to post

Think outside the "mobile device" box. Pages like http://www.mygeocachingprofile.com/ could be set up to automatically request and pull the PQ for each user, instead of requiring the user to manually click that button, download the PQ and upload it again each and every time.

 

The API has the functionality to return a list of caches found by a given account. Allowing users to access that to download their "My Finds" on all devices is a separate issue.

Link to post

jholly, you are misunderstanding how the API works. It is not some sort of FTP pipe that pulls down saved GPX files to your device. It uses the list of your downloadable PQs to determine which ones you have generated, but then uses the cache IDs in each to pull the cache data out of the database. That data may or may not be formatted as GPX coming in depending on the application pulling in the data.

 

I still have not seen a reason why the "My Finds" PQ is needed (at least at this time) in the API.

Link to post

Yes, I was not understanding how the PQs worked. So if I understand it correctly, running a PQ just generates a list of caches for that PQ. I then get to pick which list I am going to get the data from. When I go to download the PQ data, either by manually clicking the download button or through the API interface, that is when you actually generate the data stream. But that still raises the question of the load. I can only generate the MyFinds PQ once every three days, but I can still download it multiple times during those three days. Doesn't that cause server load? Is the load that much higher for an API download versus a manual download? Or is it a bug that I can download the MyFinds multiple times in the three-day period?

 

As for why, say I want to do some processing with a GSAK macro, and the macro would be a lot smoother if I could download the MyFinds via the API instead of clicking the download button. The same reason why it is nice to download the regular PQs via the API. As for mobile device users, my position is that the vast majority can create a regular PQ that is far larger than their MyFinds PQ, and they can still download the regular PQ.

Link to post

You are still confusing PQs with the API. They are two separate things entirely. The PQ generator that packages up your GPX files and mails them/provides them for download is a feature that has been around for years and does not use the API in any form. The PQ generator packages up a GPX file and serves it up for download; the API does not.

Link to post

Okay, being confused is a normal state.

 

You're not alone. Apparently I was incorrect as well - the API does in fact access the saved PQ file, although it is not simply a copy process. We used to do it the way I described, but for various reasons (including wanting to treat a PQ as a snapshot in time), we now access the saved PQ.

 

The big reason this is so expensive for the API is that it has to crack open the zipped file into RAM on the server and then stream that data down to the requesting entity. Any PQ over ~1000 is simply prohibitively large for doing this. It's best to let users download their saved PQ and crack it open on their local machine.

Link to post

Okay, being confused is a normal state.

 

You're not alone. Apparently I was incorrect as well - the API does in fact access the saved PQ file, although it is not simply a copy process. We used to do it the way I described, but for various reasons (including wanting to treat a PQ as a snapshot in time), we now access the saved PQ.

 

The big reason this is so expensive for the API is that it has to crack open the zipped file into RAM on the server and then stream that data down to the requesting entity. Any PQ over ~1000 is simply prohibitively large for doing this. It's best to let users download their saved PQ and crack it open on their local machine.

Thank you, that answers the question quite nicely.

Link to post

The big reason this is so expensive for the API is that it has to crack open the zipped file into RAM on the server and then stream that data down to the requesting entity. Any PQ over ~1000 is simply prohibitively large for doing this. It's best to let users download their saved PQ and crack it open on their local machine.

Simple solution: Provide a flag for the requesting entity to tell the API not to unzip it. That would put less load on the server and use less bandwidth. Make the My Finds PQ requestable only in zipped form.
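The proposed flag might look something like this rough Python sketch (the function name and details are hypothetical, not the actual API). The contrast it illustrates is the one from the post above: streaming the stored ZIP straight from disk is cheap, while the unzip path has to inflate the GPX into server RAM first:

```python
import zipfile

def stream_pq(path, unzip=False, chunk_size=64 * 1024):
    """Yield a stored PQ either as raw ZIP bytes (cheap) or as the
    decompressed GPX content (expensive: inflated in server RAM)."""
    if not unzip:
        # Raw mode: stream the file from disk in fixed-size chunks;
        # the server never looks inside the archive.
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                yield chunk
    else:
        # Unzip mode: the whole GPX member is expanded into memory
        # before anything goes out on the wire.
        with zipfile.ZipFile(path) as zf:
            name = zf.namelist()[0]
            yield zf.read(name)
```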

Link to post

The big reason this is so expensive for the API is that it has to crack open the zipped file into RAM on the server and then stream that data down to the requesting entity. Any PQ over ~1000 is simply prohibitively large for doing this. It's best to let users download their saved PQ and crack it open on their local machine.

Simple solution: Provide a flag for the requesting entity to tell the API not to unzip it. That would put less load on the server and use less bandwidth. Make the My Finds PQ requestable only in zipped form.

 

Or make the API not use PQ files directly at all. In the "output to" section of the PQ page, you could provide an additional destination, call it API or database or whatever. Instead of writing the results to a file (which would also make the PQ undownloadable through regular means), they would be written elsewhere in the database and the API engine would then pull the data out of there, which could then even happen in chunks.

 

Of course that doesn't solve the problem with the "my finds" PQ, as there's no selectable destination...

Edited by dfx
Link to post

Or make the API not use PQ files directly at all. In the "output to" section of the PQ page, you could provide an additional destination, call it API or database or whatever. Instead of writing the results to a file (which would also make the PQ undownloadable through regular means), they would be written elsewhere in the database and the API engine would then pull the data out of there, which could then even happen in chunks.

No, that won't reduce server load. Windows has a TransmitFile function that sends files over TCP/IP with extremely little CPU overhead. Sending a file from a disk this way is one of the least expensive operations when it comes to server load.
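For the curious, Python exposes the POSIX counterpart of Windows' TransmitFile as `os.sendfile`; this sketch shows what "hand the file to the kernel" means in practice (the helper name is made up, and this assumes a Unix-like system):

```python
import os
import socket

def send_whole_file(sock, path):
    """Send a file over a socket via the kernel's sendfile path.

    os.sendfile copies disk pages straight into the socket buffer
    inside the kernel; the user-space process never touches the bytes,
    which is why this is so cheap compared with a read/write loop.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(),
                               offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset
```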

Link to post

Any PQ over ~1000 is simply prohibitively large for doing this. It's best to let users download their saved PQ and crack it open on their local machine.

 

I am sorry if I am not getting this, but what you are saying is that if I manually download the PQs using Firefox, GC.com sends me a zip file; but if I use GSAK to download the same queries, it actually opens each one, sends me the information for each individual cache, and then recreates a zip file of those caches on my computer?

Edited by Mr Kaswa
Link to post

Or make the API not use PQ files directly at all. In the "output to" section of the PQ page, you could provide an additional destination, call it API or database or whatever. Instead of writing the results to a file (which would also make the PQ undownloadable through regular means), they would be written elsewhere in the database and the API engine would then pull the data out of there, which could then even happen in chunks.

No, that won't reduce server load. Windows has a TransmitFile function that sends files over TCP/IP with extremely little CPU overhead. Sending a file from a disk this way is one of the least expensive operations when it comes to server load.

 

Yeah it is, but I don't think this is relevant. First of all, you'd have to make the specific call return raw binary data instead of the usual XML/JSON response, which already may be difficult in the existing framework. Then you'd have to make the engine actually use that function, which also may be impossible depending on how everything is set up (remember, you're dealing with sockets in an HTTP server context here).

 

Of course you're right that if you already have a ZIP file with the contents that you're interested in, then sending out the raw file (even if it's through the usual read/write loop) is the best option. But instead of generating a ZIP file first and putting that on disk, you could just skip that step. You don't package up the results at all, instead you just duplicate the raw data in the DB for later retrieval, which usually is a very fast operation. I can only imagine that getting rid of the "dump everything to a file and zip it up" step would be beneficial.

Link to post

But instead of generating a ZIP file first and putting that on disk, you could just skip that step. You don't package up the results at all, instead you just duplicate the raw data in the DB for later retrieval, which usually is a very fast operation. I can only imagine that getting rid of the "dump everything to a file and zip it up" step would be beneficial.

Except that there's already a separate PQ generator box dedicated to that. The PQ generator doesn't have to be real time. If a hundred people hit the generate My Finds button, they're just queued up and processed over time. The API needs to respond quickly without timing out.

Link to post

But instead of generating a ZIP file first and putting that on disk, you could just skip that step. You don't package up the results at all, instead you just duplicate the raw data in the DB for later retrieval, which usually is a very fast operation. I can only imagine that getting rid of the "dump everything to a file and zip it up" step would be beneficial.

Except that there's already a separate PQ generator box dedicated to that. The PQ generator doesn't have to be real time. If a hundred people hit the generate My Finds button, they're just queued up and processed over time. The API needs to respond quickly without timing out.

 

You're still not getting my idea. The PQ generator would still do its thing. Only that instead of taking the data, dumping it out into a file and zipping it up, it writes it right back into the database. The API would then not load a file and send that back, but read the snapshot data out of the DB and send that out (potentially only parts of it). When the PQ snapshot expires, it's not a file that gets deleted but rather the entries in the DB.

 

The API already supports filtering by various criteria in a real-time request. If there's no performance bottleneck there, then I see no reason not to extend that to either load a pre-made snapshot of cache data, or even extend the filters to include most (all?) of the currently existing PQ filters, which would make PQs obsolete. Remember, PQs were originally created for email delivery of bulk cache data. The whole concept of creating the file that was supposed to get emailed out, but not emailing it and instead offering it for download is kinda absurd.
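The snapshot-in-the-database idea described above might be sketched like this (an illustrative SQLite toy with invented table and function names, nothing like Groundspeak's actual schema): the generator stores each result row under a PQ id, and the API pages through the snapshot in chunks instead of opening a ZIP file.

```python
import sqlite3

def make_db():
    # In-memory stand-in for the real database.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pq_snapshot "
        "(pq_id TEXT, seq INTEGER, gc_code TEXT, name TEXT)"
    )
    return db

def save_snapshot(db, pq_id, caches):
    """What the PQ generator would do instead of writing a ZIP file."""
    db.executemany(
        "INSERT INTO pq_snapshot VALUES (?, ?, ?, ?)",
        [(pq_id, i, code, name) for i, (code, name) in enumerate(caches)],
    )

def fetch_chunk(db, pq_id, offset, limit):
    """What the API call would do: return one page of the snapshot."""
    cur = db.execute(
        "SELECT gc_code, name FROM pq_snapshot WHERE pq_id = ? "
        "ORDER BY seq LIMIT ? OFFSET ?",
        (pq_id, limit, offset),
    )
    return cur.fetchall()
```

Paging like this is what makes even a large My Finds snapshot deliverable without holding the whole thing in RAM at once.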

Link to post

You're still not getting my idea. The PQ generator would still do its thing. Only that instead of taking the data, dumping it out into a file and zipping it up, it writes it right back into the database.

You're still missing my point:

 

1) Transferring a file from a disk = very low CPU/RAM usage

2) Transferring data from a database = medium to high CPU/RAM usage

Link to post

You're still not getting my idea. The PQ generator would still do its thing. Only that instead of taking the data, dumping it out into a file and zipping it up, it writes it right back into the database.

You're still missing my point:

 

1) Transferring a file from a disk = very low CPU/RAM usage

2) Transferring data from a database = medium to high CPU/RAM usage

 

Except that

 

1) you need to generate that file first = high CPU usage

2) you don't need to generate that file first = zero CPU usage.

Link to post

Except that

 

1) you need to generate that file first = high CPU usage

2) you don't need to generate that file first = zero CPU usage.

Huh? :huh: Can you please explain how not generating the file first does not use the CPU? If it's accessing the database it's using the CPU.

 

My point is that PQ generation is not time critical. If the PQ generator is busy then the PQs take longer to generate.

 

Generating the PQ and sending the file through the API are two separate things. Downloading the PQ is just downloading the PQ. It doesn't trigger the generation. That would be a new API call.

 

That's probably why downloading the My Finds through the API isn't a high priority. You have to visit the website to generate it anyways, might as well download it there too.

Link to post
Huh? :huh: Can you please explain how not generating the file first does not use the CPU? If it's accessing the database it's using the CPU.

 

Why is that so hard to understand? Formatting the result set from a PQ as XML/GPX data and then packing up the GPX into a ZIP file is an expensive operation. If you don't actually need a ZIP file in the end (which you do if you plan on emailing it out, but you certainly don't in the case of an API - it's not FTP), then it makes no sense to do that. It makes much more sense to skip that step, not format the result set as GPX, not pack up everything as ZIP file, and instead just write the raw data back to the DB. That would make it available for later use in the format that you need it in: raw bits of data.

 

In a well-groomed database, pulling data out of a table that's already prepared in the way that you need it is an operation hardly any more expensive than opening a file and reading its contents (opening the file is the tough part here: many result sets means many files, and many files means stress on the file system). The expensive operation in PQ generation is combing the complete worldwide list of caches for the result set that the user is interested in. The PQ generator would still do that whenever the PQ runs, and not when the result set is requested for download.

Link to post

Why is that so hard to understand?

I'm wondering the same thing about you. :laughing:

 

Formatting the result set from a PQ as XML/GPX data and then packing up the GPX into a ZIP file is an expensive operation. If you don't actually need a ZIP file in the end (which you do if you plan on emailing it out, but you certainly don't in the case of an API - it's not FTP), then it makes no sense to do that. It makes much more sense to skip that step, not format the result set as GPX, not pack up everything as ZIP file, and instead just write the raw data back to the DB. That would make it available for later use in the format that you need it in: raw bits of data.

So instead of formatting it as GPX, you're going to format it as JSON. Still going to use the same amount of CPU time either way. So the only issue is zipping the data. That just becomes a tradeoff between a little CPU usage on the PQ generator and bandwidth used to send the data. It's user selectable anyways.

 

As for the API not being FTP, why not? It should be for sending files. You ask for a PQ and it should send it as a byte stream and not convert it to another form first.

 

In a well-groomed database, pulling data out of a table that's already prepared in the way that you need it is an operation hardly any more expensive than opening a file and reading its contents

Wrong! It needs to load the index into memory to find the record. Then it needs to read the DB row. Since the data won't fit into one DB row, it will be stored out of line somewhere else in the DB. Then it needs to read the BLOB (called different things in different DBs) data into RAM and copy the data portion into a send buffer. Then it needs to call the socket send function. All the while causing many kernel/userspace transitions. Lots of CPU, lots of I/O and lots of RAM.

 

To send a file is one Win32 API call. Then the kernel will read the file into its I/O cache (which is already allocated) and tell the network card to DMA it out. Very little CPU, a little bit of I/O, a little bit of RAM and one userspace/kernel transition.

 

(opening the file is the tough part here: many result sets means many files, and many files means stress on the file system).

The filesystem is a simplified database anyways. You're arguing that stress on the filesystem is a bad thing while ignoring the stress on the database.

 

The filesystem has been optimized to find files. A database needs to be generic to find many different types of data.

 

The expensive operation in PQ generation is combing the complete worldwide list of caches for the result set that the user is interested in.

How is that any less expensive than combing the complete worldwide list of caches for the result set to send through the API? It's not.

 

It doesn't matter what the returned data format is (GPX, JSON or binary). My point is that saving it to a file for direct download is less CPU/IO/RAM intensive than storing it in a database.

Edited by Avernar
Link to post
So instead of formatting it as GPX, you're going to format it as JSON. Still going to use the same amount of CPU time either way. So the only issue is zipping the data. That just becomes a tradeoff between a little CPU usage on the PQ generator and bandwidth used to send the data. It's user selectable anyways.

Yup (except that downloadable PQs are always in ZIP format, but w/e). However, I don't consider creating a ZIP file as "a little CPU usage". The problem isn't even CPU usage, in fact compressing data can bring large benefits to server applications: you spend more CPU in exchange for shorter transmit times, thus freeing up other resources more quickly. Groundspeak actually already does gzip encoding on their website backend, I can only assume the API is the same. Compression is good, no question about it, the only part that matters is where you do the compression: on the web servers, of which there are many and which can scale easily, or on the PQ generator, of which there's only a small number (two? three?) and which need to dump files in a central location?
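For a sense of the tradeoff being argued here: GPX is such verbose, repetitive XML that gzip routinely shrinks it by an order of magnitude or more. A quick Python sketch (the sample data and helper are invented for illustration):

```python
import gzip
import time

def compression_tradeoff(payload, level=6):
    """Return (original size, compressed size, seconds spent compressing)
    to show what 'spend CPU to save bandwidth' costs for GPX-like text."""
    start = time.perf_counter()
    packed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(payload), len(packed), elapsed

# GPX-like XML: highly repetitive, so it compresses extremely well.
sample = b"<wpt lat='47.6' lon='-122.3'><name>GC1234</name></wpt>\n" * 1000
```

Running `compression_tradeoff(sample)` shows the compressed size landing well under a tenth of the original, which is why whoever does the zipping saves everyone downstream a lot of bandwidth.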

 

As for the API not being FTP, why not? It should be for sending files. You ask for a PQ and it should send it as a byte stream and not convert it to another form first.

Yeah well, it doesn't. And no, an API should not be for sending files, it should be an Application Programming Interface. :rolleyes:

 

Like I said above, there's a good chance that in the current framework for the API, a direct raw file download isn't even easily possible. But that's not even the point, the point is that the API is an API - it's meant for applications, and applications are interested in the content of the PQ's output, and not any files. Applications don't want a ZIP file, they don't even want a GPX file, they want the data, in whatever format is easiest to handle. That's why it makes no sense to have a PQ generate a ZIP file or even a GPX file first if you want the data through the API, it just needs to generate the data, and then have the API send out that data.

 

Seeing PQs as a file generator is fine if you're actually interested in getting a file. But when you're on the API, you aren't.

 

Wrong! It needs to load the index into memory to find the record. Then it needs to read the DB row. Since the data won't fit into one DB row, it will be stored out of line somewhere else in the DB. Then it needs to read the BLOB (called different things in different DBs) data into RAM and copy the data portion into a send buffer. Then it needs to call the socket send function. All the while causing many kernel/userspace transitions. Lots of CPU, lots of I/O and lots of RAM.

 

To send a file is one Win32 API call. Then the kernel will read the file into its I/O cache (which is already allocated) and tell the network card to DMA it out. Very little CPU, a little bit of I/O, a little bit of RAM and one userspace/kernel transition.

...

The filesystem is a simplified database anyways. You're arguing that stress on the filesystem is a bad thing while ignoring the stress on the database.

 

The filesystem has been optimized to find files. A database needs to be generic to find many different types of data.

Oh please. Yeah, a file system is made for making files accessible, and a database is made for making data available. You're not interested in getting a file, you're interested in getting the data. Therefore it makes no sense to use a file system and files to transmit the data. None at all. The theoretical performance difference you have your mind set on is negligible and moot, because 1) the backend most likely isn't even able to use your favorite function call, 2) we're not talking about a bulk file download server here, 3) each PQ is usually downloaded exactly once after being generated, and most importantly 4) the application doesn't want a file anyway!

 

The expensive operation in PQ generation is combing the complete worldwide list of caches for the result set that the user is interested in.

How is that any less expensive than combing the complete worldwide list of caches for the result set to send through the API? It's not.

Correct. But you're saving the step of creating a file that the other end (the application) doesn't actually want. I don't know about you, but when it comes to resource management, I'm all for not doing something that doesn't need to be done. The theoretical performance penalty that you see in having to load data from the DB and format it for transmission on the fly isn't an issue, because that happens on the HTTP frontend, which scales easily, and is made up by taking load off the PQ generators, which have better things to do than spitting out files that will only end up getting deleted (or not even saved).

 

The API's HTTP frontend doesn't have a performance problem. It only struggles with sending out large PQs because they're freaking ZIP files that it needs to open and read! So yeah, you can solve that by skipping the "open and read" step and just send out the raw ZIP files. Or you can solve that by getting rid of the files completely and just send out the original data (like it does with all other API calls). The second option makes much more sense because we don't want any files to begin with.

Link to post

You are correct.

I just ran mine and used GSAK and the API to look for it...no show.

Perhaps this is by design due to the (potential) size of the PQ?

See post #17 of this thread. I did not pursue the line on how many cachers in the world exceed 1000 finds versus how many are requesting 1000 cache PQs.

Link to post

You are correct.

I just ran mine and used GSAK and the API to look for it...no show.

Perhaps this is by design due to the (potential) size of the PQ?

See post #17 of this thread. I did not pursue the line on how many cachers in the world exceed 1000 finds versus how many are requesting 1000 cache PQs.

 

Odd, I was the only responder to puczmeloun's topic, and now suddenly the threads have been automagically merged without warning. :o

 

In any case:

When I use the API through GSAK, I can do several things I can't do using the smartphone app. Thus I presume the app is written to be able to make certain calls, but not others. I suppose there is no way to tell a 'My Finds' PQ from a 'normal' PQ, so the app can't distinguish (and not show) 'My Finds'.

When I use GSAK, the PQ is downloaded to my hard-drive just the same as if I manually did it using my browser, and then the 'Load Files' function un-zips it.

 

Why would anyone want to look at their 'My Finds' PQ on their smartphone? :blink:

Link to post

However, I don't consider creating a ZIP file as "a little CPU usage".

Sounds like you've never done any database programming. Reading from a database is I/O bound, which means the CPU is sitting there twiddling its silicon thumbs. If you compress the data as it comes in, you're just using this wasted CPU idle time.

 

But this point is moot. The PQ generators already ZIP the results, so we're not putting any additional load anywhere.

 

Yeah well, it doesn't. And no, an API should not be for sending files, it should be an Application Programming Interface. :rolleyes:

I'm the one who should be rolling my eyes at that statement. Why do you have to take things so literally? They called it an API. That doesn't mean it is an API in the strictest sense of the acronym. It's really a protocol, as it's used to transfer data over a network.

 

applications are interested in the content of the PQ's output, and not any files.

I don't understand your fixation on "files". A file is just content stored on disk. That's it. Nothing special. The application receiving the content over the network can put it in a file if it wants to or it could process it all in memory either all at once or as the data arrives.

 

Seeing PQs as a file generator is fine if you're actually interested in getting a file. But when you're on the API, you aren't.

I really don't care what the data format the "API" sends. It can be uncompressed text or it can be ZIPed text. I can store it in a file or not. Groundspeak can store it in a file first or it can generate it on the fly. It doesn't matter.

 

Just like in HTTP a web server can send content generated on the fly or it can send files with the same call. Guess which type of content it can send with less overhead on the server? Yup, files on disk.
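For what it's worth, the low-overhead path for on-disk content can be sketched with `os.sendfile`, a rough POSIX analogue of ASP.NET's `TransmitFile`. This is a toy illustration (Linux-oriented; a socket pair stands in for a real client connection, and the file contents are made up):

```python
import os
import socket
import tempfile

# A pregenerated "PQ" sitting on disk (stand-in for the real ZIP).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"fake zipped PQ bytes")
    path = f.name

# A connected socket pair stands in for the server/client connection.
server, client = socket.socketpair()
with open(path, "rb") as src:
    size = os.fstat(src.fileno()).st_size
    sent = 0
    while sent < size:
        # The kernel copies file -> socket directly, without a round
        # trip through a user-space buffer.
        sent += os.sendfile(server.fileno(), src.fileno(), sent, size - sent)
server.close()

received = client.recv(4096)
client.close()
os.unlink(path)
```

Serving dynamically generated content instead means reading, buffering, and writing it in user space, which is the extra overhead being referred to.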

 

The point that you are either completely missing or completely ignoring is that the file already exists and is already zipped. It makes more sense to create another call in the API/protocol to just send the file as it is, instead of doing all this dev work storing the data a second time in the database.

 

Oh please. Yeah, a file system is made for making files accessible, and a database is made for making data available. You're not interested in getting a file, you're interested in getting the data.

No, I'm interested in getting the file. That's why it's the PQ retrieval function and not the get caches function. Eventually when the limits of the get caches function are expanded to what a PQ can do then we can get rid of the PQ retrieval function.

 

Therefore it makes no sense to facilitate a file system and files to transmit the data. None at all. The theoretical performance difference you have your mind set on is negligible and moot, because 1) the backend most likely isn't even able to use your favorite function call, 2) we're not talking about a bulk file download server here, 3) each PQ is usually downloaded exactly once after being generated, and most importantly 4) the application doesn't want a file anyway!

1) Wrong. ASP.NET has a TransmitFile call.

2) Huh? All web servers can bulk transmit files. It's what they're best at and why static web sites are so fast.

3) No argument there. And when the get caches call duplicates the function of a PQ, we can do away with the get PQ call.

4) I'm quite sure that GSAK writes it to a file and processes it just like any other GPX file.
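On point 4, whether the GPX lands in a file first is the consumer's choice, not the format's. A minimal sketch with Python's standard `xml.etree` (the waypoint data here is made up) parses the same GPX text straight from memory:

```python
import xml.etree.ElementTree as ET

GPX = """<?xml version="1.0"?>
<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <wpt lat="47.6" lon="-122.3"><name>GC12345</name></wpt>
</gpx>"""

# Parsing from an in-memory string and parsing from a file produce the
# same element tree; nothing about GPX requires touching the disk.
root = ET.fromstring(GPX)
ns = {"g": "http://www.topografix.com/GPX/1/0"}
names = [w.find("g:name", ns).text for w in root.findall("g:wpt", ns)]
```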

 

The API's HTTP frontend doesn't have a performance problem. It only struggles with sending out large PQs because they're freaking ZIP files that it needs to open and read! So yeah, you can solve that by skipping the "open and read" step and just send out the raw ZIP files. Or you can solve that by getting rid of the files completely and just send out the original data (like it does with all other API calls). The second option makes much more sense because we don't want any files to begin with.

I fully agree that the proper way is to make a Get My Finds call that returns data like the Get Caches call. But this is more work than a quick fix to the Get PQ call to just return the file as is.

 

The Get PQ call is more than likely temporary anyways until they can get the performance of the Get Caches call up to what's needed to make a Get My finds feasible.

 

Currently with a 6000 cache per day limit on the Get Caches call, getting my finds would pretty much wipe that out.

Edited by Avernar
Link to post
Sounds like you've never done any database programming. Reading from a database is I/O bound, which means the CPU is sitting there twiddling its silicon thumbs. If you compress the data as it comes in, you're just using that otherwise wasted CPU idle time.

:laughing:

Sounds like you've never done any database administration. If I/O is what your DB engine is waiting on, then you've got some serious problems at hand. I would much expect that the Groundspeak DB does not have such a problem.

 

But this point is moot. The PQ generators already ZIP the results, so we're not putting any additional load anywhere.

Means you still didn't get it. If you plan on grabbing the cache data through the API anyway, then the PQ generators don't need to ZIP the data.

 

I'm the one who should be rolling my eyes at that statement. Why do you have to take things so literally? They called it an API. That doesn't mean it is an API in the strictest sense of the acronym. It's really a protocol, as it's used to transfer data over a network.

If that's the case, why use an API to begin with? Why not just do everything with FTP? You upload a file containing your request, the server processes that and provides a file containing the results. Or how about SMTP? You send an email with your request and get a reply email back with the results. Or SNMP or NNTP maybe? All those are also just protocols used to send data over a network. :rolleyes:

 

I don't understand your fixation on "files". A file is just content stored on disk. That's it. Nothing special. The application receiving the content over the network can put it in a file if it wants to or it could process it all in memory either all at once or as the data arrives.

Well, except when that content comes in the shape of a ZIP file. Then you need the whole thing first before you can start processing it. You don't need to dump it to disk, you can also keep it in memory, but you still need the complete file first. Kinda silly that ZIP files have their TOC at the end, isn't it? But you know what? That's what makes a file a file!

 

I really don't care what the data format the "API" sends. It can be uncompressed text or it can be ZIPed text. I can store it in a file or not. Groundspeak can store it in a file first or it can generate it on the fly. It doesn't matter.

It does matter in terms of performance. If you have a set of data and want to send it across a network, then you need to format it in some way. Ok. Now if you do that, then dump everything to a ZIP file and send the ZIP file across, just to have the ZIP file unpacked again on the other side to get the data out and the ZIP file discarded, then that's wasted effort. On both sides.

 

Just like in HTTP a web server can send content generated on the fly or it can send files with the same call. Guess which type of content it can send with less overhead on the server? Yup, files on disk.

Which works well if you're actually loading static content. PQs aren't static though, they're dynamically generated whenever they run. So tell me what works better for dynamic content: Generate it on the fly and send it out, or generate it, dump it to a file, then send the file out and then delete the file again?

 

The point that you are either completely missing or completely ignoring is that the file already exists and is already zipped. It makes more sense to create another call in the API/protocol to just send the file as it is, instead of doing all this dev work storing the data a second time in the database.

The point that you are completely missing or ignoring is that what I'm talking about is not to create the ZIP file at all to begin with! I don't know how many times I've said that now.

 

No, I'm interested in getting the file. ...

You are? So what do you do with that ZIP file once you downloaded it?

 

1) Wrong. ASP.NET has a TransmitFile call.

2) Huh? All web servers can bulk transmit files. It's what they're best at and why static web sites are so fast.

3) No argument there. And when the get caches call duplicates the function of a PQ, we can do away with the get PQ call.

4) I'm quite sure that GSAK writes it to a file and processes it just like any other GPX file.

1) I'm not talking about ASP.NET, I'm talking about the API framework that they're developing in. I don't know what exactly it looks like, but I'd expect that the different methods return raw data and the framework implicitly formats that as JSON or XML depending on what is requested. Circumventing such a mechanism within the framework might not be easily possible. But since neither of us knows, it's all speculation.

2) The point was that the performance benefit of your favorite function is negligible unless you're dealing with a server which hands out lots and lots and lots of static content, and that repeatedly. The API doesn't.

3) Or you hybridize the two. Searches which can quickly be done on the fly work as they do now, and those which can't are processed in the background and their results are made available through the same mechanism at a later point, when they're ready. Which is exactly what I'm suggesting. I don't know if it actually makes sense in their DB setup, but it sure as heck makes more sense than doing the same, plus dumping the results to a ZIP file and then downloading the ZIP file.

4) Of course it does. But it wouldn't have to if what you were downloading wasn't a ZIP file :rolleyes:

 

I fully agree that the proper way is to make a Get My Finds call that returns data like the Get Caches call. But this is more work than a quick fix to the Get PQ call to just return the file as is.

 

The Get PQ call is more than likely temporary anyways until they can get the performance of the Get Caches call up to what's needed to make a Get My finds feasible.

Which again is the whole point of my suggestion. Do away with the "quick fixes" and workarounds and get a proper solution in place. Downloading pre-generated ZIP files through the API is totally absurd. Less absurd than having the backend open the ZIP, extract the contents and send out the contained data, but still absurd.

 

Anyway, I will invoke one of my forum discussions rules at this point: once a quote gets split up into 10 or more pieces, the discussion is over :laughing:

Link to post

Sounds like you've never done any database administration. If I/O is what your DB engine is waiting on, then you've got some serious problems at hand. I would much expect that the Groundspeak DB does not have such a problem.

Sounds like you've never done any database programming. For one-time requests you're not going to hit any in-memory indexes or cached DB rows. Most PQs are set to exclude my finds, and the My Finds PQ is the opposite. Both essentially grab a random set of caches from the database, so it's going to be I/O bound.

 

Means you still didn't get it. If you plan on grabbing the cache data through the API anyway, then the PQ generators don't need to ZIP the data.

I get it but you don't seem to know how data is requested over the internet. You're NOT calling a function!!!

 

You send a bunch of bytes down a TCP/IP connection to request what you want. For Opencaching it's a standard HTTP request, the Geocaching API might be the same or it might be proprietary. Doesn't matter.

 

The response from the server is a bunch of bytes down a TCP/IP connection. For Opencaching it's JSON for most requests and XML for GPX files. The data can be generated on the fly from the database, stored in a memory cache or sent from a pregenerated file.

 

Now here's the part that you don't get. If the data is already zipped then you can just send it as is. The fact that you're using an API doesn't stop you from sending data from a file, zipped or otherwise.
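A toy illustration of that point, with hypothetical call names (`GetPocketQueryZip` and `GetCaches` are made up for this sketch): both branches of the dispatcher just return bytes, and the transport can't tell whether they came from a pregenerated file or were serialized on the fly.

```python
import json

# Pretend this ZIP was pregenerated by the PQ machinery and read from disk.
PREGENERATED = {"myfinds.zip": b"PK\x03\x04 fake zip payload"}

def respond(call, args):
    """Hypothetical dispatcher: the wire only ever carries bytes,
    regardless of where they originated."""
    if call == "GetPocketQueryZip":            # file contents, sent as-is
        return PREGENERATED[args["name"]]
    if call == "GetCaches":                    # generated on the fly
        return json.dumps({"caches": args["codes"]}).encode()
    raise ValueError(call)
```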

 

The Geocaching API is no more an API than HTTP or FTP is. It's a protocol. Unless Groundspeak provides a code library that you link into your program, it is NOT an API!!! "Groundspeak API" just sounds way cooler than "Groundspeak Database Access Protocol".

 

I 100% agree that the PQ generators don't need to ZIP the data. But the so-called API would still send a file, only as text/xml.

 

If that's the case, why use an API to begin with? Why not just do everything with FTP? You upload a file containing your request, the server processes that and provides a file containing the results. Or how about SMTP? You send an email with your request and get a reply email back with the results. Or SNMP or NNTP maybe? All those are also just protocols used to send data over a network. :rolleyes:

Because it's not an API. It's a protocol. It's more than likely HTTP anyways. I'll have to hunt down the API document to find out for sure if it's HTTP like the Opencaching protocol.

 

Why use it over FTP? Same reason you use NNTP, SNMP, POP, IMAP, etc over FTP. They've been designed for a specific purpose so are more efficient in requesting the data you want.

 

Well, except when that content comes in the shape of a ZIP file. Then you need the whole thing first before you can start processing it. You don't need to dump it to disk, you can also keep it in memory, but you still need the complete file first.

No you don't. Now I know you've never uncompressed a ZIP file on the fly before. I have. Each file has a local header in front of it.
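A sketch of what on-the-fly extraction looks like, using only Python's standard library: build a small deflate-compressed ZIP in memory, read the local file header that sits at the front of the entry, and decompress the payload chunk by chunk without ever consulting the central directory at the end of the archive.

```python
import io
import struct
import zipfile
import zlib

# Build a small deflate-compressed ZIP in memory to stream from.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("finds.gpx", b"<gpx>" + b"<wpt/>" * 500 + b"</gpx>")
data = buf.getvalue()

# Parse the 30-byte local file header in front of the entry.
sig, ver, flags, method, t, d, crc, csize, usize, nlen, xlen = \
    struct.unpack("<IHHHHHIIIHH", data[:30])
assert sig == 0x04034B50 and method == 8    # "PK\x03\x04", deflate
start = 30 + nlen + xlen                    # payload begins after name/extra

# Decompress the entry as the bytes "arrive", 64 bytes at a time.
dec = zlib.decompressobj(-15)               # raw deflate stream
out = b""
for i in range(start, start + csize, 64):
    out += dec.decompress(data[i:min(i + 64, start + csize)])
out += dec.flush()
```

(One caveat the original ZIP spec allows: if an archiver sets general-purpose flag bit 3, the sizes live in a data descriptor after the payload rather than in the local header, which complicates pure streaming.)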

 

Kinda silly that ZIP files have their TOC at the end, isn't it?

No. It lets you add files to an archive without re-writing the whole thing. While not as important now that disks are freaking huge, back when ZIP was created it was a very good idea.
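This is easy to demonstrate with Python's `zipfile`: opening an archive in append mode writes the new entry and a fresh central directory after the existing data, leaving the original entry's bytes untouched.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("a.gpx", b"<gpx/>")

# Append mode: the new entry overwrites only the old central directory
# at the tail, and a new directory is written after it.
with zipfile.ZipFile(buf, "a") as z:
    z.writestr("b.gpx", b"<gpx/>")

with zipfile.ZipFile(buf) as z:
    names = z.namelist()
```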

 

But you know what? That's what makes a file a file!

No. Writing data to a disk makes a file.

 

It does matter in terms of performance. If you have a set of data and want to send it across a network, then you need to format it in some way. Ok. Now if you do that, then dump everything to a ZIP file and send the ZIP file across, just to have the ZIP file unpacked again on the other side to get the data out and the ZIP file discarded, then that's wasted effort. On both sides.

I agree. But you keep thinking in terms of a brand new implementation. And when the API is complete then you'd be 100% correct.

 

But it's not. Right now PQs are still necessary. There's already infrastructure in place to generate and ZIP them. Having the API send them as is instead of unzipping them is a good middle step.

 

You can't just develop a perfect system and expect everyone to switch to it instantaneously. The IT world doesn't work that way. During the transition things are not going to be done in the most optimal way.

 

Which works well if you're actually loading static content. PQs aren't static though, they're dynamically generated whenever they run. So tell me what works better for dynamic content: Generate it on the fly and send it out, or generate it, dump it to a file, then send the file out and then delete the file again?

You've got an oddball definition of dynamic content. Using that definition, all files would be considered dynamic. :rolleyes:

 

PQs are static. They're generated once when they run and can be downloaded more than once, up to 7 days later. During that time the data doesn't change. That's static content, no ifs, ands, or buts.

 

For dynamic content, yes, generating it on the fly and sending it right away is the best way to do it. That's what the Get Caches function does!

 

PQs are not generated on the fly. The PQ previews are however. Each time you preview a PQ you get data that was generated on the fly.

 

The point that you are completely missing or ignoring is that what I'm talking about is not to create the ZIP file at all to begin with! I don't know how many times I've said that now.

I am not missing your point: Don't ZIP it, don't store it on disk. Got it. Please stop saying that I don't get it.

 

Now can you get my point: This is what the Get Caches API call does. But until the distance and results/day are majorly increased it can't be used to get all of your finds.

 

You are? So what do you do with that ZIP file once you downloaded it?

GSAK would unzip it and process it like any other downloaded zipped PQ. That's what it does with the unzipped GPX files it grabs with the API too.

 

What I would like to see is GSAK grab my finds through the API and dump them directly into its SQLite database. Unfortunately the API doesn't allow that because of the distance and caches/day limits. I can't even replace my regular PQs with the API because of the 50 km distance limit, never mind the My Finds.

 

If Groundspeak could remove those limits in a month then we wouldn't need to download PQs. But I predict it's going to take them a while to scale up their infrastructure before that happens.

Link to post

1) I'm not talking about ASP.NET, I'm talking about the API framework that they're developing in. I don't know what exactly it looks like, but I'd expect that the different methods return raw data and the framework implicitly formats that as JSON or XML depending on what is requested. Circumventing such a mechanism within the framework might not be easily possible. But since neither of us knows, it's all speculation.

2) The point was that the performance benefit of your favorite function is negligible unless you're dealing with a server which hands out lots and lots and lots of static content, and that repeatedly. The API doesn't.

3) Or you hybridize the two. Searches which can quickly be done on the fly work as they do now, and those which can't are processed in the background and their results are made available through the same mechanism at a later point, when they're ready. Which is exactly what I'm suggesting. I don't know if it actually makes sense in their DB setup, but it sure as heck makes more sense than doing the same, plus dumping the results to a ZIP file and then downloading the ZIP file.

4) Of course it does. But it wouldn't have to if what you were downloading wasn't a ZIP file :rolleyes:

1) So what you're saying is your guess is better than mine, even though we know that ASP.NET is the current system they use. Nice.

2) I thought you said in #1 that you don't know how the API works. Now you're stating as a fact that the performance difference between TransmitFile and ReadFile/WriteSocket is negligible in the API. Groundspeak has already told us that the unzipping in memory is the bottleneck.

3) And during the 6-12 months it takes them to develop this hybrid system (which sounds like a neat idea) I'd still like to download the My Finds PQ.

4) Wrong again. PQs aren't ZIP files now and GSAK still writes them to a file before processing. Why? Because it was the easiest and fastest way to do it. GSAK already has code to process GPX files on disk. He reused existing code. That's why GSAK 8 was released a little while ago instead of half a year from now.

 

But even if he spent the time to get it to dump GPX files directly into the database, he could still do it if they were ZIPed. As I said above, ZIP files can be uncompressed while streamed.

 

Which again is the whole point of my suggestion. Do away with the "quick fixes" and workarounds and get a proper solution in place. Downloading pre-generated ZIP files through the API is totally absurd. Less absurd than having the backend open the ZIP, extract the contents and send out the contained data, but still absurd.

So you'd rather wait months or years for the "proper solution" instead of using a "quick fix" in the meantime. I'd rather have the quick fix so I can use it now.

 

Anyway, I will invoke one of my forum discussions rules at this point: once a quote gets split up into 10 or more pieces, the discussion is over :laughing:

That works perfectly for me, the forum limit is 10! Nice to know I have an easy way to win an argument. :anibad:

Link to post

From the Forum Guidelines:

 

Private discussions: Sometimes, a discussion thread strays off into a friendly dialogue or a heated debate among a very small number of users. For these exchanges, we ask that you please use the Private Message feature that is provided through the Groundspeak forums, or the Geocaching.com e-mail system. Public forum posts should be reserved for matters of interest to the general geocaching community.

 

I think we've reached the point in the quote dissection festival where this whistle needs to be blown. Thank you.

Link to post

Hi, I have the iPhone 4 and have run the geocaching app for some time now. The issue I have is that I run the "my finds" pocket query from the website and it generates OK, but it does not appear in the app under pocket queries..!! I get an email containing a .zip file which I cannot seem to transfer into the app.... :-(

How do I get "my finds" pq in to the app??? Any ideas, many thanks

Link to post

Hi, I have the iPhone 4 and have run the geocaching app for some time now. The issue I have is that I run the "my finds" pocket query from the website and it generates OK, but it does not appear in the app under pocket queries..!! I get an email containing a .zip file which I cannot seem to transfer into the app.... :-(

How do I get "my finds" pq in to the app??? Any ideas, many thanks

 

Why would you want it in your phone/app? The APP already knows all your finds when it communicates with the geocaching site. I'm confused what you need it for.... (pardon).

 

The only reason my husband and I run the 'My Finds' PQ is to update our stats on the geocaching website (profile).

Link to post

Hi Lieblweb, I would like "my finds" listed in the app so I can see all my finds, since the app does not show your total number of caches found. Anyone else able to help please? Thanks

 

Ok, sounds legit....

 

Maybe this should be something you submit as a request for future upgrades - for the APP to list total cache count? Or maybe that request already exists?

 

But for now....just open up your web browser (on your phone) and login to your geocaching profile.

Edited by Lieblweb
Link to post

Hi Lieblweb, I would like "my finds" listed in the app so I can see all my finds, since the app does not show your total number of caches found. Anyone else able to help please? Thanks

 

Ok, sounds legit....

 

Maybe this should be something you submit as a request for future upgrades - for the APP to list total cache count? Or maybe that request already exists?

 

But for now....just open up your web browser (on your phone) and login to your geocaching profile.

Indeed, just opening the browser is what I am currently doing..... a bit of a pain, since the app used to recognise the "my finds" pocket query :-( which the website automatically has in place but no longer links to the app anymore... I was hoping there was still a way of getting it working ;-)

Link to post
