
feature request: New PQ type


Arrow42

Recommended Posts

Just like the topic description says, "Caches changed since my last PQ" would be nice. Select a PQ to check against and it returns any caches that have been updated in any way, including disabled, archived, and anything else that might be of interest to someone who is going into the field.

 

This has the advantage of updating my private store of the thousand nearest caches but also doesn't take up nearly as much resources on the server as pulling the 1k caches every week.

Link to comment

Seems to me that would take more resources. You would first have to run the last PQ, then you would have to run today's PQ, then compare the two and send the ones that have been updated.

 

The next time you would have to know which caches were not updated in the second PQ and also run a third PQ. You would be checking against the caches not updated in the first PQ and the caches updated in the second PQ. Eventually you would be checking against a huge number of past PQs.

Link to comment

Just like the topic description says, "Caches changed since my last PQ" would be nice. Select a PQ to check against and it returns any caches that have been updated in any way, including disabled, archived, and anything else that might be of interest to someone who is going into the field.

 

This has the advantage of updating my private store of the thousand nearest caches but also doesn't take up nearly as much resources on the server as pulling the 1k caches every week.

 

How do you figure it doesn't load the servers? Somehow it has to go check each and every cache to see if there was a status change, in addition to generating the list of new ones, then store this list somewhere to reference it the next time you want to go bang on the servers. Sounds like a disguised feature request to get archived caches.

 

Jim

Link to comment
Seems to me that would take more resources. You would first have to run the last PQ, then you would have to run todays PQ, then compare the two and send the ones that have updated.

 

Naw, you would just need to store the previous PQs for a predetermined time period.

 

How do you figure it doesn't load the servers? Somehow it has to go check each and every cache to see if there was a status change

 

Depends on how the database stores the information. There is a huge difference in load between two queries to check a status and a dozen queries to grab every data field, five comments, waypoints, etc. It adds up to a lot of savings when extrapolated to thousands of PQs per day and tens of thousands of caches that wouldn't need to have the full amount of data pulled each time.

 

Data backup systems use almost exactly the same kind of scheme to facilitate incremental backups. Proxy servers do something similar to increase website loading speed, and do it quite successfully.
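As a sketch of that incremental idea, here is roughly what the cheap check could look like, assuming a hypothetical server-side table of last-modified timestamps (the names and dates are invented, not Groundspeak's schema):

```python
from datetime import datetime

# Hypothetical cache table: GC code -> last-modified timestamp.
# These names and dates are illustrative only.
server_caches = {
    "GC1001": datetime(2009, 3, 1),
    "GC1002": datetime(2009, 3, 8),   # modified after the last PQ run
    "GC1003": datetime(2009, 2, 20),
}

def changed_since(caches, last_run):
    """The cheap incremental check: one timestamp comparison per record,
    instead of pulling every field, log, and waypoint for every cache."""
    return sorted(code for code, mtime in caches.items() if mtime > last_run)

last_run = datetime(2009, 3, 5)
print(changed_since(server_caches, last_run))  # ['GC1002'] needs a full pull
```

Only the caches that pass the timestamp check would need the expensive full-record queries, which is exactly how incremental backups decide which files to copy.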

 

Sounds like a disguised feature request to get archived caches.

 

Jim

 

Whatever you say chief. Ad hominem, etc.

 

So the question becomes - what is the more expensive commodity for Groundspeak... clock cycles or bits on the hard drive?

Link to comment
Sounds like a disguised feature request to get archived caches.

 

Jim

 

Whatever you say chief. Ad hominem, etc.

 

Not really, given the number of threads so far this year arguing for archive data in a PQ.

 

But expanding on your "Caches changed since my last PQ": many (most?) will have had logs added. So you're going to be pulling those also? Probably not a large number, but caches with changed coordinates, new hints, and changed descriptions also? The cycle savings are starting to dwindle.

 

Jim

Link to comment

How do you figure it doesn't load the servers? Somehow it has to go check each and every cache to see if there was a status change, in addition to generating the list of new ones, then store this list somewhere to reference it the next time you want to go bang on the servers. Sounds like a disguised feature request to get archived caches.

 

Record modification times are natively handled by real databases and are extremely fast to check. A PQ already knows the last time it was run (you know it knows because it shows you on the /pocket page...), so it would be a mere matter of programming to exclude caches that had not been modified since that time and that had no logs since that time. You'd have to handle a few funky cases, like what happens if the PQ itself got modified (hint: you have to return all selected records in the next run), and decide what to do about caches that had been archived (which has a straightforward answer, but the house is firmly against acknowledging the existence of archived caches in a programmatic way), but the actual request is pretty straightforward to code up.
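As a sketch of that exclusion, here is what the extra filter could look like in SQLite; the table and column names are invented stand-ins, since Groundspeak's real schema is unknown:

```python
import sqlite3

# Hypothetical tables: `caches` (gc_code, modified) and `logs`
# (gc_code, logged). ISO date strings compare correctly as text.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE caches (gc_code TEXT PRIMARY KEY, modified TEXT);
    CREATE TABLE logs   (gc_code TEXT, logged TEXT);
    INSERT INTO caches VALUES
        ('GC1', '2009-03-01'),   -- listing edited after the last run
        ('GC2', '2009-01-15'),   -- listing untouched, but logged recently
        ('GC3', '2009-01-01');   -- untouched and unlogged: excluded
    INSERT INTO logs VALUES ('GC2', '2009-03-02');
""")

last_run = "2009-02-20"
# Return only caches touched since the PQ last ran: either the listing
# itself was modified, or a new log was posted against it.
rows = db.execute("""
    SELECT gc_code FROM caches WHERE modified > ?
    UNION
    SELECT gc_code FROM logs   WHERE logged   > ?
    ORDER BY gc_code
""", (last_run, last_run)).fetchall()
print([r[0] for r in rows])  # ['GC1', 'GC2'] -- GC3 is filtered out
```

With an index on the `modified` and `logged` columns, this filter touches only timestamps, which is the point being made about modification times being cheap to check.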

 

It's not quite trivial, but it'd hardly be herculean to code, either.

 

You have then, however, put the merge responsibility on the receiver of the PQs. The simple act of copying a GPX to your receiver no longer works; you have to track ordering and essentially implement transaction replay. By breaking the atomicity of the PQs ("this is the group of caches that matched your criteria at time T") and introducing a set of rolling deltas, you've introduced a knob that people will hurt themselves with.
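To illustrate the replay burden, here is a minimal sketch of what the receiver would have to implement; the record structures are made up for illustration:

```python
# The receiver of rolling deltas must apply them strictly in order,
# oldest first, or newer data gets clobbered by stale data.

def replay(database, deltas):
    """Apply a set of rolling deltas to an offline cache database."""
    for delta in sorted(deltas, key=lambda d: d["run_time"]):
        for code, record in delta["changed"].items():
            database[code] = record          # insert or overwrite listing
        for code in delta["archived"]:
            database.pop(code, None)         # drop archived caches
    return database

db = {"GC1": "old listing", "GC2": "old listing"}
deltas = [   # deliberately out of order, as downloads might arrive
    {"run_time": 2, "changed": {"GC1": "newest listing"}, "archived": ["GC2"]},
    {"run_time": 1, "changed": {"GC1": "newer listing"}, "archived": []},
]
replay(db, deltas)
print(db)  # {'GC1': 'newest listing'} -- order mattered; GC2 was archived
```

Drop the `sorted()` call and apply the deltas in arrival order, and GC1 ends up with the stale "newer listing" text: that is the knob people would hurt themselves with.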

 

I'm seeing a lot of edge cases that would be introduced, and for comparatively little known value. Even those of us who are programmers and can speculate about how the DBs and the GPX serialization and delivery processes work don't *know* where the bottlenecks are, so guessing what makes life measurably easier for them is exactly that - guessing.

Link to comment

Not really, given the number of threads so far this year arguing for archive data in a PQ.

 

I didn't ask for any information other than the fact that the item was archived, and only then to facilitate removing it from the user's archive.

 

But expanding on your "Caches changed since my last PQ": many (most?) will have had logs added. So you're going to be pulling those also? Probably not a large number, but caches with changed coordinates, new hints, and changed descriptions also? The cycle savings are starting to dwindle.

 

Jim

 

Major status changes, not finds/DNFs/notes.

Link to comment

Record modification times are natively handled by real databases and are extremely fast to check. A PQ already knows the last time it was run (you know it knows because it shows you on the /pocket page...), so it would be a mere matter of programming to exclude caches that had not been modified since that time and that had no logs since that time. You'd have to handle a few funky cases, like what happens if the PQ itself got modified (hint: you have to return all selected records in the next run), and decide what to do about caches that had been archived (which has a straightforward answer, but the house is firmly against acknowledging the existence of archived caches in a programmatic way), but the actual request is pretty straightforward to code up.

 

It's not quite trivial, but it'd hardly be herculean to code, either.

 

Thank you for relating your experience in this field. As far as archived caches... well, the user already has the full data on the last cache from the previous PQ.

 

You have then, however, put the merge responsibility on the receiver of the PQs.

That is a valid point. I know how I would personally plan on using them, but there isn't anything available in the third-party software to handle it. Yet. "If you build it, they will come" perhaps.

 

...don't *know* where the bottlenecks are, so guessing what makes life measurably easy for them is exactly that - guessing.

Yeah, guessing. For all I know they are already using a caching server on the back-end. Maybe a back-end caching server would be a better solution? Just tossing an idea out there to see if it sticks.

Link to comment

At one time the stated bottleneck was the gathering of the last 5 logs. While it is most likely coincidence, I did suggest they grab the last 5 logs and stuff them into a custom field so that particular query wouldn't have to be run again. Immediately after the suggestion the PQs started running smoothly again. Coincidence? Don't know, don't care.
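The "stuff the last 5 logs into a custom field" idea is ordinary denormalization: maintain a bounded, precomputed list per cache at log-insert time, so the expensive "latest five logs" query never runs at PQ time. A minimal sketch, with illustrative names only:

```python
from collections import deque

# Denormalized store: gc_code -> the 5 most recent log texts.
# A real system would persist this as a column; a dict stands in here.
recent_logs = {}

def add_log(gc_code, text):
    """Update the precomputed last-5 list when a log is posted."""
    logs = recent_logs.setdefault(gc_code, deque(maxlen=5))
    logs.append(text)  # the oldest entry falls off automatically

for i in range(7):
    add_log("GC1", f"log {i}")
print(list(recent_logs["GC1"]))  # only the last five logs are kept
```

The trade-off is the usual one for denormalization: a little extra work on every log write buys a much cheaper read for every PQ. The follow-up objection holds, though: a differential PQ wanting "all logs since your last run" can't be served from a fixed last-5 field.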

 

The point is if such an optimization did occur, the suggestion provided by the OP would toss that out the window as every cache in every PQ would have a different number of logs. That would slow the PQs to a crawl.

 

I've made suggestions in the past that would reduce bandwidth, server load, and provide the product the consumers are wanting.

 

Plus, considering the majority of traffic is cyclical on a weekly period, I think the "last 7 days" function is fine.

 

If PQs that have the "last 7 days" option set sent only and all data that was changed in the last 7 days, then the size of the PQs would drop dramatically. As I've mentioned time and again, based on my observations, only about a quarter to a third of all caches change in any one week, and the vast majority of those changes are one or two logs. Plus, few caches change their descriptions, ratings, etc. There is no need to send those again in a differential PQ. Add a way to mark archived caches as such, and you'll have a fantastic system that keeps folks up to date in an automated fashion with a lot less load on the gear.

Link to comment

The OP might want to do a forum search on archived caches and see what Groundspeak's attitude has been in the past. Groundspeak provides PQs to allow geocachers a way to get a generous but limited number of geocaches that can then be loaded directly into a GPS or into a 3rd party tool for further filtering or manipulation for planning their geocaching activities. Some of these 3rd party tools allow you to maintain an offline database of caches. This offline database will never be as accurate as getting the latest caches from Geocaching.com. New caches are published all the time, old caches are temporarily disabled and re-enabled, and caches are archived. The enhancement that the OP has asked for may allow him to track changes to the Geocaching.com database since the last time he ran his PQ and make it easier to maintain his offline database. It will still be only as up to date as the last time he ran his query. Would the hoped-for reduced server load mean that he could run this query more often? Or would the ability to remember what changed since the last run mean he runs the PQ less often and therefore has more of a chance to be geocaching with stale data?

 

It may be that this could be programmed in such a way as to reduce the resource requirements for running a PQ. Let's assume that we know when the PQ was last run and that the PQ was not modified since it was last run. The new query feature would run the same query with an additional WHERE clause ( AND cache was updated since last run). We already have an option (AND cache was updated in the last 7 days) so assuming that you were to run the PQ every seven days the new request doesn't buy you anything - other than the OPs request that it include archived caches. If users wanted to reduce the load on the servers they could use the existing capability.

 

My guess, despite what Coyote Red says, is that people use the changed-in-the-last-seven-days option to try to increase the number of caches they keep in their offline database. If they know that only about 10% of the caches will change in a seven-day period, they can use one PQ to maintain a database of close to 5000 caches (which would take 10 PQs otherwise). Since Groundspeak doesn't include archived caches, geocachers have to find other ways to remove these from their offline database, so many forgo the "savings" and get all the caches each time. While keeping an offline database for personal use is not against the TOU, despite what some people may say, Groundspeak has indicated its preference that people refresh their data using new PQs rather than relying on unreliable methods to keep their offline data up to date.

 

I used to wonder why people feel the need to reduce the load on the PQ server. Computers don't really care if they are asked to do something in a way that some people find inefficient. The only reason for wanting to improve efficiency is if there is more work than the computer can keep up with. We have seen, from time to time, the PQ server experience loads where it was unable to process all the PQs scheduled for a certain day, or where there were long delays waiting for PQs to run. So it is understandable that users are looking for ways to help Groundspeak manage the load on the PQ server. Currently the PQ servers are not having these problems. So long as Groundspeak is making sure that the PQ servers have the capacity to handle the current PQs, it seems silly for them to implement a change just to reduce the load on the servers. Should they be looking for ways to reduce the load to accommodate future growth, then perhaps this suggestion could be implemented. However, given that many people are not maintaining offline databases but instead just run PQs on an as-needed basis, and many of those that do keep an offline database are not likely to take advantage of this suggestion, it may not really help much. Groundspeak seems to feel that encouraging the use of ad hoc PQs over offline databases would have much more of an impact on reducing server loads. The last thing they are likely to try is any change that would encourage more use of offline databases.

Link to comment
This offline database will never be as accurate as getting the latest caches from Geocaching.com.
Wrong. The caches that haven't changed are as fresh as the last time they did change. The data could be months old and still be fresh. All data that Groundspeak sends is only as fresh as the last change in their database. A differential PQ would provide data just as fresh as a complete set. Why? Because we already have the freshest data on the unchanged caches.

 

My guess, despite what Coyote Red says, is that people use the changed-in-the-last-seven-days option to try to increase the number of caches they keep in their offline database. If they know that only about 10% of the caches will change in a seven-day period, they can use one PQ to maintain a database of close to 5000 caches (which would take 10 PQs otherwise).
No doubt. But that's a very simplistic view.

 

While in theory you can maintain a larger database than the maximum allowed complete set, it would be very difficult to stay on top of. When any one PQ hits its limit, you won't know which caches were not included. You'd get orphaned listings until the next run--and that equals stale data. I don't keep an efficiently running data set going only to end up with stale data; that runs counter to the reason I maintain the offline database to begin with.

 

I'd run differential PQs to reduce the workload for both me and Groundspeak. If I'm loading only a quarter to a third of the caches, my own updates to my private database would be reduced by the same amount. Provide the other reductions mentioned and my end would benefit even more.

 

No, differential PQs aren't so people can cheat on their allotment.

 

Currently the PQ servers are not having these problems.
Yes, "currently." That's the key. The PQ servers have problems all the time. It goes in cycles. The PQs run fine for a while and as more folks come on, it gets slower. Then they fix something and it's fine for a while. You've been around long enough to have gone through several of these cycles.

 

Also, that Groundspeak has continued not to implement any fundamental change in the PQ system simply makes me shrug and get the data the best way I can. That means being grossly inefficient and pulling full data sets. Hopefully, the new PQ system will allow such things. Here's me with my fingers crossed.

Link to comment

How do you figure it doesn't load the servers? Somehow it has to go check each and every cache to see if there was a status change, in addition to generating the list of new ones, then store this list somewhere to reference it the next time you want to go bang on the servers. Sounds like a disguised feature request to get archived caches.

 

Record modification times are natively handled by real databases and are extremely fast to check. A PQ already knows the last time it was run (you know it knows because it shows you on the /pocket page...), so it would be a mere matter of programming to exclude caches that had not been modified since that time and that had no logs since that time. You'd have to handle a few funky cases, like what happens if the PQ itself got modified (hint: you have to return all selected records in the next run), and decide what to do about caches that had been archived (which has a straightforward answer, but the house is firmly against acknowledging the existence of archived caches in a programmatic way), but the actual request is pretty straightforward to code up.

 

It's not quite trivial, but it'd hardly be herculean to code, either.

 

You have then, however, put the merge responsibility on the receiver of the PQs. The simple act of copying a GPX to your receiver no longer works; you have to track ordering and essentially implement transaction replay. By breaking the atomicity of the PQs ("this is the group of caches that matched your criteria at time T") and introducing a set of rolling deltas, you've introduced a knob that people will hurt themselves with.

 

I'm seeing a lot of edge cases that would be introduced, and for comparatively little known value. Even those of us who are programmers and can speculate about how the DBs and the GPX serialization and delivery processes work don't *know* where the bottlenecks are, so guessing what makes life measurably easier for them is exactly that - guessing.

 

The problem is that there is no list of what caches were on a PQ at the last time it was run (except for the owner's copy). Since a PQ has a size limit, every time a new cache is published inside the PQ's accepted radius, a cache on the periphery gets pushed out. If that cache was changed, that fact would never get picked up.
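This push-out effect is easy to demonstrate with a toy nearest-N query; the cache names and distances are invented for illustration:

```python
# With a fixed-size, nearest-N query, a new cache published close by
# pushes a peripheral cache out of the result set, so a later change
# to that cache would never appear in a delta against the next run.

def nearest_n(caches, n=3):
    """A PQ-style query: the set of the N nearest caches by distance."""
    return {code for code, dist in sorted(caches.items(), key=lambda c: c[1])[:n]}

caches = {"GC_A": 1.0, "GC_B": 2.0, "GC_C": 3.0}
run1 = nearest_n(caches)                 # {'GC_A', 'GC_B', 'GC_C'}

caches["GC_NEW"] = 0.5                   # a new cache is published nearby
run2 = nearest_n(caches)                 # GC_C no longer makes the cut

pushed_out = run1 - run2
print(pushed_out)  # {'GC_C'} -- any change to GC_C now goes unreported
```

Unless the server keeps a record of each run's membership (the stored-PQ idea discussed below the quote), the delta has no way to tell the receiver that GC_C silently left the result set.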

Link to comment

The problem is that there is no list of what caches were on a PQ at the last time it was run (except for the owner's copy). Since a PQ has a size limit, every time a new cache is published inside the PQ's accepted radius, a cache on the periphery gets pushed out. If that cache was changed, that fact would never get picked up.

 

I've heard talk that they will be changing how PQs are sent out. It might just be wishful thinking - I have no idea. The concept I've come up with would require that PQs are stored on the server to be compared against. That's where the disk space vs clock cycles tradeoff comes in. In my experience the trade-off is worth it, but without being inside of Groundspeak's IT department it's hard to tell for sure.

Link to comment
Also, that Groundspeak has continued not to implement any fundamental change in the PQ system simply makes me shrug and get the data the best way I can. That means being grossly inefficient and pulling full data sets. Hopefully, the new PQ system will allow such things. Here's me with my fingers crossed.

 

I have mine crossed with you. I agree 100% with your post.

Link to comment
I've heard talk that they will be changing how PQs are sent out. It might just be wishful thinking - I have no idea. The concept I've come up with would require that PQs are stored on the server to be compared against. That's where the disk space vs clock cycles tradeoff comes in. In my experience the trade-off is worth it, but without being inside of Groundspeak's IT department it's hard to tell for sure.

 

There is no trade-off. Disk space is dirt cheap these days. Clock ticks are "expensive." Simple answer.

Link to comment