
Pre-generated GPX Files


xafwodahs


I periodically take vacations to other states for the sole purpose of geocaching (actually sightseeing, but the geocaches provide the specific destinations...)

 

Anyway, every trip is the same: it's a big state, and I don't have any set itinerary, so I don't know where I'll be going. So I want my GPSr to have all the caches for the entire area (or at least as many as will fit into my GPSr).

 

I end up setting up multiple pocket queries that partially overlap and together cover a wider area. If a pocket query returns 500 caches, I shrink the max distance for that query, re-adjust the other queries so there are no coverage gaps, and run all the queries again. Sometimes I alter the parameters of a query to try to reduce the number of caches. I repeat this process until I'm satisfied with the results. I then have a Perl script I've written to combine all the GPX files, remove the duplicates (due to the overlapping), and generate a final GPX file. From there I can use other tools to filter out some caches until I have <1000, which is how many my GPSr can hold.
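Just to illustrate that last step, the merge/de-dup could look something like this (sketched in Python rather than Perl; the file names are only placeholders):

```python
# Merge several overlapping PQ files and drop duplicate waypoints,
# keyed on the GC code in each <name> element.
import xml.etree.ElementTree as ET
import glob

GPX_NS = "http://www.topografix.com/GPX/1/0"   # the PQ files use GPX 1.0
ET.register_namespace("", GPX_NS)

seen = {}            # GC code -> <wpt> element already kept
merged_root = None

for path in sorted(glob.glob("pq_*.gpx")):     # placeholder file pattern
    root = ET.parse(path).getroot()
    if merged_root is None:
        merged_root = root                     # reuse the first <gpx> as the container
        for wpt in merged_root.findall(f"{{{GPX_NS}}}wpt"):
            seen[wpt.findtext(f"{{{GPX_NS}}}name")] = wpt
        continue
    for wpt in root.findall(f"{{{GPX_NS}}}wpt"):
        name = wpt.findtext(f"{{{GPX_NS}}}name")   # the GC code, e.g. GCxxxx
        if name not in seen:                       # skip duplicates from the overlap
            seen[name] = wpt
            merged_root.append(wpt)

ET.ElementTree(merged_root).write("combined.gpx", encoding="utf-8", xml_declaration=True)
print(len(seen), "unique caches written to combined.gpx")
```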

 

Running and rerunning the pocket queries is tedious and can take days. It occurred to me on this last trip that having pre-generated GPX files available for download from geocaching.com might be an answer.

 

(Note: I haven't yet tried to guesstimate any numbers on this to see if it's actually practical; I'm hoping the site maintainers will comment on this...)

 

It could work this way:

 

1. Divide cacheable areas into simple grids

-------------------------------------------------

Define a simple grid structure. For example, each grid block could be 30 minutes latitude by 30 minutes longitude.
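Since 30 minutes is 0.5 degrees, each cache would map to exactly one block. A minimal illustration (the block-naming convention here is invented just for the example):

```python
# Map a coordinate to its 0.5-degree grid block.
import math

CELL_DEG = 0.5   # 30 minutes of latitude/longitude per grid block

def grid_cell(lat: float, lon: float) -> tuple[int, int]:
    """Return the (row, col) of the 0.5-degree block containing (lat, lon)."""
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)

def cell_filename(row: int, col: int) -> str:
    """Invented naming scheme for the stored per-block GPX file."""
    return f"cell_{row}_{col}.gpx"

# Example: a cache near N 47 36.000, W 122 19.800 lands in block (95, -245)
print(cell_filename(*grid_cell(47.6, -122.33)))   # -> cell_95_-245.gpx
```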

 

2. geocaching.com runs automatic queries

------------------------------------------------

Periodically (weekly, maybe), the geocaching.com query machine runs queries for each grid block and stores the resultant GPX file.
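The batch job itself would be simple. The sketch below assumes some internal bounding-box query routine exists; run_bbox_query is purely hypothetical and just returns an empty GPX skeleton here so the sketch runs:

```python
# Re-generate the stored GPX file for one 0.5-degree grid block.
CELL_DEG = 0.5

def run_bbox_query(min_lat, min_lon, max_lat, max_lon) -> str:
    """Hypothetical stand-in for the site's internal bounding-box query."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<gpx xmlns="http://www.topografix.com/GPX/1/0" version="1.0" creator="sketch">\n'
            '</gpx>\n')

def regenerate_block(row: int, col: int) -> None:
    """Re-run the query for one block and overwrite its stored file."""
    min_lat, min_lon = row * CELL_DEG, col * CELL_DEG
    gpx_text = run_bbox_query(min_lat, min_lon, min_lat + CELL_DEG, min_lon + CELL_DEG)
    with open(f"cell_{row}_{col}.gpx", "w", encoding="utf-8") as f:
        f.write(gpx_text)

# A weekly job would simply call regenerate_block() for every block
# that contains at least one cache.
regenerate_block(95, -245)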

 

3. User downloads GPX files

--------------------------------

Through some browser interface, the user can specify the bounds of the area they're interested in and get a list of these pre-generated GPX files, which they can then download.
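Mapping the user's bounds to the stored files is then just arithmetic (same 0.5-degree blocks and made-up file names as in the sketches above):

```python
# List every pre-generated grid-block file overlapping a bounding box.
import math

CELL_DEG = 0.5

def files_for_bounds(min_lat, min_lon, max_lat, max_lon):
    """Return the names of every grid-block file overlapping the given bounds."""
    names = []
    for row in range(math.floor(min_lat / CELL_DEG), math.floor(max_lat / CELL_DEG) + 1):
        for col in range(math.floor(min_lon / CELL_DEG), math.floor(max_lon / CELL_DEG) + 1):
            names.append(f"cell_{row}_{col}.gpx")
    return names

# Example: a roughly 1-degree-square area around Seattle needs 9 files.
print(files_for_bounds(47.1, -122.9, 48.1, -121.9))
```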

 

4. User uses PC software to do the rest

---------------------------------------------

The user can then use PC software to combine the results, experiment with filtering out different types of caches to reduce the cache count, etc.

 

The ultimate goal here is to get all the caches in an area of interest onto the user's PC. At that point, the user can try different filters (e.g. a terrain rating of 3 or less) to get the final list of caches they want to take on their trip.
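As one example of that kind of filter, dropping everything above terrain 3 from the combined file could look like this (it assumes the usual Groundspeak terrain extension element inside each <wpt>; namespaces are matched with wildcards so the exact schema version doesn't matter):

```python
# Remove waypoints whose terrain rating is above 3 from a merged GPX file.
import xml.etree.ElementTree as ET

tree = ET.parse("combined.gpx")                  # the merged file from earlier
root = tree.getroot()

kept = dropped = 0
for wpt in list(root.findall("{*}wpt")):         # wildcard: any GPX namespace
    terrain = wpt.find(".//{*}terrain")          # Groundspeak extension rating
    if terrain is not None and float(terrain.text) > 3.0:
        root.remove(wpt)
        dropped += 1
    else:
        kept += 1

tree.write("filtered.gpx", encoding="utf-8", xml_declaration=True)
print("kept", kept, "/ dropped", dropped)
```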

 

Comments are welcome.

Link to comment

Instead of coming up with different PQs to cover the area you want to visit without overlaps, and in essence doing a lot of your filtering with the web site, why not just download several large PQs and completely filter them at your computer with GSAK?

 

For instance, set up 5 PQs that each cover the largest area possible but each cover only one year of cache placement, from 2001 to 2005. That way you can get 2,500 caches to load into GSAK.

 

After that, you can filter them in lots and lots of ways, including filtering them along a route.

 

That's easier than downloading blocks of pre-generated queries covering the world (or the country), where 99% of them probably won't get downloaded anyway.

Link to comment

And what happens when one of those queries, say the one for 2004, comes back with 500 caches? That most likely means it maxed out at 500 and there are more caches from that year.

 

If you want them all, then you have to split that query for 2004 into 2 queries, maybe 1 for the first half of 2004 and the other for the second half.

 

Whether you define your queries based on time or location is irrelevant - you still might get 500 caches in a result, which means you're missing some, and you'll have to reduce the scope of those queries until the results are <500 caches.

 

Some would argue that since you're going to be filtering out some caches anyway, who cares about those missing caches? But maxing out at 500 caches doesn't give the user the choice of *which* caches to filter out.

Link to comment

I started running a few numbers for storage size.

 

I wrote a quick script to calculate the average size of each <wpt> </wpt> element in a gpx file.

 

After running it on a couple of GPX files (both with 500 caches), I get an average of about 5300 bytes per <wpt>.
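For anyone who wants to reproduce the measurement, something along these lines gives the same kind of number (not the script I used, just one way to do it):

```python
# Average the byte length of each <wpt>...</wpt> span in one or more GPX files.
import re
import sys

def average_wpt_size(path: str) -> float:
    data = open(path, "rb").read()
    # Non-greedy match of each waypoint element, including its tags.
    spans = re.findall(rb"<wpt\b.*?</wpt>", data, flags=re.DOTALL)
    return sum(len(s) for s in spans) / len(spans)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, round(average_wpt_size(path)), "bytes per <wpt>")
```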

 

The main geocaching page currently reports 212101 caches.

 

212101 * 5300 bytes = ~1.05 GB

 

That doesn't seem like it would be a problem.

 

The bigger problem seems like it would be defining a grid structure that would be a good balance between number of gpx files (and hence number of queries) and the average/max number of caches per gpx file.

 

However, if this were implemented, I don't see a reason why any particular gpx file would have to be limited to 500 caches...

Link to comment

I would think the bigger problem would be limiting the number of pre-generated files that any one member can download. I believe one of the reasons there are waypoint limits and PQ limits is to prevent people from getting the entire cache database.

 

Another issue would be the grid system. If you just divided the whole area into equal grids, some blocks would have a lot of caches and others would have only a handful.

 

However, I am keen on the idea of some pre-generated queries. Perhaps with a little number crunching (ok a lot!) and some smoke and mirrors, TPTB could come up with some "most requested" areas or something like that. It might take some of the strain off the PQ server. But there is no telling if it would be worth it.

Link to comment

One other point I forgot to make:

 

Some would argue that all these queries would overload an already overloaded query machine.

 

However, I would argue that having these pre-generated gpx files would reduce the use of custom queries and that the total number of queries may go down. I don't know how many queries are currently done, so I can't say for sure.

 

Also, these queries could be run at any time of day - whenever the query machine is least loaded - and the user wouldn't be waiting for the results.

 

The ultimate effect would be to transfer some of the custom query processing (which can require a lot of computing power) over to file downloading (which requires relatively little computing power).

Link to comment