
Feature request: Cache search results by cache density in a zip code


MikeStammer


I hope the title makes sense. What I mean by this is that it would be very nice to be able to search and have the results shown in configurable groups that indicate cache density in a zip code. For instance, say I choose a cache grouping of 5 miles and a minimum cluster size of 10. When I search a zip code, it would only show me areas that have at least 10 caches within 5 miles of each other. The search results could be displayed as lat/lon coordinates of the center of each cluster, colored circles drawn on a map, or some other representation (an intersection of streets or something). Does this make sense? Anyone else see value in this?

 

Mike Stammer


Actually, it's not that hard to implement on the client side. Take something like GPX2HTML, which already knows how to compute "nearby caches," and it's pretty trivial because it knows where all the caches are. From there, it's a simple matter of building a web between them and counting the nodes. When I added this to my software (no, you can't have it) it was something like a dozen lines of code.
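To give the flavor (this is just a throwaway Python sketch, not my code; the (lat, lon) tuple format and the 5-mile default are assumptions taken from the example above):

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in statute miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3959 * math.asin(math.sqrt(a))  # mean Earth radius ~3959 mi

def build_web(caches, limit=5.0):
    """The 'web': an edge for every pair of caches within `limit` miles.

    `caches` is a list of (lat, lon) tuples."""
    edges = []
    for i in range(len(caches)):
        for j in range(i + 1, len(caches)):
            if miles_between(*caches[i], *caches[j]) <= limit:
                edges.append((i, j))
    return edges
```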

 

Not everything has to be done on the web site... The majority of the feature requests I see really should be done on the client.


If everyone has to write the code to do it themselves (I have no problem doing this, but...), then you will end up with 500 different utilities. It should either be incorporated on the web site, or a project should be started to build a set of client tools that's sanctioned by geocaching.com.

 

Mike Stammer


There are projects to create client tools. Not sanctioned by geocaching.com, but I think that's a mistake anyway. One of them is actually spearheaded by the guy you're replying to. I suspect that Robert has his reasons for not sharing his code, and I further suspect that it's not just because he wants to be mean.

 



Uh. So let's say G is the "Grouping" and S is the cluster size. Using the G=5, S=10 example from above, wouldn't you have to search every point within a radius (or diameter) S of every cache? Eliminate or merge overlaps and then present the data...

 

Unless you limited it to some arbitrary point based on zip code. I can see it isn't difficult, just data-intensive.


quote:
Originally posted by Richard-n-Dawn:

Uh. So let's say G is the "Grouping" and S is the cluster size. Using the G=5, S=10 example from above, wouldn't you have to search every point within a radius (or diameter) S of every cache? Eliminate or merge overlaps and then present the data...


I think you meant radius G, or something, but anyway....

 

The problem as written isn't really well-defined. For a cluster of caches to be "all within 5 miles of one another," is it necessary for every cache in the set to be within 5 miles of every other cache in the set? Or is it sufficient for every cache in the set to be less than 5 miles from at least one other cache, with no way to split the set into two subsets such that every cache in the first subset is more than 5 miles from every cache in the second?

 

Put simply, does a 20-mile-long string of caches spaced out one every mile count as a cluster?

 

Anyway, assuming that the 20-mile-long string counts, here's what you do:

 

First, for every pair of caches, compute the distance between the two caches and store it as the length of an edge in a graph connecting those two caches. Throw the edge away if it's more than 5 miles.

 

Now do this until there are no more caches in the list:

1. Pick a cache.
2. Find all caches that are connected to that cache, directly or through a chain of edges.
3. If the set of caches has 10 or more members, output the entire set.
4. Remove every member of the set from the list of caches.

Now you're done. You have a list of every cluster of 10 or more caches within 5 miles of each other.
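To make those four steps concrete, here's a rough Python sketch (it borrows the hypothetical build_web helper from the sketch above; the names are made up, and the 5-mile/10-cache figures come from the original example):

```python
from collections import defaultdict

def loose_clusters(caches, limit=5.0, min_size=10):
    """Connected components of the web: the 'looser' kind of cluster."""
    # Build an adjacency list from the pairwise edges (the n^2 part).
    adj = defaultdict(set)
    for i, j in build_web(caches, limit):
        adj[i].add(j)
        adj[j].add(i)

    remaining = set(range(len(caches)))
    clusters = []
    while remaining:                        # until no more caches in the list
        start = remaining.pop()             # 1. Pick a cache.
        component, frontier = {start}, [start]
        while frontier:                     # 2. Find all caches connected to it.
            for nbr in adj[frontier.pop()]:
                if nbr not in component:
                    component.add(nbr)
                    frontier.append(nbr)
        if len(component) >= min_size:      # 3. Output sets of 10 or more.
            clusters.append(component)
        remaining -= component              # 4. Remove the set from the list.
    return clusters
```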

 

If you want the tighter kind of cluster, you have to look for a different kind of connectedness. But I don't think you really want that kind of cluster.

 

Unfortunately, unless you have some pretty good support libraries, this kind of graph manipulation isn't just a dozen lines of code. I'm going to assume Robert had the support libraries. :)

 

Also unfortunately, this isn't a "nice" algorithm. Its runtime and storage requirements are proportional to the square of the number of caches. Obviously, there's a way of restating this that doesn't involve graph theory; I just stated it that way because Robert stated it that way.

 



If you want the looser kind of clusters, here's another solution, requiring lots less storage (proportional to n rather than n²) but still requiring time proportional to n²:

 

Pick a cache. Throw it into a list of caches in the current cluster.

 

Go over the entire list of caches. For every candidate cache in the list, compute its distance from each cache in the current cluster until you either find one that's less than 5 miles away or you run out of caches in the current cluster. If you found one less than 5 miles away, put the candidate cache in the current cluster and remove it from the list of caches. Otherwise, leave it in the list.

 

Now you have a cluster of caches. Output it if it has 10 or more caches. Throw away all of the caches in the cluster, as they can't be in any other cluster.

 

Repeat until you run out of caches.
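Here's that procedure as a rough Python sketch (assuming the hypothetical miles_between helper from the earlier sketch; the names are made up). One wrinkle the description above glosses over: a single pass over the list can miss a chain whose links happen to come up in an unlucky order, so this version keeps re-sweeping until the cluster stops growing.

```python
def loose_clusters_low_storage(caches, limit=5.0, min_size=10):
    """Grow one cluster at a time; no n^2 edge list, just the caches."""
    pool = list(range(len(caches)))     # caches not yet in any cluster
    clusters = []
    while pool:                         # repeat until you run out of caches
        cluster = [pool.pop()]          # pick a cache, seed the current cluster
        grew = True
        while grew:                     # re-sweep until nothing else joins
            grew = False
            for idx in pool[:]:         # iterate a copy; we mutate `pool`
                if any(miles_between(*caches[idx], *caches[m]) <= limit
                       for m in cluster):
                    cluster.append(idx)
                    pool.remove(idx)
                    grew = True
        if len(cluster) >= min_size:
            clusters.append(cluster)    # output clusters of 10 or more
    return clusters
```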

 

One other advantage of this algorithm over the previous one is that in some cases you can skip some of the distance calculations. Unfortunately, it's still n² in the worst case.

 


 

[This message was edited by Warm Fuzzies - Fuzzy on July 15, 2003 at 12:31 PM.]
