How Do You Determine Centroid Cache?


MikeofKorea

As I stated in that thread, I think the best definition of your caching centroid is the point on the surface of the Earth corresponding to the 3-dimensional average of all your cache positions.

 

That is, calculate each cache position as X, Y, and Z in 3 dimensions. Average them all, and that point will be somewhere inside the Earth. Find the point on the surface directly above it, and that is your centroid.
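The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats the Earth as a unit sphere (not an ellipsoid), takes positions as decimal degrees, and the function name `extrinsic_centroid` is made up for this example.

```python
import math

def extrinsic_centroid(points):
    """Average cache positions in 3-D, then project back to the surface.

    points: iterable of (lat_deg, lon_deg) pairs in decimal degrees.
    Returns (lat_deg, lon_deg), or None when the 3-D average sits at the
    center of the sphere and no surface point lies "directly above" it.
    Assumption: spherical Earth of radius 1.
    """
    sx = sy = sz = 0.0
    n = 0
    for lat_deg, lon_deg in points:
        lat = math.radians(lat_deg)
        lon = math.radians(lon_deg)
        sx += math.cos(lat) * math.cos(lon)
        sy += math.cos(lat) * math.sin(lon)
        sz += math.sin(lat)
        n += 1
    if n == 0:
        return None
    x, y, z = sx / n, sy / n, sz / n
    # The averaged point lies inside the sphere; its direction from the
    # center is all we need to find the surface point above it.
    if math.sqrt(x * x + y * y + z * z) < 1e-12:
        return None
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

# Two caches on the equator, 90 degrees of longitude apart.
print(extrinsic_centroid([(0.0, 0.0), (0.0, 90.0)]))
```

For the two equatorial caches in the example, the result lands on the equator halfway between them, near longitude 45.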

Link to comment

Part of my doctoral dissertation in statistics dealt with methods of computing a sample mean on a manifold (i.e., on a surface such as a sphere or a higher-dimensional analog), and this is one of the methods. We'd call it an extrinsic mean, since you leave the surface and then project back to it. An "intrinsic" method of finding the mean would be to find the point y which minimizes sum(d(x_i,y)^2) over the points x_i where you've found caches, where d(x_i,y) is the geodesic distance between x_i and y, i.e., the great-circle distance. If a person has found two caches at points on opposite sides of the earth from each other, both methods fail (assuming the earth is a sphere, which it is not).
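For anyone who wants to experiment with the intrinsic definition, here is a rough Python sketch. It is not the method from the dissertation; it is a plain fixed-step gradient iteration on a unit sphere (average the tangent vectors pointing at each cache, step along the result, repeat), started from the 3-D average. All function names are invented for the example, and it assumes a spherical Earth and caches that are not spread over most of the globe.

```python
import math

def to_xyz(lat_deg, lon_deg):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def to_latlon(p):
    x, y, z = p
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def log_map(y, x):
    """Tangent vector at y pointing toward x; its length is the great-circle angle."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, y))))
    theta = math.acos(dot)
    perp = [a - dot * b for a, b in zip(x, y)]
    norm = math.sqrt(sum(c * c for c in perp))
    if norm < 1e-15:
        return [0.0, 0.0, 0.0]  # same point (or exact antipode: direction undefined)
    return [theta * c / norm for c in perp]

def exp_map(y, v):
    """Walk from y along the great circle in direction v for an angle |v|."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-15:
        return tuple(y)
    return tuple(math.cos(norm) * a + math.sin(norm) * c / norm
                 for a, c in zip(y, v))

def intrinsic_centroid(points, iterations=100, tol=1e-12):
    """Approximate the point minimizing sum(d(x_i, y)^2) on a unit sphere."""
    pts = [to_xyz(lat, lon) for lat, lon in points]
    s = [sum(p[i] for p in pts) for i in range(3)]
    n = math.sqrt(sum(c * c for c in s))
    if n < 1e-12:
        return None  # e.g. two antipodal caches: no usable starting point
    y = tuple(c / n for c in s)  # start at the projected 3-D average
    for _ in range(iterations):
        logs = [log_map(y, p) for p in pts]  # N distance computations per pass
        step = [sum(v[i] for v in logs) / len(pts) for i in range(3)]
        y = exp_map(y, step)
        if math.sqrt(sum(c * c for c in step)) < tol:
            break
    return to_latlon(y)

print(intrinsic_centroid([(0.0, 0.0), (0.0, 30.0), (0.0, 90.0)]))
```

With three caches on the equator at longitudes 0, 30 and 90, the intrinsic mean comes out at longitude 40 (the plain average of the angles along the great circle), while the projected 3-D average of the same points is near 38.8. So the two definitions do give different answers once the caches are spread out.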

Link to comment

Yes, actually I have played around with this intrinsic mean as well. Its main disadvantage is that it is "hard" to calculate (it requires a calculation of N distances per iteration, where N is the number of caches). But it has the advantage of always giving an answer on the surface of the Earth.

 

One comment on your proposed mean: since distance on the surface is a metric, I am not convinced that minimizing the distance squared is correct. That gives you more of an RMS average than a mean, IMO. Just minimizing the sum of the distances is probably better.

Link to comment

There is a reason for using the squared distance. For a standard pdf f(x), the point a which minimizes integral[(a-x)^2 f(x)] dx is the mean mu that you get from integral(x f(x)) dx. All integrals here are from -inf to +inf. The discrete analog for a sample mean is the summation, and the (a-x)^2 discretizes to the squared distances.
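A quick numerical check of the point about squared distances, on an ordinary number line rather than a sphere. This is just a brute-force grid search with made-up sample values and a hypothetical helper name; it is meant to show that minimizing the sum of squared distances recovers the ordinary mean, while minimizing the plain sum of distances recovers the median.

```python
def argmin_on_grid(cost, lo, hi, steps=10000):
    """Return the grid point in [lo, hi] where cost() is smallest."""
    best_a, best_c = lo, cost(lo)
    for i in range(1, steps + 1):
        a = lo + (hi - lo) * i / steps
        c = cost(a)
        if c < best_c:
            best_a, best_c = a, c
    return best_a

# Illustrative data only: four nearby values and one far-away outlier.
samples = [1.0, 2.0, 3.0, 4.0, 20.0]

by_squares = argmin_on_grid(lambda a: sum((a - x) ** 2 for x in samples), 0.0, 25.0)
by_distance = argmin_on_grid(lambda a: sum(abs(a - x) for x in samples), 0.0, 25.0)

print("minimizing squared distances:", by_squares)   # close to the mean, 6.0
print("minimizing plain distances:  ", by_distance)  # close to the median, 3.0
```

The outlier drags the squared-distance answer toward it, which is exactly how an ordinary mean behaves; the sum-of-distances answer stays with the bulk of the data.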

 

We had a fast iterative method to give a computational approximation to this mean and the method worked for the infinite dimensional manifolds we dealt with.

Link to comment

I've read and seen a lot of folks talking about centroid geocaches, but I can't find anything sensible on how to determine your centroid geocache. Is there a program or an equation with your farthest East West North South caches, or what?

 

That info won't do it since the centroid is weighted by each cache.

 

As said before. Get GSAK. Install Centroid Macro. Run against current database and presto all is well.

 

Mine is here

 

Centroid of Database: Found

 

N 38° 31.883 W 116° 33.216

Edited by Walts Hunting
Link to comment

Or run the FindStatGen macro and under "Some Numbers" section is the centroid with a link to a map showing it.

Link to comment

Yeah, you are right. I am naturally skeptical of squared summations because they frequently involve an implicit assumption that f(x) is normal. I think it does here, also, but I can't prove it. Nonetheless, it is certainly better than minimizing the sum of the distances.

 

I typically use a more-or-less brute-force method to minimize functions using a simplex algorithm. It is stable and works for a variety of geometric problems, though it is not as fast as other solutions tuned to their problem space.

 

If I get time I will compare the centroids obtained in 3-space vs. those from the ellipsoid.

Link to comment

If you take integral[(a-x)^2 f(x)]dx and differentiate with respect to a, then to minimize you set 2*integral[(a-x)f(x)]dx = 0, and solving for a gives a = integral(x f(x)) dx. This is true for any pdf f(x), so no assumption of normality is necessary. It seems like a simplex algorithm would work, but the method we had was a gradient method. Is the comparison you're talking about a comparison between the intrinsic and extrinsic methods? We never did that comparison because we were concerned with other stuff.
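Written out step by step, the argument above looks like this. The only extra ingredient is that a pdf integrates to 1, and that the mean exists so the integrals converge; all integrals run from -inf to +inf.

```latex
\begin{align*}
\frac{d}{da}\int_{-\infty}^{\infty} (a-x)^2 f(x)\,dx
  &= 2\int_{-\infty}^{\infty} (a-x)\, f(x)\,dx = 0 \\
a\int_{-\infty}^{\infty} f(x)\,dx
  &= \int_{-\infty}^{\infty} x\, f(x)\,dx \\
a &= \int_{-\infty}^{\infty} x\, f(x)\,dx = \mu
  \qquad\text{since } \int_{-\infty}^{\infty} f(x)\,dx = 1 .
\end{align*}
```

The second derivative with respect to a is 2, which is positive, so this stationary point is a minimum.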

Link to comment
