Ratio vs. Raw Count

7.2k · January 20, 2011

Some more thoughts:

If Groundspeak didn't want the favorites to be a ranking system, then why allow us to sort on the number?

If you think raw counts versus ratios don't make much of a difference, then load up the Greasemonkey script and check for yourself. I'm seeing high ranked favorites with 2 and 3 percent favorites.

I've checked some of the caches I've found, and so far I'm agreeing with the ratios more than the raw numbers.

Speculation is fine, but experimentation and confirmation seem to answer the question.

1.3k · January 20, 2011

Some claim that the raw count is not accurate because of other factors. Some that claim this also claim that a ratio provides more accurate results.

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

Assume the most amazing cache in the world is visited by 5000 people. Only 500 of them are premium members. If every person that could add a favorite added a favorite the most you could come up with is 500 favorites, 10%. The math can go either way.
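A minimal sketch of the arithmetic in this scenario, for anyone who wants to check it. The function name `favorite_ratio` is made up for illustration; the numbers are the ones from the post. It also shows, as an assumption about what an alternative denominator would look like, the same cache measured against premium finders only.

```python
def favorite_ratio(favorites: int, finders: int) -> float:
    """Return favorites as a fraction of finders (0.0 when there are no finders)."""
    return favorites / finders if finders else 0.0


total_finders = 5000
premium_finders = 500
favorites = 500  # every premium member who found the cache favorited it

ratio_all = favorite_ratio(favorites, total_finders)
ratio_pm = favorite_ratio(favorites, premium_finders)
print(f"{ratio_all:.0%} of all finders, {ratio_pm:.0%} of premium finders")
```

With all finders as the denominator the ratio is capped at 10%; with premium finders as the denominator the same cache reaches 100%.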

2.7k · January 20, 2011

Some claim that the raw count is not accurate because of other factors. Some that claim this also claim that a ratio provides more accurate results.

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

Assume the most amazing cache in the world is visited by 5000 people. Only 500 of them are premium members. If every person that could add a favorite added a favorite the most you could come up with is 500 favorites, 10%. The math can go either way.

Yes you are right. But then again, this isn't about guarantees - it's about guidelines. In general, caches with higher ratios will be better. I can make up lots of extreme scenarios to drive my point one way or another. Statistics can be manipulated as you demonstrate. IF scenario AND scenario AND scenario ....

I'll be looking at the ratios, and the ones at the top I'll investigate more. Plain and simple.

20.7k · January 20, 2011

Now I'm pretty sure that you are missing the point on purpose.

Then educate me.

I did try to do that.

2.7k · January 20, 2011

Now I'm pretty sure that you are missing the point on purpose.

Then educate me.

I did try to do that.

You didn't succeed I guess. Explain your point better - or I suppose us two could head over to off-topic and stick to attacking each other instead of debating the points.

1.8k · January 20, 2011

Of course Markwell could have gone another step in his method. Weight each finder by the percentage of their favorites votes that they used.

I'm happy to stop where Markwell did. The issue with this proposed modification is that it gives people an incentive to not use their Favorites points. That way you can make your votes on the caches you loved the most count more than other people's votes. Maybe that's a good thing and maybe it's not (I'm inclined to like it, actually) - but the nice thing about Markwell's earlier modifications is that any effects on voting behavior would be a lot lower.

1.4k · January 20, 2011

Some claim that the raw count is not accurate because of other factors. Some that claim this also claim that a ratio provides more accurate results.

No. The claim is that it is not accurate if you only look at the count. If you also look at the publish date it becomes more accurate. If you compare the favorite count of two old caches then it's a more meaningful comparison. But what if one has more visitors than the other because of the terrain? Replace the number of days active with the number of finders and now you can compare dissimilar caches together: frequently visited old, frequently visited new, less frequently visited old and less frequently visited new.

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

Faulty logic there. And accurate is a bad word for it. The count is accurate. It accurately tells me how many people liked it. It doesn't let me compare dissimilar things however.

WARNING! CAR ANALOGY! Two cars, one used 10 liters of gas and the other used 20 liters. Both numbers are accurate. But it doesn't tell me which is more efficient. For that I also need the distance driven. If the first drove 100 kilometers and the second drove 20 kilometers, the one that used 10 liters is more efficient.
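The car analogy can be written out as a quick sketch. The helper name `liters_per_100km` is just an illustrative choice; the figures are the ones given in the analogy.

```python
def liters_per_100km(liters: float, kilometers: float) -> float:
    """Fuel consumption normalized by distance; lower means more efficient."""
    return 100 * liters / kilometers


first_car = liters_per_100km(10, 100)   # used 10 liters over 100 km
second_car = liters_per_100km(20, 20)   # used 20 liters over 20 km

print(f"first car:  {first_car:.0f} L/100 km")
print(f"second car: {second_car:.0f} L/100 km")
```

The raw liters (10 vs 20) are both correct measurements, but only the normalized figure lets the two cars be compared.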

Assume the most amazing cache in the world is visited by 5000 people. Only 500 of them are premium members. If every person that could add a favorite added a favorite the most you could come up with is 500 favorites, 10%. The math can go either way.

That only means something if you decide "I'm only going to visit caches that have 20% favourite or more". What most people will do is compare it to other caches in the area. The less amazing caches beside it would be less than 10%.

And that's why people want to have a different denominator: the number of non-duplicate found logs by PMs that used at least one favourite point. It would let them compare caches over a wider geographical area.
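A rough sketch of how that denominator might be computed, assuming log data were available in this shape. The `FoundLog` record and its fields are hypothetical, not anything the site is known to expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoundLog:
    finder: str                 # hypothetical: account name of the finder
    is_premium: bool            # hypothetical: premium membership flag
    points_used_by_finder: int  # hypothetical: favorite points this finder has awarded anywhere


def eligible_voters(logs) -> int:
    """Count distinct premium finders who have used at least one favorite point."""
    return len({log.finder for log in logs
                if log.is_premium and log.points_used_by_finder > 0})


def adjusted_ratio(favorites: int, logs) -> float:
    """Favorites divided by eligible voters (0.0 when nobody is eligible)."""
    voters = eligible_voters(logs)
    return favorites / voters if voters else 0.0


logs = [
    FoundLog("alice", True, 12),
    FoundLog("alice", True, 12),  # duplicate found log, counted once
    FoundLog("bob", True, 0),     # premium, but has never used a point
    FoundLog("carol", False, 0),  # basic member, cannot vote
    FoundLog("dave", True, 3),
]
print(adjusted_ratio(1, logs))  # 1 favorite over 2 eligible voters
```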

Edited January 20, 2011 by Avernar

1.8k · January 20, 2011

It appears as though a lot of folks are pointing to nuances of a ratio system to prove how it would be useless to guarantee an enjoyable hunt.
...

"There are too many variables." That's what folks said about Found It log-type word counts... I was told by some that such a scheme was baseless.

This is almost precisely my experience. There are often some theoretical reasons why a system wouldn't give you 100% correlation to what you're looking for, and some carefully constructed counter-examples that might suggest a *negative* correlation. And yet, whenever I've put the systems into place they seem to work and the counter-examples are dwarfed by the example-examples. I've found exactly what you've found about log length - whenever I filter and sort on log length, the results have been very helpful. It's theoretically possible for someone to write 4000 characters about how awful her experience was, but that certainly hasn't seemed to hurt the effectiveness of my filter much.

Same thing I've found with Favorites. Before they were introduced I felt like I read a number of claims that any rating system would be useless because everyone likes different things and people who like long hikes will get mucked up by people who love LPCs, etc. And yet... here I am finding it pretty useful so far.

1.8k · January 20, 2011

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

I think you may be confusing things. My issue isn't that the numbers are 'inaccurate', in the sense that they are being measured wrong. As far as I can tell the system is accurately recording the votes of cachers. The data itself isn't 'inaccurate'.

My issue is that the number is more useful to me if I compare these votes against the number of people who could have voted.

7k · January 20, 2011

Elementary statistics. Raw count tells you very little compared to the ratio. Why has been explained in excruciating detail, repeatedly.

That only works when comparing apples to apples.

So a per-cache favorite count in relation to a constant number (1) is something completely different than a per-cache favorite count in relation to another per-cache number? Interesting.

Edited January 20, 2011 by dfx

7.2k · January 20, 2011

Some claim that the raw count is not accurate because of other factors.

The number of favorites is not "inaccurate"; it is only the number of cachers who have put that cache on their favorite list. Nothing more.

Kind of like the find count. The find count is nothing more than the number of Found It logs someone has written. It only roughly correlates to the number of caches someone has found.

Assume the most amazing cache in the world is visited by 5000 people. Only 500 of them are premium members. If every person that could add a favorite added a favorite the most you could come up with is 500 favorites, 10%. The math can go either way.

This would not invalidate the figure. You're discounting that caching is relatively localized. Folks don't travel to an area and find only one cache. They find anywhere from a few to a lot, depending on stamina and tolerance. Therefore, the 10% of finders being PMs would hold true for most of the caches in the area. This would produce a localized maximum favorite percentage of 10%. If 100% of eligible cachers like a certain cache, it would produce the 10% figure, while lesser caches would produce a lesser figure. If roughly half of the eligible cachers favorite a cache, the math would produce a 5% ratio, ranking that cache lower than the first cache.
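Here is a small sketch of that local-ranking argument. It assumes, as the post does, that premium members make up the same 10% of finders at every cache in the area; the cache names and the third "quiet gem" cache are invented for illustration.

```python
PM_PERCENT = 10  # assumed share of premium members among local finders


def ratio(finders: int, percent_of_pms_who_favorited: int) -> float:
    """Favorites over all finders, given how many of the local PMs favorited."""
    premium_finders = finders * PM_PERCENT // 100
    favorites = premium_finders * percent_of_pms_who_favorited // 100
    return favorites / finders


local = {
    "amazing": ratio(5000, 100),   # every eligible cacher favorited it
    "good": ratio(5000, 50),       # half of the eligible cachers favorited it
    "quiet gem": ratio(200, 100),  # few finders, but every eligible one favorited it
}
ranked = sorted(local, key=local.get, reverse=True)
for name in ranked:
    print(f"{name}: {local[name]:.0%}")
```

Because the PM share is the same everywhere in this sketch, the ratio over all finders is just a scaled-down version of the ratio over eligible voters, so the local ordering is unchanged. The sketch says nothing about areas with different PM shares.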

It's only when you start comparing caches with different groups of cachers which might have a different group of PMs versus Non-PMs (or other reason the local population does or doesn't vote) that you will run into issues ranking caches.

Don't argue against the tree when you can't see the forest.

20.7k · January 20, 2011

Now I'm pretty sure that you are missing the point on purpose.

What do you think the point was?

If the ratio doesn't take into account how long the cache has been published, the raw count does so even less. The caches that have hundreds of favourites are the older caches. Almost all of those points were for finds that happened before the favourite point system came into effect. That shows that people are not just restricting their favorites to finds that happened after the system went live, which is the point the penguin is making.

What the ratio does suffer from is the ratio of PMs to Basic members and the ratio of PMs who assign points to those who don't. That's the reason why people want the ratio to be points vs PMs that are using their fave points. But this is only an issue if you're going outside your geographical area, as those things will stay fairly constant locally.

As I understand it, the point was that there is an inherent skew to the data because many old top ten percent finds won't get 'faved' for various reasons. The fact that some old finds do get faved or that some old cachers kept a list of their faves does not change this fact. It should be mentioned that the skew works in both directions since these old cachers now have a surplus of fave votes to use on future caches.

Is this a huge problem? Probably not. That doesn't mean it doesn't exist, however.

7.2k · January 20, 2011

Counting only PMs in the ratio would be good, so COs won't make their caches MOC simply to discount finds from non-PMs and artificially inflate a ratio favorite ranking.

20.7k · January 20, 2011

Now I'm pretty sure that you are missing the point on purpose.

Then educate me.

I did try to do that.

You didn't succeed I guess. Explain your point better - or I suppose us two could head over to off-topic and stick to attacking each other instead of debating the points.

Life is full of small failures. There's no point in repeating them.

2.7k · January 20, 2011

As I understand it, the point was that there is an inherent skew to the data because many old top ten percent finds won't get 'faved' for various reasons. The fact that some old finds do get faved or that some old cachers kept a list of their faves does not change this fact. It should be mentioned that the skew works in both directions since these old cachers now have a surplus of fave votes to use on future caches.

Is this a huge problem? Probably not. That doesn't mean it doesn't exist, however.

That's a valid viewpoint. Didn't think about the old timers with the pile-o-votes angle

1.3k · January 20, 2011

Some claim that the raw count is not accurate because of other factors. Some that claim this also claim that a ratio provides more accurate results.

No. The claim is that it is not accurate if you only look at the count. If you also look at the publish date it becomes more accurate. If you compare the favorite count of two old caches then it's a more meaningful comparison. But what if one has more visitors than the other because of the terrain? Replace the number of days active with the number of finders and now you can compare dissimilar caches together: frequently visited old, frequently visited new, less frequently visited old and less frequently visited new.

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

Faulty logic there. And accurate is a bad word for it. The count is accurate. It accurately tells me how many people liked it. It doesn't let me compare dissimilar things however.

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

20.7k · January 20, 2011

Elementary statistics. Raw count tells you very little compared to the ratio. Why has been explained in excruciating detail, repeatedly.

That only works when comparing apples to apples.

So a per-cache favorite count in relation to a constant number (1) is something completely different than a per-cache favorite count in relation to another per-cache number? Interesting.

It's the objects that make it impossible to compare the two, not the methodology. People are trying to compare tourist caches to long walk in the woods caches. They surmise that using a ratio instead of raw data will make this comparison possible, but it won't. You still have to target the kinds of caches that you like and look at the fave data for only that subset.

7k · January 20, 2011

x = RAW
RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

So you're saying that things like IQ and BMI are completely meaningless and useless?

20.7k · January 20, 2011

Some claim that the raw count is not accurate because of other factors. Some that claim this also claim that a ratio provides more accurate results.

No. The claim is that it is not accurate if you only look at the count. If you also look at the publish date it becomes more accurate. If you compare the favorite count of two old caches then it's a more meaningful comparison. But what if one has more visitors than the other because of the terrain? Replace the number of days active with the number of finders and now you can compare dissimilar caches together: frequently visited old, frequently visited new, less frequently visited old and less frequently visited new.

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

Faulty logic there. And accurate is a bad word for it. The count is accurate. It accurately tells me how many people liked it. It doesn't let me compare dissimilar things however.

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

I think that most would agree with your 'not trustworthy' statement, but disagree with your 'no good for finding caches' statement.

In my mind, faves work only so far as they help increase the odds that an oft-faved cache will be enjoyed. It should not be seen as a guarantee of goodness or as an indicator that lesser-faved caches are bad.

Edited January 20, 2011 by sbell111

1.4k · January 20, 2011

As I understand it, the point was that there is an inherent skew to the data because many old top ten percent finds won't get 'faved' for various reasons.

I think there really is only one major reason: people don't want to be bothered going through their huge list of past finds to pick out the favourites. I believe that only the people with a large number of finds might not want to do that so it won't skew things too badly.

The fact that some old finds do get faved or that some old cachers kept a list of their faves does not change this fact. It should be mentioned that the skew works in both directions since these old cachers now have a surplus of fave votes to use on future caches.

You're assuming people want to use ALL their points. The fact that they didn't blow their entire 10% on past finds will just make it easier to find the true gems.

Is this a huge problem? Probably not. That doesn't mean it doesn't exist, however.

I don't think it's a problem at all. I believe you're overestimating how many cachers with lots of finds there are compared with the number of cachers with few finds.

7k · January 20, 2011

People are trying to compare tourist caches to long walk in the woods caches. They surmise that using a ratio instead of raw data will make this comparison possible, but it won't.

What it will make possible is easily comparing the favorite counts between caches with different find counts, nothing more and nothing less. This is a very basic functionality and has nothing to do with the types of the caches involved.

1.3k · January 20, 2011

Some claim that the raw count is not accurate because of other factors. Some that claim this also claim that a ratio provides more accurate results.

No. The claim is that it is not accurate if you only look at the count. If you also look at the publish date it becomes more accurate. If you compare the favorite count of two old caches then it's a more meaningful comparison. But what if one has more visitors than the other because of the terrain? Replace the number of days active with the number of finders and now you can compare dissimilar caches together: frequently visited old, frequently visited new, less frequently visited old and less frequently visited new.

I put forth that if the raw count is not accurate, then a ratio based on the raw count is no more accurate, as it is based on inaccurate data.

Faulty logic there. And accurate is a bad word for it. The count is accurate. It accurately tells me how many people liked it. It doesn't let me compare dissimilar things however.

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

I think that most would agree with your 'not trustworthy' statement, but disagree with your 'no good for finding caches' statement.

In my mind, faves work only so far as they help increase the odds that an oft-faved cache will be enjoyed. It should not be seen as a guarantee of goodness or as an indicator that lesser-faved caches are bad.

I think the RAW is useful. I am just pointing out that bending the numbers does not guarantee the correct result if one believes the raw count does not accurately give the correct results.

1.3k · January 20, 2011

x = RAW
RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

So you're saying that things like IQ and BMI are completely meaningless and useless?

I have a high value for both of those. One gets me into trouble now and the other will get me into trouble later. I'll let you guess which is which.

As a side note, and not wanting to go too far off topic, there are those who believe that IQ is an unfair test, for reasons I will not get into here.

1.4k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

You completely missed my point.

RAW = good for finding caches within a class (old vs new, lots of visitors vs few visitors)

x = RAW

x = good for finding caches within a class (old vs new, lots of visitors vs few visitors)

y = number of finders

z = x/y

z = good for finding caches regardless of the class

RATIO = z

RATIO = good for finding caches regardless of the class

Edited January 20, 2011 by Avernar

1.3k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

You completely missed my point.

RAW = good for finding caches within a class (old vs new, lots of visitors vs few visitors)

x = RAW

x = good for finding caches within a class (old vs new, lots of visitors vs few visitors)

y = number of finders

z = x/y

z = good for finding caches regardless of the class

RATIO = z

RATIO = good for finding caches regardless of the class

Can you show your work for how "within" got dropped?

1.8k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

But turning a raw number into a ratio can be precisely *what* makes it useful.

To borrow an earlier analogy, total gallons of gas used might not be very useful for determining fuel efficiency, but dividing by miles traveled helps a lot.

1.3k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

But turning a raw number into a ratio can be precisely *what* makes it useful.

To borrow an earlier analogy, total gallons of gas used might not be very useful for determining fuel efficiency, but dividing by miles traveled helps a lot.

How does it make it useful? I don't see the math for that.

1.8k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

But turning a raw number into a ratio can be precisely *what* makes it useful.

To borrow an earlier analogy, total gallons of gas used might not be very useful for determining fuel efficiency, but dividing by miles traveled helps a lot.

How does it make it useful? I don't see the math for that.

Here might be one example. I decide I want to save money on gas. Our family has two cars - in one I put 50 gallons of gas last year, and in the other I put 20 gallons of gas.

In my opinion, I don't really have as much information as I'd like. It's not entirely clear to me that driving the second car will save me money.

So I divide by miles traveled. And I find that I drove 1000 miles in the first car last year, and 200 miles in the second car. Now I see that the first car offers me about 20mpg and the second offers me about 10mpg, and I believe with increased confidence that driving the first car more is probably going to save me money.

I'm still not *entirely* sure about that - maybe I only ever drive the first car on the highway at 55mph and I only drive the second car in rush hour stop-and-go city traffic? So I'll continue to proceed with caution. But I certainly feel better about the comparison, than I did when all I knew was the total gallons of gas.
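For anyone who wants "the math" spelled out, a short sketch using the numbers from this example. The 300-mile trip is a made-up figure added only to show what the ratio buys you; it is not from the post.

```python
def miles_per_gallon(miles: float, gallons: float) -> float:
    """Distance covered per gallon; higher means cheaper to drive."""
    return miles / gallons


first_car = miles_per_gallon(1000, 50)   # 50 gallons over 1000 miles
second_car = miles_per_gallon(200, 20)   # 20 gallons over 200 miles

trip_miles = 300  # hypothetical trip length, for illustration only
for name, mpg in (("first car", first_car), ("second car", second_car)):
    gallons = trip_miles / mpg
    print(f"{name}: {mpg:.0f} mpg, {gallons:.0f} gallons for {trip_miles} miles")
```

The raw totals (50 vs 20 gallons) point at the second car; the ratio points at the first. The highway-versus-city caveat still applies to both.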

1.4k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

You completely missed my point.

RAW = good for finding caches within a class (old vs new, lots of visitors vs few visitors)

x = RAW

x = good for finding caches within a class (old vs new, lots of visitors vs few visitors)

y = number of finders

z = x/y

z = good for finding caches regardless of the class

RATIO = z

RATIO = good for finding caches regardless of the class

Can you show your work for how "within" got dropped?

Didn't know this was a math exam but I'll bite:

The class correlates to the number of finders a cache has. The reason I used class is that most people won't be doing comparisons using exact numbers for this but will just lump caches with similar find counts together.

So "within a class" = "with the same number of finders"

RAW = good for finding caches with the same number of finders

x = RAW

x = good for finding caches with the same number of finders

y = number of finders

z = x/y

z = good for finding caches regardless of the number of finders

RATIO = z

RATIO = good for finding caches regardless of the number of finders

The ratio lets you compare caches with a different number of finders. If you've factored that out manually (i.e. comparing two old virtuals in the same area) then going by the raw count works just as well.

As I've mentioned in a previous post, the raw count is important as it tells you how statistically meaningful the ratio is. The ratio for a cache with 2 favorites and 100% ratio is less meaningful than the ratio for a cache with 100 favourites and 50% ratio.
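One common way to put a number on "how statistically meaningful the ratio is" is the lower bound of the Wilson score interval. To be clear, this is a technique swapped in here for illustration, not something proposed in the thread, and the 95% level (`z = 1.96`) is an assumption. The sketch compares the two caches from the paragraph above, reading "100 favourites and 50% ratio" as 100 out of 200.

```python
import math


def wilson_lower_bound(favorites: int, voters: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the proportion favorites/voters."""
    if voters == 0:
        return 0.0
    p = favorites / voters
    denom = 1 + z * z / voters
    centre = p + z * z / (2 * voters)
    margin = z * math.sqrt(p * (1 - p) / voters + z * z / (4 * voters * voters))
    return (centre - margin) / denom


tiny = wilson_lower_bound(2, 2)       # 2 favorites, 100% ratio
large = wilson_lower_bound(100, 200)  # 100 favorites, 50% ratio
print(f"2/2 -> {tiny:.2f}, 100/200 -> {large:.2f}")
```

Under these assumptions the 100-of-200 cache gets the higher lower bound even though its plain ratio is lower, which is one way of using the raw count and the ratio together.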

8k · January 20, 2011

That's why it makes more sense to use the ratio than the raw count, because it's more meaningful. It's not perfect, but better.

Nobody has given a convincing argument that ratio is more meaningful than a raw count. I understand that intuitively, if 100% of the people who find a cache liked it, then you'd be more likely to like it than you would a cache that only 10% of the finders liked. But there has been no rigorous argument to show this is true. Since favorites only tell you that a person thinks the cache is in their top 10%, if 10% favorite a cache you don't know what the other 90% thought. I contend that if a substantial number of individuals favorite a cache it is probably a good cache. The other 90% are likely people who liked the cache, just not enough for it to be in their top 10%. My concern with using ratio without more information is that you can have a cache that appeals to a small number of cachers. These caches don't ever get found by the general geocaching community, for whom that sort of cache is not appealing. My argument is that these caches will have a higher ratio of favorites to finders, but unless you are in the group that this cache appeals to, you are not likely to enjoy the cache.

After filtering, I'd probably prefer searching for an out-of-the-way 1.5/1.5 cache that received 20 points from 40 finders (50%) rather than a 1.5/1.5 cache beside a tourist attraction that received 30 points from 300 finders (10%).

Neither raw numbers nor percentages will be perfect predictors of my likelihood of enjoying a cache. Using filters is a good first step to improving Favorites' predictive abilities. Using percentages is a good second step.
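The two-step approach described here (filter first, then look at percentages) could be sketched like this. The `Cache` record and `shortlist` helper are hypothetical names; the three caches use the D/T ratings and favorite counts mentioned in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cache:
    name: str
    difficulty: float
    terrain: float
    favorites: int
    finders: int

    @property
    def ratio(self) -> float:
        return self.favorites / self.finders if self.finders else 0.0


def shortlist(caches, max_difficulty: float, max_terrain: float):
    """Step 1: filter by D/T. Step 2: sort the survivors by favorite ratio."""
    kept = [c for c in caches
            if c.difficulty <= max_difficulty and c.terrain <= max_terrain]
    return sorted(kept, key=lambda c: c.ratio, reverse=True)


caches = [
    Cache("tourist attraction", 1.5, 1.5, 30, 300),
    Cache("out-of-the-way", 1.5, 1.5, 20, 40),
    Cache("long hike", 2.0, 4.5, 2, 4),
]
for c in shortlist(caches, max_difficulty=2.0, max_terrain=2.0):
    print(f"{c.name}: {c.favorites}/{c.finders} = {c.ratio:.0%}")
```

The hike drops out at the filter step, so the percentage only ever compares caches of a similar kind. Sorting the survivors by raw favorites instead would put the tourist cache first.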

I'm not sure I would even make a difference between a cache that got 20 favorites and one that got 30 favorites. These are both high numbers.

Perhaps that's because you're looking at raw numbers rather than ratios. To you, there's not much difference between 20 and 30. To me, there's a significant difference between 50% and 10%. I suspect I'm more likely to enjoy the out-of-the way cache.

But let's look at it your way. Assume the tourist attraction received 50 points from 500 finders (still 10%) and the out-of-the way cache received 10 points from 20 finders (still 50%). Now, do you still think they are both about equally attractive to you?

It's certainly clear that some caches get found exceptionally often. These are often caches that happen to be in areas that get lots of tourist visits. These caches are generally easy to recognize. In fact that's one reason they get found so often. People visiting some tourist location check to see if there is a cache there and, as they're going to be there anyway, they find the cache. I'd probably find that cache if I was at the tourist location regardless of the actual number of favorites or the ratio. Of course if 50 people say it's among their favorite caches, I'd expect that at least I won't be thinking "what a Mickey Mouse cache sbell111 has here".

I'd probably look at the cache pages to decide which of these I'd look for if I had to make a choice. At that time I would see the 30 point cache was a virtual in a touristy area and the 20 point one was a traditional in an out-of-the-way area with long logs from some of the finders. I can decide which of these I want to do. The percentages don't help at all.

Yes, I think everyone agrees that gathering even more data improves your choices even more. But which caches are you going to look for more data about? The ones with higher raw points or the ones with higher percentages? That's where percentages would be helpful to me.

Unfortunately, neither raw count nor ratio by itself is a good estimator for whether you will enjoy the cache. In either case you will have to look for more information if you don't want to go to the cache and be disappointed when you discover you need SCUBA equipment to find it. I'm still of the opinion that a cache with a high ratio is more likely to be one of these caches I will have to eliminate when I look at other information, and that high raw favorite counts are a better predictor.

I think that you may also be trying to find not just caches that you would enjoy, but those you would enjoy most. I think this is asking too much of the favorites system. Perhaps if you could see how many favorites were from people who also favorited many of the caches on your favorites list, you'd get an idea as to whether this is a cache that would be on your favorite list as well.

On the other hand, if I see a 2/4.5 cache with 2 favorites out of 4 finders (50%) and a 1.5/1.5 cache with 30 points from 300 finders (10%), I'd probably make my decision by whether I wanted to go on a hike or not.

If you filter first, as DanOCan suggested, then you're not comparing a 2/4.5 cache with a 1.5/1.5 cache. By filtering with additional data like D/T ratings and attributes, you reduce external influences like whether you want to go on a hike or enjoy SCUBA diving. Filters can help you compare hiking caches with hiking caches, SCUBA caches with SCUBA caches, night caches with night caches, etc.

Favorite points aren't perfect indicators of a cache's potential enjoyment. They are only one of several factors at your disposal. And favorite percentages are another. Use percentages or don't. Use D/T and attributes or don't. It's up to you.

My fear is that people are looking for the "easy-peasy" method of finding caches they will enjoy. Understanding what the favorites mean is critical to knowing how to use them. A raw count means that that many geocachers put it on their favorite list. A ratio means "Gee did they divide by the total finders or only by the premium member finders; was this cache found by only 2 people or by 200; and is this really some cache where the ratio will help me decide?" It's really hard to figure out what it means.

1.3k · January 20, 2011

I think I will have to go with the RAW count. I still don't understand how using a ratio will give anything other than bending the numbers to come up with a different number. Far too much work when I can just download a PQ with caches I haven't found and go find them. This has seemed to work for a few years now.

January 20, 2011

Can I favorite my own puzzle cache that has not been found since 2007? The puzzle is not that difficult, but only 2 finders (together). I'm also thinking of making an archived event cache a favorite just for the pictures of the CO after a polar bear swim.

I still have over 80 favorite points to award. I will try to distribute them among memorable caches I have found over the last 5 years. I hope that if I favorite a cache that has been found less than 5 times it will get visited more.

20.7k · January 20, 2011

As I understand it, the point was that there is an inherent skew to the data because many old top ten percent finds won't get 'faved' for various reasons.

I think there really is only one major reason: people don't want to be bothered going through their huge list of past finds to pick out the favourites. I believe that only the people with a large number of finds might not want to do that so it won't skew things too badly.

I think that you may be defining the word 'large' in a strange way. Someone with one or two hundred old finds may not wish to slog through every one of them to figure out which ones were the best of the best.

2.7k · January 20, 2011

Can I favorite my own puzzle cache that has not been found since 2007? The puzzle is not that difficult, but only 2 finders (together). I'm also thinking of making an archived event cache a favorite just for the pictures of the CO after a polar bear swim.

I still have over 80 favorite points to award. I will try to distribute them among memorable caches I have found over the last 5 years. I hope that if I favorite a cache that has been found less than 5 times it will get visited more.

No, you cannot favorite your own cache.

1.8k · January 20, 2011

Can I favorite my own puzzle cache that has not been found since 2007? The puzzle is not that difficult, but only 2 finders (together). I'm also thinking of making an archived event cache a favorite just for the pictures of the CO after a polar bear swim.

You've hit on two of the types of caches that you cannot vote for - your own listings, and events.

8k · January 20, 2011

Replace "accurate" with "accurate in finding the best caches".

x = RAW

RAW = no good for finding caches and therefore not trustworthy

x = no good for finding caches and therefore not trustworthy

y = number of finders

z = x/y

z = no good for finding caches and therefore not trustworthy

I think your math is a little oversimplified.

What you can do is define the likelihood function for x:

L(x, enjoy) = likelihood that Raw >= x, given this is a cache I would enjoy.

Then compute the Fisher Information by taking the second partial derivative of the natural logarithm of L with respect to x.

Then do the same for z=x/y

L(z,enjoy) = likelihood that Ratio>=z given this is a cache I would enjoy.

You might find one has more information than the other.

-------

It's clear that number of finders has some effect on the number of favorites. There seems to be a desire to remove this effect (whether or not it really matters). The ratio looks as if it normalizes for number of finds.

However it is just as clear that number of finds itself is a statistic that can be used to predict if someone will like a cache. People who enjoy P&Gs are more likely to enjoy caches with many finders. People who enjoy especially challenging caches (tough puzzles, difficult hikes, etc.) are more likely to enjoy caches with few finds. FTF enthusiasts are more likely to enjoy caches with zero finds.

With a ratio you are combining two statistics that each provide some information about whether you would enjoy finding a cache. What's worse is that more finds may increase the likelihood of enjoyment for some cachers and decrease it for others. For some cachers the effect might not even be monotonic. (For example, cachers that like caches that get found regularly but not ones in touristy areas that get thousands of finds. Or cachers that like caches that get found rarely and also like really old caches that have been found many times.) So using a ratio loses information. Really the only thing you can do is look at the raw count and the number of finds (along with all the other statistics that might affect your enjoyment of the cache).

Edited January 20, 2011 by tozainamboku

1.4k · January 20, 2011

Nobody has given a convincing argument that ratio is more meaningful than a raw count.

Depends on your definition of meaningful. If you want to find more popular, frequently visited caches that are easy to get to then the raw count is quite meaningful.

Here's another set of stats:

A) Old cache with 1000 finds and 100 favorites, 10% ratio

B) Old cache with 100 finds and 50 favorites, 50% ratio

C) New cache with 20 finds and 8 favorites, 40% ratio

D) New cache with 5 finds and 3 favorites, 60% ratio

Going by raw numbers, the old, frequently visited cache comes out ahead. Going by the ratios, the other 3 are fairly close and beat A by quite a bit. Personally I'd look at B before D, as 5 finds is a small sample size, or read the logs of D to figure out why it's liked so much.

The ratio factors cache age and visit frequency out of the "equation" and thus expands the pool of caches to choose from. If you don't care about that, by all means use the raw count.
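For anyone who wants to check the two orderings, here is a quick Python sketch using the four made-up caches A to D from the list above. The `rank` helper and the dictionary layout are just for illustration.

```python
# (finds, favorites) for the four example caches listed above.
caches = {
    "A": (1000, 100),
    "B": (100, 50),
    "C": (20, 8),
    "D": (5, 3),
}


def rank(caches, key):
    # Return cache names sorted from highest to lowest score.
    return sorted(caches, key=key, reverse=True)


# Sort by the raw favorite count.
by_raw = rank(caches, key=lambda name: caches[name][1])

# Sort by favorites divided by finds.
by_ratio = rank(caches, key=lambda name: caches[name][1] / caches[name][0])
```

With these numbers the raw sort gives A, B, C, D and the ratio sort gives D, B, C, A. Note that the ratio sort puts D first on the strength of only 5 finds, which is why the logs are still worth reading.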

Edited January 20, 2011 by Avernar

2.1k · January 20, 2011

So you're saying that things like IQ and BMI are completely meaningless and useless?

There are lots of arguments about IQ. BMI is almost totally useless. Unless you think that Tom Cruise is seriously overweight, or Arnold Schwarzenegger and Dwayne "The Rock" Johnson are morbidly obese.

4.2k · January 20, 2011

You know Netflix has been working on something similar for years now and they're using all sorts of fancy, big-brained algorithms. Something tells me that most of the math I'm seeing in this thread is only scratching the surface. Possibly only the upper atmosphere above the surface.

1.3k · January 20, 2011

What I would like to see is ratios applied to real caches, to see which caches come up as the must-do caches. I notice that some of the sample math equates older caches with having higher finds and tourist locations with having higher finds. I see a lot of numbers being fed into equations that happen to support what the poster is trying to convince others of.

4.2k · January 20, 2011

So you're saying that things like IQ and BMI are completely meaningless and useless?

There are lots of arguments about IQ. BMI is almost totally useless. Unless you think that Tom Cruise is seriously overweight, or Arnold Schwarzenegger and Dwayne "The Rock" Johnson are morbidly obese.

Tom is the perfect weight on his home planet. :wacko:

1.8k · January 20, 2011

Nobody has given a convincing argument that ratio is more meaningful than a raw count. I understand that intuitively if 100% of the people who find a cache liked it, then you'd be more likely to like it than you would a cache that only 10% of the finders liked. But there has been no rigorous argument to show this is true.

Does there need to be a rigorous argument to prove this is true in some universal way? A number of people share the intuition that it would be useful for them, and have outlined a number of reasons why. They seem like reasonable lines of thought, even if they aren't up to the standards of a mathematical proof. (FWIW, I haven't seen anything that rigorously proves that raw totals are superior either.)

I can't prove rigorously that sorting on long Found It logs will lead me to discover great caches. And yet, like Coyote Red, it has been a great tool for me. It might fail at the margins, or otherwise be something that shouldn't be used exclusive of all other data, and people can be very genuine in their insistence that I'm a fool for doing it. But I still like it.

2.7k · January 20, 2011

What I would like to see is ratios applied to real caches, to see which caches come up as the must-do caches. I notice that some of the sample math equates older caches with having higher finds and tourist locations with having higher finds. I see a lot of numbers being fed into equations that happen to support what the poster is trying to convince others of.

So would I. That's kinda what the ratio crowd is asking for though: the ability to quickly look this information up (for example, sort the list of caches from a Province or PQ result). Right now I'd have to manually go through the 21000 caches in Ontario and run a Greasemonkey script on each to see its ratio before I can compare them.

1.8k · January 20, 2011

What I would like to see is ratios applied to real caches, to see which caches come up as the must-do caches.

You and me both.

1.3k · January 20, 2011

So you're saying that things like IQ and BMI are completely meaningless and useless?

There are lots of arguments about IQ. BMI is almost totally useless. Unless you think that Tom Cruise is seriously overweight, or Arnold Schwarzenegger and Dwayne "The Rock" Johnson are morbidly obese.

Tom is the perfect weight on his home planet.

I am my ideal weight if I was seven feet tall.

1.4k · January 20, 2011

What I would like to see is ratios applied to real caches, to see which caches come up as the must-do caches.

So would I. That's why I wish I could sort on the ratio as well as the raw. For now I have to go into each cache page to see its ratio.

I notice that some of the sample math equate older caches as having higher finds and tourist locations having higher finds.

The math will work with other combinations. Nothing stops an older cache with a low find count from getting either a high or low ratio. If 4.5lb Walleye ever gets 1 favourite (100% ratio) I'll definitely want to see why. That would get lost on a raw count sort, several pages down.

I see a lot of numbers being fed into equations that happen to support what the poster is trying to convince others.

Again, the numbers don't matter. Feed in whatever numbers you want. Here's another set:

A) Old cache with 1000 finds and 500 favorites, 50% ratio

B) Old cache with 100 finds and 40 favorites, 40% ratio

C) New cache with 20 finds and 5 favorites, 25% ratio

D) New cache with 5 finds and 1 favorite, 20% ratio

Now the order is the same whether you go by raw or ratio. My point is that raw works for some numbers, while ratio works for all numbers (again, as long as finding popular, frequently visited caches is not your goal).

Edited January 20, 2011 by Avernar

January 20, 2011

So using a ratio loses information. Really the only thing you can do is look at the raw count and the number of finds.

Or look at the ratio and the number of finds to recompute the raw count. No information lost. Please don't come back with "Fisher Information" (even if used correctly next time) as it doesn't help the debate.
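As a sketch of that round trip in Python (the helper names `to_ratio` and `to_raw` are made up, and the sample pairs are the illustrative caches from earlier in the thread):

```python
def to_ratio(favorites: int, finds: int) -> float:
    # Favorite ratio; only defined when the cache has at least one find.
    return favorites / finds


def to_raw(ratio: float, finds: int) -> int:
    # Recover the raw favorite count; round() absorbs floating-point error.
    return round(ratio * finds)


# (finds, favorites) pairs borrowed from the example caches in this thread.
pairs = [(1000, 100), (100, 50), (20, 8), (5, 3)]

recovered = [to_raw(to_ratio(fav, finds), finds) for finds, fav in pairs]
```

Two caveats apply. The ratio is undefined for a cache with zero finds, and if a page only shows a rounded percentage (say 2%), the raw count can be recovered only approximately.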

What I would like to see is ratios applied to real caches, to see which caches come up as the must-do caches.
You and me both.

+1 ... logged and double-differentiated!

1.3k · January 20, 2011

I'm pretty sure I was misunderstood. I don't want to see it implemented to work on real caches. I want to see examples of the new math applied to real caches.

A) Old cache with 1000 finds and 500 favorites, 50% ratio

B) Old cache with 100 finds and 40 favorites, 40% ratio

C) New cache with 20 finds and 5 favorites, 25% ratio

D) New cache with 5 finds and 1 favorite, 20% ratio

This means nothing other than more favorites means better cache.

Example 1

Old cache with 1,000 finds and 100 favorites = 10% ratio.

A few days later previous finders add favorites.

Old cache with 1,000 finds and 200 favorites = 20% ratio.

Example 2

Old cache with 100 favorites = 100 favorites.

A few days later previous finders add favorites.

Old cache with 200 favorites = 200 favorites.

In both cases the cache looks better after more favorites have been added. The cache has not changed.

1.8k · January 20, 2011

I want to see examples of the new math applied to real caches.

I have a couple of real caches in a nearby park. They each have about the same number of Favorites votes.

One has been found many, many times. It has about a 2% Favorite rating and last I checked about a 3.3 GCVote score.

The other has been found a lot fewer times. It has about a 25% Favorite rating and last I checked a 5.0 GCVote score.

I love both caches dearly. As raw Favorites go, they are equals. But I think they are very different in terms of overall experience (and time commitment, frankly). It's far from the only way to differentiate them, but I think the ratio number could be helpful, for some people.

5.3k · January 20, 2011

I personally think that using favorites to rank caches in hopes of finding the "best" cache is not a valid use of the system.

So you are now telling us what is a legitimate ("valid") use of the favorites feature and what is not?

Wow. Talk about trying to control how other people enjoy the game! Now there are to be un-allowed ways to even think about things.

Using favorites to identify likely good caches is not the same as ranking the caches. The ratio is a better measure of how much people liked the cache, and thus a better tool for finding the best caches. It still has some weaknesses: most notably, older caches that have a lot of no-longer-active finders will have lower ratios. But I find it more useful than the raw count.

And I don't much like being told how I am supposed to use the feature, thank you very much.
