Ratio vs. Raw Count

1.4k · January 20, 2011

Example 1

Old cache with 1,000 finds and 100 favorites = 10% ratio.

A few days later previous finders add favorites.

Old cache with 1,000 finds and 200 favorites = 20% ratio.

I can compare it to another cache I'm considering that has 300 finds and 150 favourites (50% ratio). That change in favourite count hasn't made it more attractive.

Example 2

Old cache with 100 favorites = 100 favorites.

A few days later previous finders add favorites.

Old cache with 200 favorites = 200 favorites.

In both cases the cache looks better after more favorites have been added. The cache has not changed.

I can't compare it to another cache I'm considering that has 300 finds and 150 favourites (50% ratio). On the surface, 200 is bigger than 150. If this cache has 1000 finders like in example 1 then its 20% makes it look like the poorer choice. If this cache has 250 finds then its 80% ratio makes it the probable better choice.

Yes, that cache has not changed but I'm not just looking at that one cache in isolation. My selection method is not "a cache must have X number of favourites for me to consider it". I compare caches to each other. The ratio lets me compare caches, raw count alone does not.
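The arithmetic in both examples can be checked with a few lines of Python. This is only a sketch: `favorite_ratio` is a made-up helper name, and the last entry is the hypothetical 250-find case mentioned above.

```python
def favorite_ratio(favorites, finds):
    """Return favorites as a percentage of finds (0.0 when there are no finds)."""
    if finds <= 0:
        return 0.0
    return 100.0 * favorites / finds


# (favorites, finds) pairs taken from the two examples above
caches = {
    "old cache, before": (100, 1000),
    "old cache, after": (200, 1000),
    "other cache": (150, 300),
    "old cache, if it had 250 finds": (200, 250),
}

for name, (favorites, finds) in caches.items():
    pct = favorite_ratio(favorites, finds)
    print(f"{name}: {favorites} favorites / {finds} finds = {pct:.0f}%")
```

With only the raw count, the last two "old cache" rows look identical (200 favorites each), which is the point of Example 2.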

Edited January 20, 2011 by Avernar

5.3k · January 20, 2011

I think your math is a little oversimplified.
What you can do is define the Likelihood function for x:

L(x, enjoy) = likelihood that Raw >= x, given this is a cache I would enjoy.

Then compute the Fisher Information by taking the second partial derivative of the natural logarithm of L with respect to x.

Then do the same for z=x/y

L(z,enjoy) = likelihood that Ratio>=z given this is a cache I would enjoy.

My opinion (and it's only an opinion) is that you should understand statistics pretty well before using them in a public forum. The probability that you will look very dumb for using them incorrectly is high.

Your argument is not correct, even though the math is right. The problem is that the math has been applied improperly. That's by far the most common problem in how people use statistics. Look up "sufficient statistic" and see how it applies to this situation.

The information content of number of favorites and number of finders is identical to the information content of ratio of favorites and number of finders.
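For reference, here is the textbook definition of Fisher information. It is stated for a generic parameter θ and density f (symbols that are not used elsewhere in this thread) and holds under the usual regularity conditions:

```latex
I(\theta)
  = \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^{2}\right]
  = -\,\mathbb{E}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\ln f(X;\theta)\right]
```

The derivative is taken with respect to the parameter, and the second form carries an expectation and a minus sign.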

7k · January 20, 2011

BMI is almost totally useless. Unless you think that Tom Cruise is seriously overweight, or Arnold Schwarzenegger and Dwayne "The Rock" Johnson are morbidly obese.

But it's still better than giving just the weight, right?

18.9k · January 20, 2011

...

And I don't much like being told how I am supposed to use the feature, thank you very much.

Ditto!!

8k · January 20, 2011

I personally think that using favorites to rank caches in hopes of finding the "best" cache is not a valid use of the system.

So you are now telling us what is a legitimate ("valid") use of the favorites feature and what is not?

Wow. Talk about trying to control how other people enjoy the game! Now there are to be un-allowed ways to even think about things.

Using favorites to identify likely good caches is not the same as ranking the caches. The ratio is a better measure of how much people liked the cache, and thus a better tool for finding the best caches. It still has some weaknesses: most notably, older caches that have a lot of no-longer-active finders will have lower ratios. But I find it more useful than the raw count.

And I don't much like being told how I am supposed to use the feature, thank you very much.

Probably "valid" was not the correct word. But in any case I have no objection to using favorites to identify likely favorite caches. I have no doubt that some people may in fact find that the ratio is a better predictor than the raw count. Until we see some numbers with real caches, I will tend to believe that caches that are found rarely because they appeal to a limited group of cachers will have a high ratio. So if you only look at ratio you will be forced to look at caches that might not interest you. While there are ways to filter these out before looking at favorites, it hasn't been made clear that this needs to be done if you use ratio. Caches with high raw counts are going to be ones that most cachers can do (and would be interested in doing). I contend however that the ranking of these caches given by ratio is no better than the ranking by raw count. In either case, I believe, most cachers will empirically find a threshold and look at all the caches that exceed the threshold. Again, lacking real data, my impression is that a threshold on raw count will give you fewer false hits than the threshold on ratio. However, I understand that intuitively some cachers will believe the opposite is true.

So using a ratio loses information. Really the only thing you can do is look at the raw count and the number of finds.
Or look at the ratio and the number of finds to recompute the raw count. No information lost. Please don't come back with "Fisher Information" (even if used correctly next time) as it doesn't help the debate.

Since both the raw counts and the ratio will be given, I'm not hurt by it. But since the number of finds is already known I don't see the reason. I understand why intuitively the ratio seems to remove the bias in raw numbers that favors often found caches. But I'm not going to rank caches and say that one with 50 favorites is better than one with 10 regardless of how many times each was found. What I'm likely going to do is apply a threshold. Any cache with more than some number of favorites (after doing filtering on D/T, cache type, and other attributes) is likely to be one I would enjoy. Until we see some real numbers, I will admit that I have no proof that ratios will not work as well. However, it is not hard to create examples where the ratio fluctuates wildly with a few new finders (who either favorite or not). The greater the variance in the statistic (the quality of the cache hasn't changed but the ratio has), the less this number tells me about quality. For a given cache the raw count moves up (and sometimes down) in small increments, and since I'm looking at a threshold the predicted quality changes only when that threshold is crossed, not just because a bunch of finders have changed the ratio. It just seems intuitive to me that a cache with 10 favorites means 10 cachers liked this cache and that's probably enough for me to like it too. If I am looking for tough puzzles or long hikes, the threshold I would use would be less than 10.
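To put rough numbers on how much the ratio can fluctuate, here is a small sketch. The function names and the first two sample caches are invented for illustration; only the 200-of-1,000 case comes from earlier in the thread.

```python
def ratio(favorites, finds):
    """Fraction of finders who favorited the cache (0.0 when there are no finds)."""
    return favorites / finds if finds > 0 else 0.0


def ratio_swing(favorites, finds, new_finds=1):
    """Change in ratio if `new_finds` more people log a find and none of them favorite it."""
    return ratio(favorites, finds + new_finds) - ratio(favorites, finds)


# (favorites, finds): two invented low-traffic caches and the 200/1,000 cache
for favorites, finds in [(2, 2), (10, 20), (200, 1000)]:
    swing = 100 * ratio_swing(favorites, finds)
    print(f"{favorites}/{finds}: one non-favoriting find moves the ratio by {swing:+.2f} points")
```

The raw count does not move at all in any of these cases, while the ratio of the rarely found cache drops sharply and the ratio of the often found cache barely changes.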

9.4k · January 20, 2011

Using favorites to identify likely good caches is not the same as ranking the caches. The ratio is a better measure of how much people liked the cache, and thus a better tool for finding the best caches. It still has some weaknesses: most notably, older caches that have a lot of no-longer-active finders will have lower ratios. But I find it more useful than the raw count.

There's a really good point here. I haven't gone back through my finds to award favorites; I'm awarding them as I find new favorite caches. I think many of the caches I would have marked as a favorite 5 years ago have been archived anyway. The problem is that good "older" caches will end up with fewer votes than good "newer" caches, especially by those of us who had pretty much cached out our home area.

8k · January 20, 2011

I think your math is a little oversimplified.
What you can do is define the Likelihood function for x:

L(x, enjoy) = likelihood that Raw >= x, given this is a cache I would enjoy.

Then compute the Fisher Information by taking the second partial derivative of the natural logarithm of L with respect to x.

Then do the same for z=x/y

L(z,enjoy) = likelihood that Ratio>=z given this is a cache I would enjoy.

My opinion (and it's only an opinion) is that you should understand statistics pretty well before using them in a public forum. The probability that you will look very dumb for using them incorrectly is high.

Your argument is not correct, even though the math is right. The problem is that the math has been applied improperly. That's by far the most common problem in how people use statistics. Look up "sufficient statistic" and see how it applies to this situation.

The information content of number of favorites and number of finders is identical to the information content of ratio of favorites and number of finders.

I admit I was being silly in that response. Of course if you know any two of the number of finders, the number of favorites, and the ratio of favorites to finders, you can compute the third. No information is lost. Also, without some empirical data I can say nothing about whether the raw count or the ratio is the better predictor of whether I would enjoy the cache. I have my opinions and I can construct examples that show caches where the raw count would be better, but it's possible these will be rare (or easily detectable) in the real world.
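The "any two give you the third" point is easy to sketch. The helper names below are hypothetical, and the conversions assume a non-zero number of finds and a non-zero ratio.

```python
def ratio_from(favorites, finds):
    """Ratio of favorites to finds; requires finds > 0."""
    return favorites / finds


def favorites_from(ratio, finds):
    """Recover the raw favorite count from the ratio and the number of finds."""
    return round(ratio * finds)


def finds_from(favorites, ratio):
    """Recover the number of finds from the raw count and the ratio; requires ratio > 0."""
    return round(favorites / ratio)


# The 300-find, 150-favorite cache used earlier in the thread
r = ratio_from(150, 300)
print(r, favorites_from(r, 300), finds_from(150, r))
```

This only works when two of the three values are actually available to the person doing the comparison.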

1.4k · January 20, 2011

Again, lacking real data, my impression is that a threshold on raw count will give you fewer false hits than the threshold on ratio. However, I understand that intuitively some cachers will believe the opposite is true.

I agree with you that a threshold on raw count will give fewer false hits than a threshold on ratio. But that's not what I'm arguing. I'm saying a comparison using ratios will give fewer false hits than a threshold on raw count. A threshold on the raw count is still needed to see how much you should "trust" the ratio.

Since both the raw counts and the ratio will be given, I'm not hurt by it. But since the number of finds is already known I don't see the reason.

The number of finds is not known on the PQ preview page. We can't calculate the ratio or sort by it there.

1.4k · January 20, 2011

Of course if you know any two of number of finders, number of favorites, and the ratio of favorites to the number of finders; you can compute the other.

As I wrote in the previous post we don't know two of the three. We just have the number of favourites on the PQ preview.

On the cache page we have the number of favourites and the number of finders, so the Greasemonkey script can calculate the ratio there.

Also without some empirical data I can say nothing about whether the raw count or the ratio is the better predictor of whether I would enjoy the cache.

So far the ratio seems to be working better for me. We'll see how that goes as I find more caches.

1.3k · January 20, 2011

I can compare it to another cache I'm considering that has 300 finds and 150 favourites (50% ratio).

I would like to see 20 caches put into your formula and see what comes out.

Out of interest, it looks like people have been playing with the numbers, so now RAW data and any data derived from the RAW will be less reliable.

#1 cache by asking for votes.

8k · January 20, 2011

Again, lacking real data, my impression is that a threshold on raw count will give you fewer false hits than the threshold on ratio. However, I understand that intuitively some cachers will believe the opposite is true.

I agree with you that a threshold on raw count will give fewer false hits than a threshold on ratio. But that's not what I'm arguing. I'm saying a comparison using ratios will give fewer false hits than a threshold on raw count. A threshold on the raw count is still needed to see how much you should "trust" the ratio.

Since both the raw counts and the ratio will be given, I'm not hurt by it. But since the number of finds is already known I don't see the reason.

The number of finds is not known on the PQ preview page. We can't calculate the ratio or sort by it there.

Well I need to be careful, since fizzymagic told me I should not tell people what is a valid use of favorites.

Let's just say I'm disturbed that someone wants to use favorites to compare two caches. This is one of the reasons I always objected to cache rating/ranking systems. I liked the favorites idea because it seems useful for trying to find caches you are likely going to enjoy (and perhaps to filter out caches you don't care for). For example, Coyote Red has indicated that now instead of filtering out all 1/1 micros because so many are "trache", he can run a query for just 1/1 micros that have a threshold favorite count (or ratio). A common complaint before was that you were forced to ignore certain caches in order to avoid others that you thought were lame.

I don't believe the number of favorites (whether or not adjusted for number of finds or age of cache) gives a ranking of "bestness" that is useable for anything but entertainment. A favorite vote means that a cache is in a cacher's top 10%, not that it is his favorite cache. It is easy to imagine a situation where two cachers have different "best" caches, but there is some cache well down on both their favorites lists. That cache will have two votes while the two "best" caches have 1 vote each. Or take the case where a cacher's favorite cache is one that has been found by 100 cachers and is on ten favorites lists. Well down on our cacher's favorites list is a cache that only he has found. That cache gets a 100% ratio, but the cacher's real favorite has only 10%. Do you think he will believe that ratio is better for comparing caches?

1.8k · January 20, 2011

It is easy to imagine a situation where two cachers have different "best" caches, but there is some cache well down on both their favorites lists. That cache will have two votes while the two "best" caches have 1 vote each. Or take the case where a cacher's favorite cache is one that has been found by 100 cachers and is on ten favorites lists. Well down on our cacher's favorites list is a cache that only he has found. That cache gets a 100% ratio, but the cacher's real favorite has only 10%. Do you think he will believe that ratio is better for comparing caches?

Scenarios can almost always be constructed at the margins where neither system works perfectly well. But I maintain that the metric will be pretty effective when applied to the meat of the distribution.

I'm not looking for 100% predictive power in every situation. I just want a useful tool in a lot of situations.

Don't let the perfect be the enemy of the pretty good.

5.3k · January 21, 2011

Let's just say I'm disturbed that someone wants to use favorites to compare two caches. This is one of the reasons I always objected to cache rating/ranking systems. I liked the favorites idea because it seems useful for trying to find caches you are likely going to enjoy (and perhaps to filter out caches you don't care for).

Choosing caches you are likely to enjoy requires comparing caches, by definition. Right? I don't get your concern. Are you worried that people will say that some caches are "better" than others in some absolute sense, and that it will engender some competition for "better" caches?

I don't believe the number of favorites (whether or not adjusted for number of finds or age of cache) gives a ranking of "bestness" that is useable for anything but entertainment.

And I think you are wrong. I have already used the favorites system to find 5 or 6 really good local caches that I would have ignored otherwise. I used the number of favorites to identify these caches. Thus, I have already proven to myself that the number of favorites is useful for more than entertainment.

Maybe it's not useful for you, but it is useful for me.

I suppose that there is some competitive aspect to what I am doing; if you think of caches as competing for my attention, then the caches with more favorites are "winning." I see that as a feature, not a problem. I cannot understand why you find my method so "disturbing."

1.4k · January 21, 2011

I would like to see 20 caches put into your formula and see what comes out.

Do you run Firefox and have the Greasemonkey Add-On? If so, use this script to display the percentage: Geocaching Favorites Percentage

Maybe the percentage will be useful, maybe it won't. It's your decision if that's useful to you or not.

Out of interest, it looks like people have been playing with the numbers, so now RAW data and any data derived from the RAW will be less reliable.

#1 cache by asking for votes.

That was bound to happen. It's no different from people putting wrong attributes or other incorrect information on their cache page. The system is not perfect but it works well enough to be useful.

1.4k · January 21, 2011

Let's just say I'm disturbed that someone wants to use favorites to compare two caches.

I'm disturbed that someone wants to use cache size to compare two caches! Micros deserve to be found too! :anibad:

As fizzymagic said, finding better caches kind of requires me to compare caches. Does it matter what information I use? D/T, size, attributes, CO, favourites, etc. They're all pieces of information that I will use to figure out which cache I want to do next when I'm in a mood to do something interesting.

This is one of the reasons I always objected to cache rating/ranking systems. I liked the favorites idea because it seems useful for trying to find caches you are likely going to enjoy (and perhaps to filter out caches you don't care for). For example, Coyote Red has indicated that now instead of filtering out all 1/1 micros because so many are "trache", he can run a query for just 1/1 micros that have a threshold favorite count (or ratio). A common complaint before was that you were forced to ignore certain caches in order to avoid others that you thought were lame.

Uh, aren't you contradicting yourself here? Your Coyote Red example is comparing caches. It's a very black and white comparison as there are only two bins (above threshold and below threshold), but it's still a comparison.

I don't believe the number of favorites (whether or not adjusted for number of finds or age of cache) gives a ranking of "bestness" that is useable for anything but entertainment.

Here I can say you're wrong, as I've used it to pick some good caches that I might have missed otherwise.

A favorite vote means that a cache is in a cacher's top 10%, not that it is his favorite cache. It is easy to imagine a situation where two cachers have different "best" caches, but there is some cache well down on both their favorites lists. That cache will have two votes while the two "best" caches have 1 vote each.

I'm not trying to find the top cache on each person's favourite list. I'm using statistics to maximize my chances that I'll find an above average cache and hopefully an exceptional cache.

Or take the case where a cacher's favorite cache is one that has been found by 100 cachers and is on ten favorites lists. Well down on our cacher's favorites list is a cache that only he has found. That cache gets a 100% ratio, but the cacher's real favorite has only 10%. Do you think he will believe that ratio is better for comparing caches?

Fizzymagic is right, I think you need to brush up on your statistics. Go read the chapter on Sample Size again. :laughing:

Larger sample size, more precision. A low raw count means it goes further down on my list even though the ratio may be high. I implicitly said this in post #2 and explicitly in post #87.
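One common way to fold sample size into a single score is the lower bound of the Wilson score interval for a proportion. To be clear, this is a swapped-in technique for illustration, not the method described in this post, and `wilson_lower_bound` is a made-up name. It also treats every find as an independent yes/no vote, which favorites are not (each cacher has a limited supply), so take it as a rough heuristic.

```python
import math


def wilson_lower_bound(favorites, finds, z=1.96):
    """Lower bound of the Wilson score interval for favorites/finds.

    z=1.96 corresponds to roughly 95% confidence. Returns 0.0 when there are no finds.
    """
    if finds <= 0:
        return 0.0
    p = favorites / finds
    z2 = z * z
    center = p + z2 / (2 * finds)
    margin = z * math.sqrt(p * (1 - p) / finds + z2 / (4 * finds * finds))
    return (center - margin) / (1 + z2 / finds)


# A 1-find, 1-favorite cache versus the 300-find, 150-favorite cache from the thread
for favorites, finds in [(1, 1), (150, 300)]:
    score = wilson_lower_bound(favorites, finds)
    print(f"{favorites}/{finds}: raw ratio {favorites / finds:.0%}, lower bound {score:.3f}")
```

Under this score the single-find cache with a 100% ratio lands below the 50% cache with 300 finds, which matches the idea that a low raw count pushes a cache further down the list.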

Edited January 21, 2011 by Avernar

22k · January 21, 2011

I'm just trying to figure out what caches I can search for based on a current situation on any given day. Knowing what's popular doesn't help me much.

If say I'm taking a trip to Rome I'm going to ask a local what are the best caches to search for when I have a one year old with a stroller, or when I have a rental car and unlimited time, or when I have just a couple hours to walk the area.

Favorites don't seem to be much more than statistical information.

So, have at it.

1.4k · January 21, 2011

If say I'm taking a trip to Rome I'm going to ask a local what are the best caches to search for when I have a one year old with a stroller, or when I have a rental car and unlimited time, or when I have just a couple hours to walk the area.

What if the local gave you two suggestions but you later realize you only have time for one? The favourite points might be a good tie breaker.

22k · January 21, 2011

If say I'm taking a trip to Rome I'm going to ask a local what are the best caches to search for when I have a one year old with a stroller, or when I have a rental car and unlimited time, or when I have just a couple hours to walk the area.

What if the local gave you two suggestions but you later realize you only have time for one? The favourite points might be a good tie breaker.

If my PQ filter results could be ranked by favorites there might be some benefit. I think we haven't defined exactly what we expect to get from cache rankings. Popular might be the most accurate but not much help day to day.

Logging the popular caches seems to be more of a long range goal to me.

1.4k · January 21, 2011

If my PQ filter results could be ranked by favorites there might be some benefit.

Online, yes. Preview the PQ as a list and then sort on the favourite count column.

Offline, no. But hopefully soon.

38.2k · January 21, 2011

Using favorites to identify likely good caches is not the same as ranking the caches. The ratio is a better measure of how much people liked the cache, and thus a better tool for finding the best caches. It still has some weaknesses: most notably, older caches that have a lot of no-longer-active finders will have lower ratios. But I find it more useful than the raw count.

There's a really good point here. I haven't gone back through my finds to award favorites; I'm awarding them as I find new favorite caches. I think many of the caches I would have marked as a favorite 5 years ago have been archived anyway. The problem is that good "older" caches will end up with fewer votes than good "newer" caches, especially by those of us who had pretty much cached out our home area.

It seems that, at least in my area, cachers are going through their old finds to award favorites. I know I did. Still, for someone with a few thousand finds, that would be quite an undertaking. I only have a few hundred and I had to set aside over an hour to do it. How many people with thousands of finds will spend the necessary time to go through all of their finds to award favorites?

So in that respect older caches tend to get gypped to an extent. Also, as I mentioned before, a good percentage of finders of older caches might not be active anymore.

"Famous" caches seem to be getting more favorites than deserved. The oldest cache in NJ, the Gerbil Cache, was the 2nd leading cache in the state as far as favorites go last I looked. It's a nice enough cache but little different than dozens of other caches in the general area, some of which have zero favorites. In fact if I were asked by a visitor to recommend a cache in the vicinity I can think of a half dozen within a mile of that cache which I would recommend over the Gerbil cache.

Of my own caches I was surprised by the ones that garnered favorites that I didn't expect and ones that I thought (judging from the logs) would be favorited a lot, but were not. The criteria cachers use for favoriting is hard to fathom in many instances.

But overall, I'm looking at the caches that are garnering the bulk of the favorites and they are good ones. Any system we could come up with has its flaws. I like Markwell's system a lot but it is also flawed. No system is perfect.

Heck, if I'm visiting an area and see a cache that has 5 favorites it is on my radar whether it has 5 finds or 500.

Edited January 21, 2011 by briansnat

22k · January 21, 2011

Heck, if I'm visiting an area and see a cache that has 5 favorites it is on my radar whether it has 5 finds or 500.

Absolutely.

I only have a few hundred finds and I still have plenty of votes left over after logging my Favorites. I think popular will always play a part but I wouldn't mind giving this system more time to see the overall results.

(Seems, madam! nay it is; I know not 'seems.')

bd

Edited January 21, 2011 by BlueDeuce

8k · January 21, 2011

Let's just say I'm disturbed that someone wants to use favorites to compare two caches. This is one of the reasons I always objected to cache rating/ranking systems. I liked the favorites idea because it seems useful for trying to find caches you are likely going to enjoy (and perhaps to filter out caches you don't care for).

Choosing caches you are likely to enjoy requires comparing caches, by definition. Right? I don't get your concern. Are you worried that people will say that some caches are "better" than others in some absolute sense, and that it will engender some competition for "better" caches?

Choosing a cache involves using the available information to decide whether this is a cache you would want to try or not. I don't believe it requires arranging all the choices from highest to lowest. However, I understand that this is not sufficient for some people. They will select the caches they want to go look for and there will still be more than they have time for; or they will select the caches they want to search for and see that they have time to search for a few more. So some people would like a sliding scale where the caches are ordered from those that they don't want to miss down to those they will avoid at all costs. I guess my opinion is that using favorites to get such a fine level of classification seems unlikely to be successful. I can understand that using a ratio would create something that intuitively gives a ranking, at least for caches that have more than a few finds. Some may find it a useful algorithm to rank the caches that have had more than X finds using ratio and then either set the caches with fewer finds aside (until they get found enough times) or evaluate each of them separately.
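The "rank those with more than X finds, set the rest aside" idea could look something like the sketch below. The function name, the tuple layout, and the threshold value of 10 are all assumptions made for the example; nobody in the thread has proposed a specific X.

```python
def rank_by_ratio(caches, x_finds):
    """Split caches on the number of finds, then rank the well-found ones by ratio.

    `caches` is a list of (name, favorites, finds) tuples and `x_finds` must be >= 0.
    Returns (ranked, set_aside): caches with more than `x_finds` finds sorted by
    favorites/finds, highest first, and the remaining caches in their original order.
    """
    ranked = [c for c in caches if c[2] > x_finds]
    set_aside = [c for c in caches if c[2] <= x_finds]
    ranked.sort(key=lambda c: c[1] / c[2], reverse=True)
    return ranked, set_aside


# Sample data: two caches from earlier in the thread plus a single-find cache
sample = [("old cache", 200, 1000), ("other cache", 150, 300), ("lonely cache", 1, 1)]
ranked, set_aside = rank_by_ratio(sample, x_finds=10)
print("ranked:", [name for name, _, _ in ranked])
print("set aside:", [name for name, _, _ in set_aside])
```

The caches that are set aside still need to be looked at some other way, for example by reading the logs.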

I guess I've never worried that I find only the "best" caches I can possibly find or that I might miss a cache that I would have especially enjoyed. So for me something simple that says "this is a cache I'm likely to enjoy" is enough. I can do this with raw counts.

Good luck to those who want to use ratio to sort the caches in such a way that they will achieve their goal. My guess is that they will simply think they have found the "best" caches because they looked for the ones with the highest ratio and will never know what they might have missed. If they do happen to find a cache that they didn't enjoy this way, they will find some excuse to rationalize why the method didn't work for this one case ("There are always exceptions").

I don't believe the number of favorites (whether or not adjusted for number of finds or age of cache) gives a ranking of "bestness" that is useable for anything but entertainment.

And I think you are wrong. I have already used the favorites system to find 5 or 6 really good local caches that I would have ignored otherwise. I used the number of favorites to identify these caches. Thus, I have already proven to myself that the number of favorites is useful for more than entertainment.

Maybe it's not useful for you, but it is useful for me.

I suppose that there is some competitive aspect to what I am doing; if you think of caches as competing for my attention, then the caches with more favorites are "winning." I see that as a feature, not a problem. I cannot understand why you find my method so "disturbing."

You seem to be using the favorites as I would like to see them being used. I remain unconvinced that the ranking of caches using favorites will be valuable in the long run. I think people will likely use the favorites count along with other attributes to select caches and most will be successful in finding enough caches that are enjoyable this way. For those who wanted to avoid certain urban hides they find lame but not eliminate all urban hides, the favorites count will let them know which of these caches they are more likely to enjoy. They will not have to skip all urban micros because most are trache. I suspect that if these people keep looking at other types of caches that they already like, then perhaps there will be a correlation between the number (or ratio) of favorites and how much they like the cache. But I wouldn't even test this. I'd just keep looking for the kinds of caches I like.

As far as competing to please you, I think I'll just keep hiding caches that I like to find. I have no idea how to create a cache that would be sure to get a high ratio anyhow. It seems to be the luck of the draw as to who finds the caches. Though in my case I can predict who will go find my hiking caches and to some degree who will do the puzzles. In fact I suspect that caches I own that have gotten favorites will get pretty high percentages and would get even higher except for the people who aren't going back and favoriting caches they found in the past. But aside from the few people who find my caches, I'm pretty sure they don't appeal to most geocachers.

(I also predict that needle-in-the-haystack hides will have a high ratio of favorites to finds).

5.3k · January 21, 2011

Choosing a cache involves using the available information to decide whether this is a cache you would want to try or not. I don't believe it requires arranging all the choices from highest to lowest.

... long rant about ranking deleted ...

Who said anything about ranking caches? I certainly didn't. I looked through the thread briefly and didn't see anybody else do it, either. Maybe I missed it, but I kind of doubt it.

Some people may choose to judge caches based on some "ranking," but that is the subject for another thread, not this one. This one is about whether ratios or raw counts are more useful.

In my opinion, the fraction of finders listing a cache as one of their favorites is a more useful tool than the raw number of favorites. I find your repeated insistence on ascribing motives to those who want to use the ratio both puzzling and somewhat annoying.

BTW, I will likely be as contemptuous of people trying to compete on the basis of most favorites as I am of people competing on numbers of finds. Sadly, it will probably happen, as there seem to be a lot of people who want to turn everything into a competition.

Me, I just want to find caches that don't suck.

January 21, 2011

Choosing a cache involves using the available information to decide whether this is a cache you would want to try or not. I don't believe it requires arranging all the choices from highest to lowest.

I find your repeated insistence on ascribing motives to those who want to use the ratio both puzzling and somewhat annoying.

I was starting to feel that way too. Also, I'm most concerned that none of toz's posts contain any analogies about flavors of ice-cream. I might agree with him more if there was more discussion about ice-cream in this thread.

But back on point, I don't see anyone saying they want to use favorite counts or favorite % to determine which is the very "best" cache. (We know that people have tried and will try to do that.) Most people who contributed to this thread aren't planning on doing that. Almost everyone who is requesting a favorite-% is saying that it will be useful to assist them in locating the kinds of caches they might enjoy. I think most people are also agreeing that the % figure used by itself isn't the solution either. That favorite-% would have to be used in conjunction with other information/attributes. In my case I'd have to start by excluding puzzle caches since I'm fairly sure that many of the caches with high favorite-%'s will be seldom-solved puzzles that aren't of any interest.

5.6k · January 21, 2011

I personally think that using favorites to rank caches in hopes of finding the "best" cache is not a valid use of the system.

So you are now telling us what is a legitimate ("valid") use of the favorites feature and what is not?

Wow. Talk about trying to control how other people enjoy the game! Now there are to be un-allowed ways to even think about things.

I don't see anyone telling us what is a legitimate use; he expressed an opinion (see the "I personally" at the beginning of the sentence). It appears to me that while you know math, your reading comprehension isn't as well developed.

5.6k · January 21, 2011

If my PQ filter results could be ranked by favorites there might be some benefit.

Online, yes. Preview the PQ as a list and then sort on the favourite count column.

Offline, no. But hopefully soon.

That doesn't always work. I have a PQ based from my home; 500 caches reach out about 10.9 miles. But if I sort on favorites in the preview, it takes the top 500 with favorites, some well over 100 miles away. I can't sort just the caches returned by that PQ. Some of the others I can. :blink:

1.8k · January 21, 2011

Good luck to those who want to use ratio to sort the caches in such a way that they will achieve their goal. My guess is that they will simply think they have found the "best" caches because they looked for the ones with the highest ratio and will never know what they might have missed. If they do happen to find a cache that they didn't enjoy this way, they will find some excuse to rationalize why the method didn't work for this one case. ("There are always exceptions").

Earlier it seemed like your criticism was based on the system not being perfect (even though I didn't see anyone claim that ratios would be perfect). But now it seems like you're criticizing people for using ratios while fully acknowledging that the system isn't perfect (excuses and rationalizations, for noting that there are always exceptions).

Surely it's possible to use ratios in a way that isn't wrong or intellectually dishonest?

1.8k · January 21, 2011

"Famous" caches seem to be getting more favorites than deserved. The oldest cache in NJ, the Gerbil Cache, was the 2nd leading cache in the state as far as favorites go last I looked...

The top vote getter in about 1/3 of the states is the first placed cache. The states for which this is not true, it's not off by much - the oldest cache is usually #2 or #3.

Of all of the quirks of the voting I've observed, this is by far the most robust.

2.7k · January 21, 2011

"Famous" caches seem to be getting more favorites than deserved. The oldest cache in NJ, the Gerbil Cache, was the 2nd leading cache in the state as far as favorites go last I looked...

The top vote getter in about 1/3 of the states is the first placed cache. The states for which this is not true, it's not off by much - the oldest cache is usually #2 or #3.

Of all of the quirks of the voting I've observed, this is by far the most robust.

While I have issues with declaring caches are getting more Favorites than "deserved" (how exactly does that happen with a max one vote per individual), I will attribute this phenomenon to simply being they are more easily remembered by the folks going thru their history of finds.

1.4k · January 21, 2011

That doesn't always work. I have a PQ based from my home; 500 caches reach out about 10.9 miles. But if I sort on favorites in the preview, it takes the top 500 with favorites, some well over 100 miles away. I can't sort just the caches returned by that PQ.

It looks like changing the sort reruns the PQ preview. Sorting by distance the other way starts at your outer distance limit and works inwards.

Easy fix. Just create a PQ with a smaller radius. You don't have to schedule it to run.

20.7k · January 21, 2011

Nobody has given a convincing argument that ratio is more meaningful than a raw count.

Depends on your definition of meaningful. If you want to find more popular, frequently visited caches that are easy to get to then the raw count is quite meaningful.

Here's another set of stats:

A) Old cache with 1000 finds and 100 favorites, 10% ratio

B) Old cache with 100 finds and 50 favorites, 50% ratio

C) New cache with 20 finds and 8 favorites, 40% ratio

D) New cache with 5 finds and 3 favorites, 60% ratio

Going by raw numbers the old frequently visited cache comes out ahead. Going by the ratios the other 3 are fairly close and beat out A quite a bit. Personally I'd look at B before D, as 5 finds is a small sample size, or read the logs of D to figure out why it's liked so much.
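The comparison in those four examples can be sketched in a few lines of Python. This is only an illustration using the made-up A to D numbers from the post; the function names and the `min_finds` cutoff are assumptions added here to mirror the "5 finds is a small sample" remark, not anything the site provides.

```python
# Made-up example caches from the post: name -> (finds, favorites).
caches = {
    "A": (1000, 100),
    "B": (100, 50),
    "C": (20, 8),
    "D": (5, 3),
}


def favorite_ratio(finds, favorites):
    """Favorites divided by finds; 0.0 when a cache has no finds yet."""
    return favorites / finds if finds else 0.0


def rank_by_raw(data):
    """Cache names ordered by raw favorite count, highest first."""
    return sorted(data, key=lambda name: data[name][1], reverse=True)


def rank_by_ratio(data, min_finds=0):
    """Cache names ordered by ratio, highest first.

    min_finds is a hypothetical cutoff that drops caches with too few
    finds for the ratio to mean much.
    """
    eligible = [name for name in data if data[name][0] >= min_finds]
    return sorted(eligible, key=lambda name: favorite_ratio(*data[name]), reverse=True)


by_raw = rank_by_raw(caches)                    # ['A', 'B', 'C', 'D']
by_ratio = rank_by_ratio(caches)                # ['D', 'B', 'C', 'A']
by_ratio_min10 = rank_by_ratio(caches, min_finds=10)  # ['B', 'C', 'A']
```

Raw count puts A first and ratio puts it last, which is the disagreement being argued over. With a cutoff of 10 finds, D drops out and B leads, matching the preference stated in the post.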

It factors cache age and visit frequency out of the "equation" and thus expands the pool of caches to choose from. If you don't care about that, by all means use the raw count.

Actually, your method doesn't factor out cache age. It in fact penalizes old caches that may have been enjoyed by many more people than the fave stat shows (for reasons already discussed in this thread).

1.4k · January 21, 2011

While I have issues with declaring caches are getting more Favorites than "deserved" (how exactly does that happen with a max one vote per individual), I will attribute this phenomenon to simply being they are more easily remembered by the folks going thru their history of finds.

There's a very easy fix for the popular cache phenomenon and people gaming the system: You don't actually have to go find the top rated caches.

20.7k · January 21, 2011

Let's just say I'm disturbed that someone wants to use favorites to compare two caches. This is one of the reasons I always objected to cache rating/ranking systems. I liked the favorites idea because it seems useful for trying to find caches you are likely going to enjoy (and perhaps to filter out caches you don't care for).

Choosing caches you are likely to enjoy requires comparing caches, by definition. Right? I don't get your concern. Are you worried that people will say that some caches are "better" than others in some absolute sense, and that it will engender some competition for "better" caches?

I don't believe the number of favorites (whether or not adjusted for number of finds or age of cache) gives a ranking of "bestness" that is useable for anything but entertainment.

And I think you are wrong. I have already used the favorites system to find 5 or 6 really good local caches that I would have ignored otherwise. I used the number of favorites to identify these caches. Thus, I have already proven to myself that the number of favorites is useful for more than entertainment.

You proved that with a sample size of 6? Perhaps Toz isn't the only one without a good grasp of statistics, huh?

20.7k · January 21, 2011

If say I'm taking a trip to Rome I'm going to ask a local what are the best caches to search for when I have a one year old with a stroller, or when I have a rental car and unlimited time, or when I have just a couple hours to walk the area.

What if the local gave you two suggestions but you later realize you have only time for one. The favourite points might be a good tie breaker.

One could argue that it wouldn't matter which cache you looked for. After all, they are both apparently good caches. A moderate difference in favorites (either raw or ratio) could be attributed to many things that have absolutely nothing to do with whether you will like one cache more than the other.

I think that when we try to use faves to find the absolute best of the best caches, we are setting ourselves up for failure. The best we can do is use them to identify caches that are likely not to be bad.

1.4k · January 21, 2011

Actually, your method doesn't factor out cache age. It in fact penalizes old caches that may have been enjoyed by many more people than the fave stat shows (for reasons already discussed in this thread).

While that is a problem it doesn't seem to be a big one. The older caches around here are running in the 10-15% range with a few getting up towards 30%. The newer caches seem to be getting around 5-10%.

Switching the denominator from finds to finders that assigned favourites would eliminate the non-active finders and finders who don't care about favourites.
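To make the denominator change concrete, here is a rough sketch. It assumes "finders that assigned favourites" means the finders of this cache who have given a favourite to any cache at all; the thread does not say whether such a count is available, and the 400 figure below is invented purely for illustration.

```python
def ratio_over_finds(favorites, finds):
    """Current proposal: favorites divided by all finds."""
    return favorites / finds if finds else 0.0


def ratio_over_participants(favorites, participating_finders):
    """Alternative: favorites divided by only those finders who use
    the favourites feature at all (hypothetical count)."""
    return favorites / participating_finders if participating_finders else 0.0


# Illustrative numbers only: 1000 finds, 100 favorites, and an assumed
# 400 finders who have assigned at least one favourite somewhere.
plain = ratio_over_finds(100, 1000)            # 0.10
adjusted = ratio_over_participants(100, 400)   # 0.25
```

The smaller denominator raises the percentage for caches whose finders mostly ignore the feature, which is the effect the post is after.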

1.4k · January 21, 2011

You proved that with a sample size of 6? Perhaps Toz isn't the only one without a good grasp of statistics, huh?

I think it's you who doesn't grasp statistics. His sample size was all the caches in his PQ. The result of his selection criteria was 6.

1.4k · January 21, 2011

One could argue that it wouldn't matter which cache you looked for. After all, they are both apparently good caches. A moderate difference in favorites (either raw or ratio) could be attributed to many things that have absolutely nothing to do with whether you will like one cache more than the other.

If it doesn't matter which cache you looked for, picking the one with the higher score/ratio won't hurt things but might direct you to the better one. I don't see a problem there.

Flipping a coin might be faster however...

I think that when we try to use faves to find the absolute best of the best caches, we are setting ourselves up for failure. The best we can do is use them to identify caches that are likely not to be bad.

Nobody said we're trying to find the best of the best. We're trying to maximize our chances of finding an exceptional cache.

2.7k · January 21, 2011

Actually, your method doesn't factor out cache age. It in fact penalizes old caches that may have been enjoyed by many more people than the fave stat shows (for reasons already discussed in this thread).

While that is a problem it doesn't seem to be a big one. The older caches around here are running in the 10-15% range with a few getting up towards 30%. The newer caches seem to be getting around 5-10%.

Switching the denominator from finds to finders that assigned favourites would eliminate the non-active finders and finders who don't care about favourites.

If the ratio is delivered in PQ Previews then we are free to filter like crazy. You don't want old caches skewed? Run a PQ with publish date after 1/1/2010 and look at it in PQ Preview view. That will also let you drop those pesky SCUBA caches that game the stats.

If you see a cache with stats that are gamed, you can always toss it on the Ignore list and it won't be sitting at the top when YOU go to look at the list of caches sorted by ratio.

1.3k · January 21, 2011

While I have issues with declaring caches are getting more Favorites than "deserved" (how exactly does that happen with a max one vote per individual), I will attribute this phenomenon to simply being they are more easily remembered by the folks going thru their history of finds.

There's a very easy fix for the popular cache phenomenon and people gaming the system: You don't actually have to go find the top rated caches.

That would also apply to any math applied. No matter what kind of number bending is used, one of the components is the number of favorites assigned. I have yet to see any example of real caches demonstrated in this forum. Theory is one thing. Practical application is another.

I am interested in what the cache is you are looking at with 300 finds and 150 favorites.

20.7k · January 21, 2011

Actually, your method doesn't factor out cache age. It in fact penalizes old caches that may have been enjoyed by many more people than the fave stat shows (for reasons already discussed in this thread).

While that is a problem it doesn't seem to be a big one. The older caches around here are running in the 10-15% range with a few getting up towards 30%. The newer caches seem to be getting around 5-10%.

Switching the denominator from finds to finders that assigned favourites would eliminate the non-active finders and finders who don't care about favourites.

That still wouldn't resolve the problem, in fact it might make it worse. As exampled in this thread, there are cachers who don't assign faves retroactively, but are assigning them to caches going forward.

7k · January 21, 2011

Personally I'm amazed that this thread is still going. Why do people always argue against things that could be useful to others, but make no difference to themselves if they don't use it?

2.7k · January 21, 2011

Personally I'm amazed that this thread is still going. Why do people always argue against things that could be useful to others, but make no difference to themselves if they don't use it?

Fully agreed. I don't see what the problem is with providing this information to those who want it.

3.1k · January 21, 2011

I guess I've never worried that I find only the "best" caches I can possibly find or that I might miss a cache that I would have especially enjoyed. So for me something simple that says "this is a cache I'm likely to enjoy" is enough. I can do this with raw counts.

Then you can use raw counts. Other people have other goals.

By using favorite ratios, D/T ratings, categories, attributes, etc., I could do a coarse-grained sort on the likelihood that I'll enjoy various caches. If I'm going to spend a week in a foreign country, then I'd like some idea of which caches "I'm most likely to enjoy," not just those that pass some threshold of "I'm likely to enjoy." Raw counts can help give me a rough idea. Ratios can help give me a better idea.

3.1k · January 21, 2011

Personally I'm amazed that this thread is still going. Why do people always argue against things that could be useful to others, but make no difference to themselves if they don't use it?

Fully agreed. I don't see what the problem is with providing this information to those who want it.

Somewhat agreed. I think both raw points and percentages should be displayed, at least on the individual cache pages. But I doubt Groundspeak will give us two sort options: raw points and percentages. If only one option is allowed, then I'm hoping (and advocating) that the sort be on percentages.

Edited January 21, 2011 by CanadianRockies

1.3k · January 21, 2011

Personally I'm amazed that this thread is still going. Why do people always argue against things that could be useful to others, but make no difference to themselves if they don't use it?

Fully agreed. I don't see what the problem is with providing this information to those who want it.

I don't agree. I am still waiting to get the information I want.

1.4k · January 21, 2011

That still wouldn't resolve the problem, in fact it might make it worse. As exampled in this thread, there are cachers who don't assign faves retroactively, but are assigning them to caches going forward.

Do what penguin suggested: run two PQs with dates before and after favs were implemented. If the results differ that much from a single PQ (and you care about the difference) then just keep using the dual PQs to select caches.

2.7k · January 21, 2011

Personally I'm amazed that this thread is still going. Why do people always argue against things that could be useful to others, but make no difference to themselves if they don't use it?

Fully agreed. I don't see what the problem is with providing this information to those who want it.

I don't agree. I am still waiting to get the information I want.

Um. Last time I checked the information you want (specific real world cache examples) is an extension of the argument as to whether or not the information class of "ratio" should be available. That puts you in the camp of arguing about something that could be useful to others, but makes no difference to you if you won't use it.

Or am I missing something?

1.4k · January 21, 2011

I am interested in what the cache is you are looking at with 300 finds and 150 favorites.

You do know those numbers were made up, right? Math textbooks do it all the time to demonstrate things.

I gave you a link to the tool that will display the percentages. Or you can use a spreadsheet and calculate it manually.

I've never claimed the results will work for your personal cache preferences.

1.3k · January 21, 2011

Personally I'm amazed that this thread is still going. Why do people always argue against things that could be useful to others, but make no difference to themselves if they don't use it?

Fully agreed. I don't see what the problem is with providing this information to those who want it.

I don't agree. I am still waiting to get the information I want.

Um. Last time I checked the information you want (specific real world cache examples) is an extension of the argument as to whether or not the information class of "ratio" should be available. That puts you in the camp of arguing about something that could be useful to others, but makes no difference to you if you won't use it.

Or am I missing something?

Yes you are missing something. I will clarify so there is no confusion.

1) I have seen lots of examples using made-up numbers, see post #148. I would like someone to provide an example using 20 real caches and see if the math holds up.

2) What cache has 300 finds and 150 favorites?

Edited January 21, 2011 by Keith Watson

2.7k · January 21, 2011

Um. Last time I checked the information you want (specific real world cache examples) is an extension of the argument as to whether or not the information class of "ratio" should be available. That puts you in the camp of arguing about something that could be useful to others, but makes no difference to you if you won't use it.

Or am I missing something?

Yes you are missing something. I will clarify so there is no confusion.

1) I have seen lots of examples using made-up numbers, see post #148. I would like someone to provide an example using 20 real caches and see if the math holds up.

2) What cache has 300 finds and 150 favorites?

What exactly am I missing there?
