# Geocaching Growth

In a probably futile effort to redeem myself in the forums, I present an analysis of growth for geocaching.com. It's pretty interesting. I am a physicist, so I will tend to talk in terms of scaling trends; there may well be other ways to interpret these data. That's what this topic is for. What do you think?

First: how has the number of caches grown over time? I don't know the number of active caches as a function of time, but I do have the total number of caches submitted. Here's a graph:

Normally one would expect a growth curve to be exponential, but this curve does not fit an exponential. Instead, it fits very well to a power function with an exponent very close to 2. Here's a plot of the square root of caches versus time:

So what could give rise to such a curve?

Well, from scaling arguments, you get this curve if the number of caches submitted each week (or whatever time period you choose) is increasing linearly with time. If you assume that active cachers submit caches at a constant rate, that would imply that the number of active cachers is increasing linearly with time.
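Spelled out, the scaling argument is a one-line integration. Here $N(t)$ is the total number of caches submitted by time $t$, and $a$ is a hypothetical constant slope (illustrative notation, not fitted to anything); the sketch assumes $N(0) = 0$:

$$
\frac{dN}{dt} = a\,t
\quad\Longrightarrow\quad
N(t) = \int_0^t a\,t'\,dt' = \tfrac{1}{2}\,a\,t^2,
\qquad
\sqrt{N(t)} = \sqrt{a/2}\;t .
$$

So a submission rate that rises linearly gives a quadratic total, and the square root of the total comes out as a straight line in time.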

continued in next message...

Where do you factor in the effect of the increased number of approvers, and the rate at which any given set of approvers can process cache requests?

So if the number of cachers is increasing linearly, and the total number of caches is increasing as the square of time, then I would expect the number of cache logs to be proportional to the product, or go as the cube of time.

So here's the graph of total logs submitted versus time:

Sure enough, it fits a cubic pretty well. Here's a plot of the cube root of log number versus time:

Pretty linear, huh? Amazingly close to the prediction. So the number of active caches times the number of active cachers goes as the cube of time.

Next I'll look at the total number of registered cachers on the site...

The total number of registered users is shown here:

The exponent here is about 1.85; pretty close to 2. So the obligatory square-root plot:

Shows that the number of users is growing slightly more slowly than the square of time.
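The exponents quoted in these posts (about 2 for caches, about 1.85 for users) are the kind of number a log-log fit produces. Here is a minimal sketch of that comparison, assuming nothing about how the original fits were actually done. The data are synthetic stand-ins (an exact quadratic), since the underlying geocaching.com counts are not reproduced in this thread, and `fit_line` is a hypothetical helper name.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Synthetic stand-in data (NOT real site data): weeks 1..150, total = 3 * t**2.
weeks = list(range(1, 151))
totals = [3.0 * t ** 2 for t in weeks]

log_t = [math.log(t) for t in weeks]
log_n = [math.log(n) for n in totals]

# A power law N = c * t**p is a straight line in (log t, log N); the slope is p.
exponent, _, r2_power = fit_line(log_t, log_n)

# An exponential N = c * exp(k*t) is a straight line in (t, log N).
_, _, r2_exp = fit_line(weeks, log_n)

print(f"power-law exponent: {exponent:.3f}, R^2 = {r2_power:.4f}")
print(f"exponential fit R^2 = {r2_exp:.4f}")
```

With real counts, the same two regressions give a quick way to see which form describes the data better; comparing R^2 values is only a rough guide, not a formal model test.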

So what do these data imply? Tentative conclusions:

1. Caches tend to stick around for a long time.

2. Users do not tend to stick around; they place some caches, find some caches, and leave.

3. Geocaching is not growing exponentially, as you might expect. Instead, it is undergoing more or less linear growth.

By the way, the slight decrease from the trend at the end of the number of caches submitted (first post) looks to me to be a seasonal trend, not an actual long-term effect. You can see the dropoff in the winter of every year.

Is this going to be on the test?

Very interesting, but what does possessing this knowledge gain me? By that I mean, what can I do with this info, or is it information for information's sake? Which is a good thing in and of itself. I'm just a little confused.

No disrespect intended at all.

Very interesting, but what does possessing this knowledge gain me? By that I mean, what can I do with this info, or is it information for information's sake?

Well, it is a good way to answer people when they ask how quickly geocaching is growing, or if it is dying. I've seen threads about those topics on a fairly regular basis, accompanied by much speculation and anecdotal evidence. This data indicates that it is not dying, but neither is it spreading like wildfire.

If I were Groundspeak, I would use it to evaluate how well the business is going and what could be done to improve the site. But I'm not...

I'm not one to understand things like that.

What I got from the data is that instead of growing out of control and perhaps flaming out I see the growth as controlled (by no one) and steady. Something that ensures that the sport will be around for a long time.

I took probability and statistics twice to pass once. And I really think the instructor just took pity on me the second time. Anyway I would have expected to see an exponential growth. Why is this not true? An attrition rate that is exponential?

What I got from the data is that instead of growing out of control and perhaps flaming out I see the growth as controlled (by no one) and steady. Something that ensures that the sport will be around for a long time.

I agree completely with this. Another thing I thought of: this analysis shows that the world is not likely to be completely flooded by caches anytime soon. As you say, the growth is moderate, not unchecked, and thus fears about global geocache saturation are overstated.

using whatever resources you have can you plot saturation vs population density? I imagine it should track that the more population in the area the more caches but are there areas where it doesn't follow? More people less caches or less people more caches?

I can't pass this up. "Exponential Growth" is always funny to hear. This online article references the president for a company I used to work for.

Scott Swerland of “SAVI, the men’s apparel and accessories retailer,” said, “We’ve grown from eight employees to about 50 employees … We are experiencing exponential growth,” according to the Eastside [of Puget Sound] Business Journal. If that exponential growth (approximately 81.88) continues, in 3 years, everyone on Earth (and a bunch of other planets) will work for SAVI, because the company will employ more than 194 billion people.
...what can I do with this info ...

You can combine it with an economic study of the current economic boost geocaching provides to predict how much the impact will grow over time and use it to promote geocaching.

Pretty good stuff actually.

Looks to me like Jeremy is doing a good job with the website and the occasional geocacher suicide isn't putting a dent in the growth curve. A growth curve like that is every company's dream.

...what can I do with this info ...

You can combine it with an economic study of the current economic boost geocaching provides to predict how much the impact will grow over time and use it to promote geocaching.

Pretty good stuff actually.

You can also use this information to make a cache puzzle

-fractal

One of the simpler analyses is to note that there are 180,000 user accounts on GC.com, but in the past 7 days only 5,500 have written a log (21,000 logs, to be more accurate). Even being kind and assuming that those 5,500 are only a tenth of those currently caching, that's 55,000 active cachers against 180,000 users listed. Not knowing how many of those 180,000 are duplicates, we can guess that fewer than 50% of them are. That means very roughly 100,000 individuals signed up at the site at some point, and with our generous 55,000 active, on the order of 50,000 people have left this site for one reason or another.

You may disagree with those percentages and guesses, but it would take a bit of shift in them to make much of a difference in the fact that thousands of people don't come back to GC.com after registration. This also gives rise to the question of how many come here but don't ever register and don't come back.

Also, the fact that 5,500 accounts have generated 21,000 logs in the past 7 days means that those accounts are averaging almost 4 logs per account. Unfortunately it's not very clear how many caches this represents, but since really new cachers have every cache available to find while veterans have only the latest tip of the cache curve available in their area, it all averages out to make the cube curve of logs work out correctly.
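The back-of-envelope numbers in this post can be laid out as a few lines of arithmetic. This is only a sketch: the three counts are the ones quoted above, while the factor of ten and the 100,000 figure are the post's own guesses, marked as assumptions in the comments.

```python
# Figures quoted in the post.
total_accounts = 180_000
accounts_logging_last_week = 5_500
logs_last_week = 21_000

# Guesses made in the post (assumptions, not measurements).
active_multiplier = 10   # assume the week's loggers are a tenth of active cachers
individuals = 100_000    # rough head count after discounting duplicate accounts

active_estimate = accounts_logging_last_week * active_multiplier
departed_estimate = individuals - active_estimate
logs_per_account = logs_last_week / accounts_logging_last_week

print(f"estimated active cachers: {active_estimate:,}")
print(f"estimated departed:       {departed_estimate:,}")
print(f"logs per logging account: {logs_per_account:.2f}")
```

Changing `active_multiplier` or `individuals` shows directly how sensitive the "people who left" figure is to the guesses.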

As Jeremy correctly alludes to in his post, this curve is not sustainable. Unlike NavDog's elation, if we were in a market situation, this kind of curve would tell me to get out soon while the getting is good. The growth phase may soon be over, and then it's all stable from there. Look at the December returns on user number. It went from 30,000 to 90,000 (3x expansion) in 2002, but only from 90,000 to 180,000 (2x expansion) in 2003. But attrition is proportional not only to the number of users joining (bandwagoners who drop off after very little time); a smaller contribution from stable users (veterans who stop for whatever reason) is added to that. Fizzy doesn't have access to the sorts of numbers (dead accounts, etc.) needed to estimate the attrition rate, but even a small rate makes the user curve look even more like the user base will soon reach stability rather than any sort of expansion.

Exponential growth curves would actually be better trends because attrition rates have less effect on the bottom line. We are stable and upward currently. But my interpretation is that we are close to an inflection in the data.
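One way to look at the 3x-then-2x figures is to ask what quadratic growth alone would predict. If users grow as the square of time, the square root of the count should gain the same amount every year, and the year-over-year multiplier falls on its own. A small sketch using only the three December counts quoted above; the year labels assume the 30,000 figure is the end-of-2001 count.

```python
import math

# December user counts quoted in the post (year labels are an assumption).
december_users = {2001: 30_000, 2002: 90_000, 2003: 180_000}

years = sorted(december_users)
roots = [math.sqrt(december_users[y]) for y in years]

# Under quadratic growth, sqrt(users) rises by a constant amount per year.
yearly_gain = [b - a for a, b in zip(roots, roots[1:])]

# Year-over-year multipliers, for comparison with the 3x and 2x above.
multipliers = [
    december_users[b] / december_users[a] for a, b in zip(years, years[1:])
]

for (a, b), gain, mult in zip(zip(years, years[1:]), yearly_gain, multipliers):
    print(f"{a}->{b}: x{mult:.1f}, sqrt gain {gain:.1f}")
```

The two square-root gains come out within a few percent of each other, so a falling multiplier is what quadratic growth would produce anyway. Three data points are far too few to settle which reading is right.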

That's a very interesting analysis, fizzymagic.

One of your tentative conclusions was:

2. Users do not tend to stick around; they place some caches, find some caches, and leave.

There may be some evidence of that. I pulled up my Profile Page, which is at http://www.geocaching.com/profile/default.aspx?A=4285. That implies that I was the 4285th person to register. When I looked at the 4280th through the 4289th profile pages, I found only three others still active. So that's four out of ten active.

Now your mileage may vary; and I suspect that if your registration was more recent those who registered just before and just after you are more likely to still be active. I suspect that sort of attrition rate is normal for hobbies, sports, and interests in general, but it's gratifying to see that geocaching hasn't peaked yet.
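To get a feel for how much a four-in-ten sample can say, here is a sketch of a Wilson score interval for a proportion. It treats the ten profiles as if they were a random sample, which neighbouring account numbers are not, so take the range as a rough guide only. `wilson_interval` is a name chosen for this illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    )
    return center - half_width, center + half_width

# Four of ten neighbouring profiles were still active.
low, high = wilson_interval(4, 10)
print(f"4 of 10 active: plausible range {low:.0%} to {high:.0%}")
```

The interval is wide, which fits the "your mileage may vary" caveat: ten profiles point to real attrition but cannot pin down the rate.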

~erik~


That's very interesting, but you have based your belief about how many active cachers there are on some very shaky assumptions. Sure looks good. Too bad there is no basis in fact for your assumption. Why assume that the posters are one tenth? Perhaps they are 1/100th, maybe they are 1/50th; you and I do not know. Therefore your information is as flawed as its assumptions.

Wonder if Fizzy will offer a course in this kind of stuff. You might want to look into it.

Is this going to be on the test?

Very interesting, but what does possessing this knowledge gain me? By that I mean, what can I do with this info, or is it information for information's sake? Which is a good thing in and of itself. I'm just a little confused.

No disrespect intended at all.

I think what this all means is that you are going to gradually become a very busy pony... saddle up.

What Fizzy came up with looks very interesting, but ju66l3r has what I see as a fatal flaw in his logic. Very few cachers hunt a cache every single week. Even look at some of the top cachers in the world, like BruceS, and you will see he often goes weeks or more without a find. Then factor in the bad weather we've had in many parts of the country the last few weeks, and the winter slowdown in general, and any attempt to relate all of geocaching to just the people who logged a find in the last week is flawed beyond use.

As Jeremy correctly alludes to in his post, this curve is not sustainable.

Umm, no. I was alluding to the fact that nothing really grows exponentially.

fizzymagic's assumption from the data is that the growth is linear, not a curve, and that it has been sustainable over time. Maybe I read the post wrong.

I can't pass this up. "Exponential Growth" is always funny to hear.

Every time I hear "Exponential Growth" I think of this:

Audio Transcription:

This is the most important slide. I am going to give you a lot of data. The data is not going to be as in depth as it might be for a scientific audience. And even then, it is more than most get about data. So, I wanted to start off with this slide here and really, if you remember one slide, this is the slide.

When Elvis died, there were an estimated 37 Elvis impersonators. In 1993, when this report was done (and it is not a CDC report), it estimated that there were 48,000 Elvis impersonators worldwide. And that, as an exponential growth curve projected out, would say there would be 2.5 billion Elvis impersonators present by 2010. Most people do not think that is very likely, and I know my wife does not use it in her guidance counseling regarding career paths.

At 2010, roughly there will be 7.5 billion people in the world, so two people in the audience for every Elvis impersonator. It would have to be hefty ticket price to really provide a good living for each Elvis impersonator. As I said, this is the most important slide and I will come back to it later in the talk. And just so you do not think it is so farfetched, this year, 2002, is the 25th anniversary of Elvis' death. And Time magazine did a survey, and it is estimated that about 28 million adults in the United States have impersonated Elvis at some point in their lives. That is just the United States. We do not know the global. But I think that means we are still on the trajectory.

As Jeremy correctly alludes to in his post, this curve is not sustainable.

Umm, no. I was alluding to the fact that nothing really grows exponentially.

fizzymagic's assumption from the data is that the growth is linear, not a curve, and that it has been sustainable over time. Maybe I read the post wrong.

Well, that's not a fact. Given enough raw material, many things grow exponentially. The problem is that of consumption. So what you were properly alluding to (whether you realized it or not) is that nothing growing exponentially can reasonably be expected to sustain that growth permanently. There are cell cultures in my bio lab that currently grow exponentially. If I don't feed them or give them proper amounts of room, they will stop that stage of growth.

The userbase here is not growing in such a manner. Fizzy's growth curves are not indicative of the actual userbase, only the registration of accounts (and not even the verification of those registrations...that is, users who began registration procedure and then stopped before validation of their e-mail still get a number).

Not that the numbers are indicative of this at all, but anyone could begin signing up user000001 to user999999 and you'd see a spike in fizzy's number in the middle of 12/2003.

As to CO's post, we can safely assume that the 5,500 active in the past week are not 1/50 or 1/100 of the total number of active users. If it were even 1/50th, that would mean there are more active users than the 180,000 total accounts available (in other words, 5,500 is 1/50th of 275,000, which is > 180,000). At most there are 180,000 accounts available; if each one were active (which we know not to be the case, but let's assume we're at the maximum to get an upper limit), then the total number of *real* active cachers on the site can be at most approx. 32 times the 5,500 who logged in the past week.

Assuming that 180,000 is not the true number of active cachers at this point in time, and that some of those accounts are duplicates, sock puppets/trolls, and non-active cachers, I think 55,000 active cachers is a fairly decent rough estimate, even if it is off by a factor of 2 (so there are 110,000 active and 70,000 dead accounts). I was being far too generous with the number of duplicate accounts in my first post. The point is that you would have to assume that over 80% of the active cachers in the world didn't log a thing in the past week if you are to assume many more than 100,000 people have active cache accounts.

If these assumptions are wrong, it would be easy enough for someone with access to the DB lookup that spawns the weekly update to do a run on the past month or 2 months and see how many different accounts have logged something in that larger time frame. But given that only 4 of the accounts near erik (2001 startup) and 6 of the accounts near me (2002 startup) in that small sampling are still active, I'm guessing that about 20-50% of the account total is inactive, which is about 40,000 accounts at its most conservative. That means my statement ("thousands of people don't come back to GC.com after registration") is still true.
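The bounds argued in this post reduce to a few divisions. A sketch using only the numbers quoted in the thread; the variable names are illustrative.

```python
total_accounts = 180_000
weekly_loggers = 5_500

# If the week's loggers were 1/50 of all active cachers, how many is that?
implied_active_at_1_in_50 = weekly_loggers * 50
exceeds_all_accounts = implied_active_at_1_in_50 > total_accounts

# Upper bound on (active cachers / weekly loggers) if every account were active.
max_ratio = total_accounts / weekly_loggers

# Share of active cachers who logged nothing last week, if 100,000 are active.
silent_share_at_100k = 1 - weekly_loggers / 100_000

print(f"1-in-50 implies {implied_active_at_1_in_50:,} active "
      f"(more than all accounts: {exceeds_all_accounts})")
print(f"largest possible ratio: about {max_ratio:.1f} to 1")
print(f"silent share at 100,000 active: {silent_share_at_100k:.1%}")
```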

Thanks for the offer of fizzy's course, CO....maybe I could do the same for you and simply offer you a calculator first.

More simply put, to Mopar: people like Bruce who have to go further or wait longer to find new caches are balanced by the many new people who have joined (the line *is* still linear at this point) and have caches on their doorsteps. I agree that this is part of that time when fizzy's lines show sag every year; it makes sense, but I also don't think that a week in June is going to be only 1/10th of the active total at that point either. I took what you said into consideration when I used 1/10th instead of 1/2 or 1/5th. It's clear that because we're dealing with such large numbers, any anomalies (such as Bruce going for long periods without a cache) are easily compensated by anomalies in the other direction (a gang of newbies all joining together and getting 15 in a day). We're dealing with such a large population that all of these things can be assumed to follow a normal distribution. The only remaining question is where the mean of that distribution will lie. Nothing about my guesses suggests that I am very far from those means at all, and when I am off, I would have to be off by a mile for it to matter in the final outcome because of how the standard deviation of a normal distribution is calculated.

Fizzy's not the only statistician here. I've talked with fizzy about some of these numbers. It will be interesting to see what the next 6 months to 2 years brings in these trends.

I *could* be wrong...I am only estimating....but I have seen data trends like this before and I know their outcome.

Every time I hear "exponential growth" I think of the reproductive capabilities of the neighborhood cats. I'm sure glad I took my "C" in statistics and pursued electronics. What I learned from statistical analysis is that liars use statistics to unite fools in support of their special interests. Present company excepted.

fizzymagic's assumption from the data is that the growth is linear, not a curve, and that it has been sustainable over time. Maybe I read the post wrong.

You read it right. I would be very hesitant to make any predictions about future behavior, but I see no evidence of any deviation from the pattern of the last couple of years.

I would have used the word "inference" instead of "assumption," but hey, who's gonna be picky?

I can safely assume that ju66l3r is making a lot of assumptions and estimates.

I have a sort of related question. One's profile page shows a last visit date, and a while ago I was looking at a particular member who, as far as I could tell, was inactive, but his last visit date was always up-to-date when I looked at it. I suspected that whenever GS sent him an email (he owns caches) it would update the visit date; nothing else made sense to me. If this date/time *is* accurate, I'll bet an interesting set of plots could be made from that data.

For an interesting article describing quadratic growth (what I see in the growth rate of total users and total caches) see this.

Interestingly, quadratic growth occurs in 2-dimensional systems in which locality is taken into account; in this case, the majority of those infected are next to others already infected, so they cannot spread the infection further.

Not, mind you, that I am saying that geocaching is an infection.

My initial response to this is that it indicates that the growth of geocaching comes from some 2-dimensional phenomenon. I speculate (and that's all it is) that such a model would come from the growth of cache-filled areas on the surface of the Earth. In that case, geocaching growth is more a result of the density of caches than the number of cachers.

Intriguing, but by no means solid.
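The locality argument can be seen in a toy model: start with one "infected" cell on a square grid and let every infected cell infect its four neighbours each step. Only cells on the edge of the patch find anyone new to infect, so growth per step scales with the perimeter and the total scales with the square of time. This is a deliberately simplified sketch, not a model of geocaching, and `infected_counts` is a name made up for the example.

```python
def infected_counts(steps):
    """Spread from a single seed on an unbounded square grid.

    Each step, every newly infected cell infects its 4 neighbours.
    Returns the total infected count after 0, 1, ..., `steps` steps.
    """
    infected = {(0, 0)}
    frontier = {(0, 0)}
    counts = [1]
    for _ in range(steps):
        new_frontier = set()
        for x, y in frontier:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (x + dx, y + dy)
                if cell not in infected:
                    new_frontier.add(cell)
        infected |= new_frontier
        frontier = new_frontier
        counts.append(len(infected))
    return counts

counts = infected_counts(20)
for t in (1, 5, 10, 20):
    print(f"step {t:2d}: {counts[t]} infected")
```

In this toy model the count after `t` steps is exactly `2*t*t + 2*t + 1`, a quadratic, even though each individual cell behaves the same way throughout.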

I have a sort of related question. One's profile page shows a last visit date, and a while ago I was looking at a particular member who, as far as I could tell, was inactive, but his last visit date was always up-to-date when I looked at it. I suspected that whenever GS sent him an email (he owns caches) it would update the visit date; nothing else made sense to me. If this date/time *is* accurate, I'll bet an interesting set of plots could be made from that data.

No, the user is visiting the site.

Just sending an email does not update that date.

I'm just a little confused.

No doubt. He's showing the growth of geocaching in various terms, i.e. cachers, caches, etc.

Also, no disrespect intended. ;-)

RobertM

Geo 39E-MNX, 40E-JJJ

I'm just a little confused.

I think what this all means is that you are going to gradually become a very busy pony... saddle up.

That has to be the funniest post of the month!

You can also use this information to make a cache puzzle

-fractal

Don't give Fizzymagic any ideas. His puzzle (and most of his non-puzzle) caches are hard enough already!

I have a sort of related question. One's profile page shows a last visit date, and a while ago I was looking at a particular member who, as far as I could tell, was inactive, but his last visit date was always up-to-date when I looked at it. I suspected that whenever GS sent him an email (he owns caches) it would update the visit date; nothing else made sense to me. If this date/time *is* accurate, I'll bet an interesting set of plots could be made from that data.

No, the user is visiting the site.

Just sending an email does not update that date.

I see what he's getting at. Here I am, Mr. Deadbeat Cacher (hypothetically), and I've left all or most of my caches for dead. I've moved to the center of the Earth or Mars where there are no caches... well... yet. Anyway, if someone hits my cache or moves my TB then I get an e-mail, and while I don't geocache anymore and have no intention to, I still follow the links provided in the e-mail to read the log on the cache page... because reading the e-mail version isn't as pretty. There's an example of an inactive cacher whose last activity date could be as recent as 20 seconds ago... ehh?

Ok, so this all means that there's more caches and cachers this month than there were last month? And there'll probably be even more next month? Gotcha. And as far as Elvis Impersonators go, One down, 2,499,999,999 to go.......

I see what he's getting at. Here I am, Mr. Deadbeat Cacher (hypothetically), and I've left all or most of my caches for dead. I've moved to the center of the Earth or Mars where there are no caches... well... yet. Anyway, if someone hits my cache or moves my TB then I get an e-mail, and while I don't geocache anymore and have no intention to, I still follow the links provided in the e-mail to read the log on the cache page... because reading the e-mail version isn't as pretty. There's an example of an inactive cacher whose last activity date could be as recent as 20 seconds ago... ehh?

Chances are though that Mr. Deadbeat cacher isn't bothering to log into the site first, just following the links. If he's not logged in it doesn't count. I don't know how long the login cookies could potentially stick around, though.

If there was ever any doubt as to the number of Geeks in the Geocaching community this Thread should prove it.

We're here, we're Geeks..get used to it!

All this number crunching is making me thirsty; better get a coffee.

I would have used the word "inference" instead of "assumption," but hey, who's gonna be picky?

Zing!

Perhaps if the forum folks were more interested in their find counts than their post counts the numbers for the last week would be better. I would be interested in seeing just what percentage of cachers overall post here in the "Voice of Geocaching". At the first meeting of a new club last week I was surprised to see that less than five percent of the turnout looked at the forums on anything resembling a regular basis, and one of those was a moderator. I think that geocaching is doing just fine, but the resultant conversation is what is in a downward trend.

For all you statisticians and would-be statisticians I have some fodder:

Look at the Dutch statistics. The numbers are not that high, but can comparable conclusions be drawn from that data, or are the Dutch, in some statistical way, funny guys who behave differently?

http://www.geocaching.nl/stats/stat.php

We don't create and submit caches at a constant rate.

We have bursts of activity with some new geocachers getting really busy placing caches then stopping for a while. Same with finding them. Find a bunch, then get busy with something else and stop for a while.

Averaged out over a large enough group this may seem linear or close to it.

The post about the number of active cachers versus registered cachers is very interesting. We have a number of cachers in the area who seem to drop out for good or for a long time then come back for brief periods of activity, then drop out again. I wonder how widespread this is?

Normally one would expect a growth curve to be exponential, but this curve does not fit an exponential.  Instead, it fits very well to a power function with an exponent very close to 2.  Here's a plot of the square root of caches versus time:

So what could give rise to such a curve?

Well, from scaling arguments, you get this curve if the number of caches submitted each week (or whatever time period you choose) is increasing linearly with time. If you assume that active cachers submit caches at a constant rate, that would imply that the number of active cachers is increasing linearly with time.

continued in next message...

We don't create and submit caches at a constant rate.

Not true. GeoSnuggler has created, placed and had approved 1 cache per month since November of 2001. His latest, SnuggleUpiCus, marks the 25th consecutive cache for him. If this isn't "creating and submitting caches at a constant rate" then what is?

YAH!

using whatever resources you have can you plot saturation vs population density? I imagine it should track that the more population in the area the more caches but are there areas where it doesn't follow? More people less caches or less people more caches?

Well, there are a lot of people in Watts and Compton, but while I was down there this last weekend, I didn't see but a couple of caches in the area. Here is a map of L.A. The big empty area is Compton and Watts. I found it interesting. Draw your own conclusions.

--Marky

using whatever resources you have can you plot saturation vs population density? I imagine it should track that the more population in the area the more caches but are there areas where it doesn't follow? More people less caches or less people more caches?

Well, there are a lot of people in Watts and Compton, but while I was down there this last weekend, I didn't see but a couple of caches in the area. Here is a map of L.A. The big empty area is Compton and Watts. I found it interesting. Draw your own conclusions.

--Marky

Um, notoriously dangerous areas?

using whatever resources you have can you plot saturation vs population density? I imagine it should track that the more population in the area the more caches but are there areas where it doesn't follow? More people less caches or less people more caches?

Well, there are a lot of people in Watts and Compton, but while I was down there this last weekend, I didn't see but a couple of caches in the area. Here is a map of L.A. The big empty area is Compton and Watts. I found it interesting. Draw your own conclusions.

--Marky

Those 2 areas are where GC.com is testing out their new "Has to pass the coffee table book" interest level on traditional caches

We are just about to get our newest cache approved. It is located right in the middle of this area. It will be called "Stay low and zig zag".

We are just about to get our newest cache approved. It is located right in the middle of this area. It will be called "Stay low and zig zag".

I love the title. Hilarious.

It's time to update this thread through November 2004.

It remains clear that the growth of geocaching is not exponential.

Here is the number of caches placed:

It still follows the quadratic growth I saw originally:

And the growth in users does the same:

But my characterization of the growth of logs as cubic seems a little wrong now:

