
The Servers Overloaded!?


hedberg


We have had some recent problems with people hammering the site with automated bots, which causes performance hits. We treat them like denial of service attacks and kick them out when we find them.

I would actually go a little farther than that, if I were saying that...

 

"We treat them as denial of service attacks and kick them out when we find them."

We have had some recent problems with people hammering the site with automated bots

Ok. Maybe someone can explain this to me. Why would anyone want to "hammer the site with automated bots"?? Is this some sort of data collection deal, or did we really tick off someone and they are trying to take the servers down in revenge??

 

Baptist Deacon :P

We have had some recent problems with people hammering the site with automated bots, which causes performance hits. We treat them like denial of service attacks and kick them out when we find them.

I would actually go a little farther than that, if I were saying that...

 

"We treat them as denial of service attacks and kick them out when we find them."

If that's the cause of the problems tonight, I would side with ClayJar. Ban the IP and report them to their ISP, AND post their user account here so we know who to blame for not being able to use the website.

We have had some recent problems with people hammering the site with automated bots, which causes performance hits. We treat them like denial of service attacks and kick them out when we find them.

Jeremy, this sounds very unfortunate !! You will most probably need the cooperation of your network connection provider !! Or what else can you do ?? Unfortunately this is the present state; what will we face in the near future ???


I assume that some of these robots are feeding non-GC sites that provide statistics about cachers/caches in their area. Couldn't GC instead have stats about different countries/states on their site, or has this been discussed 12737 billion times before on this forum??

 

Just a question.

I assume that some of these robots are feeding non-GC sites that provide statistics about cachers/caches in their area. Couldn't GC instead have stats about different countries/states on their site, or has this been discussed 12737 billion times before on this forum??

 

Just a question.

Hi again Hedberg, I am afraid these people don't necessarily have anything to do with GC people; they just want to cause any possible harm to anybody they can find !! And there is not much you can do if they really hit you, only upgrade your servers and hope for the best !!

We have had some recent problems with people hammering the site with automated bots, which causes performance hits. We treat them like denial of service attacks and kick them out when we find them.

I think my IP has been banned twice since Saturday; I am joining the site from another ISP.

 

I don't understand the bots you are talking about at all. Do you mean spyware bots?

 

I downloaded both Spybot and Ad-aware and updated them. I found some spyware, especially a lot of tracking cookies from unknown domains.

 

Can Spybot and Ad-aware help with the problem?

 

How long do your bans last?


While I could be well off base here, the data scrapers probably fall into three categories: grabbing pages because they don't get PQs, grabbing pages because PQs don't have all of the logs, and grabbing pages for stats.

 

Can't do much about the first one, but I'll be upfront and tell you why I will do the second one before a long trip. I can't tell you how many times we've gotten to a cache, had trouble, and on consulting the logs found that the coords or description were way off, with a reference back to a previous log. But now that log is not in the PQ because of the 5-log limit. We've wasted 2 hours for nothing. Getting only 5 logs is like getting only part of the description.

 

So now, before a 200-mile trip, not only do I download a PQ of the area, I massage the PQ to get a list of the caches and run that through an offline browser limited to 2 connections and 1 request per second. I try to run it in the morning during the week, when I suspect the server load is at its lowest. What I do is not like some of the other scrapers: mine is limited to only the caches in a certain area and is finite. I've only done it twice, and I can tell you that it has worked out very well--which shows that the 5-log limit is just too low.
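
For what it's worth, the "massage the PQ" step is roughly the following. This is a minimal sketch, assuming the PQ is an ordinary GPX file; the cache-page URL pattern and the one-request-per-second pacing are illustrative assumptions, not anything confirmed in this thread:

import time
import urllib.request
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/0"}  # assumed GPX namespace

def waypoint_codes(pq_path):
    """Pull the GCxxxx waypoint names out of a Pocket Query GPX file."""
    tree = ET.parse(pq_path)
    return [wpt.findtext("gpx:name", namespaces=GPX_NS)
            for wpt in tree.getroot().findall("gpx:wpt", GPX_NS)]

def fetch_pages(codes, delay=1.0):
    """Fetch each cache page, throttled to roughly one request per second."""
    for code in codes:
        url = f"https://www.geocaching.com/seek/cache_details.aspx?wp={code}"  # assumed pattern
        data = urllib.request.urlopen(url).read()
        with open(f"{code}.html", "wb") as out:
            out.write(data)
        time.sleep(delay)  # be polite: keep the load on the server low

if __name__ == "__main__":
    fetch_pages(waypoint_codes("my_pocket_query.gpx"))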

 

I suspect this type of scraping is in the minority, though. It could be solved by changing the 5-log limit to something more reasonable. At the least, I'd like to see the last 10 logs, with log types other than "found it" still included but not counted towards the 10. That would give us a much better selection of the "found it" logs without losing the ever-important DNFs and notes.

 

The site scraping for stats only illustrates the desire for stats. Displaying stats directly here on the site or better yet providing some kind of "feed" for the stats for each cacher would go a long way to reducing the site scraping because of stats.

 

In short, any data scraping that is going on tells you you're not providing all the services that people want.

 

Trying to block the true data scraper is like the "War on Drugs": you can't stop it, and you'll waste far too many resources trying to.

The site scraping for stats only illustrates the desire for stats. Displaying stats directly here on the site or better yet providing some kind of "feed" for the stats for each cacher would go a long way to reducing the site scraping because of stats.

I agree 100% with this. I don't feel any sympathy in the least if this is the cause of the server load. There are LOTS and LOTS of people every day saying that they want this. TPTB of this site say that they don't want to make it a competition. Fine, don't. But like CoyoteRed said, make a "feed" where other sites can easily get the data they need. Let the other sites set up the statboards for their local area. People that don't want to compete don't ever have to look at those sites.

 

Maybe it's me, but I just don't see what the problem is with this. Why are you so stubborn on this subject?

 

--RuffRidr

The site scraping for stats only illustrates the desire for stats.  Displaying stats directly here on the site or better yet providing some kind of "feed" for the stats for each cacher would go a long way to reducing the site scraping because of stats.

I agree 100% with this. I don't feel any sympathy in the least if this is the cause of the server load. There are LOTS and LOTS of people every day saying that they want this. TPTB of this site say that they don't want to make it a competition. Fine, don't. But like CoyoteRed said, make a "feed" where other sites can easily get the data they need. Let the other sites set up the statboards for their local area. People that don't want to compete don't ever have to look at those sites.

 

Maybe it's me, but I just don't see what the problem is with this. Why are you so stubborn on this subject?

 

--RuffRidr

Now that's a great attitude. If a company doesn't desire to provide you with a product, it's ok to steal it from them. No matter that it's costing the company (and indirectly, legitimate users) money and hampering their ability to provide the products they do offer.

Nice.

... it's ok to steal it from them.

Oh, come on, Mopar. You know that's not right. Data scrapers aren't getting data that's not freely available to everyone. They're just getting it in a way that taxes the resources.

 

It's not stealing. It's more like hogging. They are not taking "product off the shelves." It's more like a library where someone comes in, grabs huge armfuls of books, goes and sits in the corner, and only reads a few passages because that's all they need. They go and get all of those books because they might need information out of those books, and it's easier to do that than to go get one book at a time. That's a better parallel. It's hogging resources.

 

However, if the library were to compile certain information, it would be easier on both the searcher and the library!

 

Speaking of stats, it wouldn't take much at all to create a link on one's profile page that spits out a comma-delimited list of the found caches with the pertinent information (a rough sketch of that idea follows these suggestions).

 

Or a single canned query twice daily with the stats of all cachers that have been active in the last week.

 

Or the ability to quickly get a PQ of all of one's finds online, so they can include themselves on a stats site.

 

There are many ways to provide a service while making it easier on everyone.
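
Here is a minimal sketch of that comma-delimited "finds feed" idea. The field names and sample data are made up for illustration and don't reflect how geocaching.com actually stores finds:

import csv
import io

def finds_feed(found_caches):
    """Render a cacher's finds as CSV text, one row per found cache."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["waypoint", "name", "type", "found_date", "lat", "lon"])
    writer.writeheader()
    for cache in found_caches:
        writer.writerow(cache)
    return buf.getvalue()

# Example with made-up data:
print(finds_feed([
    {"waypoint": "GCXXXX", "name": "Sample Cache", "type": "Traditional",
     "found_date": "2005-05-01", "lat": "47.600", "lon": "-122.300"},
]))

A stats site could then poll a feed like this on a schedule instead of crawling thousands of cache pages.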

Now that's a great attitude. If a company doesn't desire to provide you with a product, it's ok to steal it from them. No matter that it's costing the company (and indirectly, legitimate users) money and hampering their ability to provide the products they do offer.

Nice.

That's not what I am saying. I personally don't advocate scraping of GC.com's website.

 

--RuffRidr

... it's ok to steal it from them.

Oh, come on, Mopar.

Didn't you commit forum suicide?

 

This isn't a place to argue semantics of stupid actions. Whether it is hogging or stealing in your humble opinion, it's considered a denial of service attack and is not tolerated on this site.


So just to clarify.

 

Geocaching.com will ban IP addresses if they are found to be running bots.

 

No IP addresses were banned through the weekend.

 

That would imply that:

 

No bots were running over the weekend that caused the slowness (so the slowness was not caused by bots)

 

Or

 

That no bots were caught over the weekend (so more work needs to be done on the detection side, but if no bots were caught, who knows if there really was anything to catch)

 

Or

 

The site is just slow

 

Because this is a weekend problem, because bots have no concept of time beyond whatever their owners may have set up, and because this does not seem to happen at other times, my guess is that it is not bots. It may very well be the number of people trying to log their entries from a weekend of caching. Does anybody have a clue how many logs were created over the weekend, and more specifically on Sunday evening? If it was thousands, then maybe there is no cause for alarm; that is just the way things go. But if it was hundreds, then there would seem to be problems.
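
As for what "catching" a bot means in practice, the usual approach is to count requests per IP over a short window and block anything that blows past a human-scale budget. A minimal sketch; the window, threshold, and outright ban here are illustrative assumptions, not the site's actual policy:

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 120              # assumed budget; ordinary browsing stays well under this
recent = defaultdict(deque)     # ip -> timestamps of recent requests
banned = set()

def allow_request(ip):
    """Return False (and remember the IP as banned) once it exceeds the budget."""
    if ip in banned:
        return False
    now = time.time()
    hits = recent[ip]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()          # drop requests that have fallen out of the window
    if len(hits) > MAX_REQUESTS:
        banned.add(ip)
        return False
    return True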


I'm not sure if this is on-topic since the thread just took a turn, but I just got this when I tried to pull up a local cache page:

 

Transaction (Process ID 145) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

[The rest of the page rendered as an empty template: cache name, owner, coordinates, hidden date, waypoint (GCXXX), difficulty/terrain, short and long descriptions, hints, and "Logged Visits (No Logs)" all came back blank or N/A, along with the usual spoiler warning and the note that "Cache find counts are based on the last time the page generated."]

 

Refreshing the page gave me the correct info.
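
Incidentally, "Rerun the transaction" is exactly what refreshing the page does by hand. A minimal sketch of doing the same thing in code, retrying a statement when the session is chosen as the deadlock victim; pyodbc, the back-off, and the retry count are illustrative assumptions, not anything the site is known to use:

import time
import pyodbc

def run_with_deadlock_retry(conn, sql, params=(), retries=3):
    """Execute a statement, rerunning it if this session was the deadlock victim."""
    for attempt in range(retries):
        try:
            cur = conn.cursor()
            cur.execute(sql, params)
            conn.commit()
            return cur
        except pyodbc.Error as exc:
            conn.rollback()
            # SQL Server reports deadlock victims as error 1205; retry a few times.
            if "deadlock" not in str(exc).lower() or attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # brief back-off before rerunning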

This isn't a place to argue semantics of stupid actions. Whether it is hogging or stealing in your humble opinion, it's considered a denial of service attack and is not tolerated on this site.

I am confused about a couple of things...

 

At what point is it considered a denial of service attack?? Is it one bot slowing things down or multiple bots? If so, which bot is the one considered guilty?

 

If I am not mistaken, many of the state geocaching associations scrape data for their local stats. Are they being banned too?


 

If I am not mistaken, many of the state geocaching associations scrape data for their local stats. Are they being banned too?

You'll find that many of the state and non-US sites listing stats are approved by Groundspeak, and collect their data in an approved manner and at approved times, using a login to the site, so they can be tracked very easily. In the case of the UK stats site, they have very strict rules on what, when and how they collect data, so as not to affect the running of the site. Dave


Here we go again!! Been trying to log finds for a while now and keep getting this:

Server Error in '/' Application.

--------------------------------------------------------------------------------

 

Server Too Busy

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

 

Exception Details: System.Web.HttpException: Server Too Busy

 

Source Error:

 

An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.

 

Stack Trace:

 

[HttpException (0x80004005): Server Too Busy]

System.Web.HttpRuntime.RejectRequestInternal(HttpWorkerRequest wr) +147


I have been getting a number of different but similar errors like the ones you listed while trying to do a number of different things on GC.com tonight. It is the first time I have had any real issues with this site. It is also running very+++ slow at times; I think there are just a lot of people out there that don't have real lives outside of their 'puters. :D

 

Hey, wait, that doesn't include me, does it? :D ....I mean, I did recognize the problem. So either I am better than everyone else or I have just started step one of a 12-step program :D:D ...Hi, I'm MedicP1 and I am addicted to techno toys and the smell of plastic in the woods! :D


Jeremy, would a new hosting machine help, or is it just a matter of trying to find a way to keep particular users/apps from hogging CPU cycles? If a new hosting machine is in order, I'd be willing to contribute a fair amount towards one of these bad boys.

 

Although I gather you're running the site on a Windows variant, so that might not appeal to you very much. :D


Flawed reasoning. Bots run all week, but during the weekends the combination of bots and regular users creates a better "perfect storm" candidate. Also, people running bot clients to download bulk pages do so when they want the pages, which is normally on the weekend.


I know it's redundant to say this, as everyone is having the same problems, but it makes me feel better to complain, heh heh.

 

I have been very irritated this morning spending 30 minutes trying to print out a few cache pages that should have taken 5 minutes, without the errors.

 

Thanks for listening to me complain, I feel better now. :ph34r:

This topic is now closed to further replies.