iryshe Posted June 23, 2002 It seems that someone created an application that effectively shut down the web site's services for 2+ hours (not sure of the exact damage yet). It hit the cache pages 100 times a second. I will be handling this as a Denial of Service attack and have already contacted the ISP. I consider this a serious matter. I did shut down the IP address so the site could run; just in case you had issues today, this is why. I'm considering some throttling program to keep this from happening in the future. I was close to making the site registration-only after this one. Jeremy Irish Groundspeak - The Language of Location
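For illustration, here is a minimal sketch of the kind of per-IP throttling Jeremy mentions, written in Python. The request limit, the window length, and the in-memory table are assumptions made for the example, not a description of Groundspeak's actual setup.

```python
import time

# Illustrative per-IP rate limiter: allow at most MAX_REQUESTS per WINDOW seconds.
# The limits and the in-memory table are assumptions for the example.
MAX_REQUESTS = 30          # requests allowed per window (arbitrary value)
WINDOW = 60.0              # window length in seconds (arbitrary value)

_hits = {}                 # ip -> list of recent request timestamps

def allow_request(ip):
    """Return True if this IP is under the limit, False if it should be throttled."""
    now = time.time()
    recent = [t for t in _hits.get(ip, []) if now - t < WINDOW]
    recent.append(now)
    _hits[ip] = recent
    return len(recent) <= MAX_REQUESTS

# Example: a client hammering the site gets cut off once it exceeds the limit.
if __name__ == "__main__":
    for i in range(40):
        print(i, allow_request("10.0.0.1"))
```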
AlphaOp Posted June 24, 2002 Oy, some people... I hope that in the future ISPs will also take a lot more responsibility to prevent DoS attacks and punish the people who start them, instead of laying responsibility on the people whose sites are attacked. Odd, why the heck would they attack Groundspeak, if not just to (expletive) with people... CODENAME: ALPHA OPERATOR daedalus://govlink/secure/majestic/12.12.12/ops/throne/AO MAJESTIC-12: THRONE G6 LEVEL AGENT http://www.planetdeusex.com
+Allen_L Posted June 24, 2002 Unfortunately there are some fairly nasty DDoS attacks that throttling programs and required registration currently couldn't stop. They take advantage of inherent weaknesses in TCP/IP. See http://grc.com/dos/grcdos.htm for a report of one such attack. He doesn't explain how to carry one out, just what he found out about it and how he worked with his ISP to stop it. Hopefully no one will launch one against geocaching.com, and if they do, the ISP can block the packets.
+sbell111 Posted June 24, 2002 Everything you said went immediately over my head, but I fully support any actions you must take to stop assaults on the site.
Team Dragon Posted June 24, 2002 If Gibson had a clue about security, he wouldn't have had the problem after the first attack. The attack he was hit with, multiple times, has nothing to do with a flaw in TCP/IP. The "bogus port 666" is a legitimate port used by id Software. I certainly don't want to go off on a multi-page rant here that most people don't care about. Instead, any interested parties should check out http://www.grcsucks.com/ where there are links to dozens of articles debunking his methods. Anyway, glad to hear that the site's up and things are being looked at. Good luck!
robertlipe Posted June 24, 2002 While clearly any page grabber that's generating 100 page requests a second is being a rude dog, please be careful not to "oversolve" this problem. Specifically, please don't implement any kind of throttling that makes it painful to grab enough sheets to do a road show. It's not hard to need to look at a couple hundred pages to figure out which caches may be practical to visit during an extended trip. Additionally, the very layout of the pages means that many pages have to be served two or three times even to human users who are ripping through them, which would also complicate hard rules on page counts or requests per minute. Perhaps guidelines for power users and robots (suggesting an N-second delay between requests, requesting no more than N pages per minute, specific "off-peak" times, and so on) would help everyone involved.
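As a rough sketch of what following such a guideline could look like, here is a hypothetical "polite" grabber that pauses a few seconds between page requests. The delay value and the idea of passing in a URL list are assumptions for the example; nothing here reflects an official policy.

```python
import time
import urllib.request

# Hypothetical "polite" page grabber: fetch a list of pages with a fixed pause
# between requests so the load is spread out instead of arriving in a burst.
# DELAY_SECONDS is an arbitrary illustrative value, not an official guideline.
DELAY_SECONDS = 5

def fetch_politely(urls, delay=DELAY_SECONDS):
    pages = []
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            pages.append(resp.read())
        time.sleep(delay)   # wait before the next request
    return pages
```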
+Team Hoijong Posted June 24, 2002 I don't know much about security, but I recommend that the Geocaching website and the whole forum be backed up very frequently. I think it would be a really good idea to back up the whole geocaching community daily, because if there is ever an attack from hackers that causes data loss, the loss will be minimal since we can always go back to the backed-up data. I'm not a computer expert, but I have seen www.ikonboard.com being hacked. Ikonboard is a free forum for your website; they were hacked a few weeks ago and lost all their posts in the forum. I think more than 40,000 posts were lost. Security is really needed for a big community like the geocaching community. Happy cachin' Greetings Irresisti N12º 55.475 E100º 52.865
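Along the lines of the daily backup suggested above, here is a hedged sketch of a nightly job that archives a data directory into a dated tarball. The paths are placeholders, not geocaching.com's real layout; in practice this would run from cron and the archives would be copied off the server.

```python
import datetime
import shutil

# Illustrative nightly backup: archive a data directory into a dated .tar.gz.
# Both paths are placeholders for the example.
def nightly_backup(src_dir="/var/www/sitedata", dest_dir="/backups"):
    stamp = datetime.date.today().isoformat()           # e.g. 2002-06-24
    archive = shutil.make_archive(f"{dest_dir}/site-{stamp}", "gztar", src_dir)
    return archive                                       # path of the created archive
```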
+mrcpu Posted June 24, 2002 DoS attacks can be very difficult to stop. I use Ethereal (www.ethereal.com), a freeware packet sniffer, to see exactly what is happening on the wire. Give it a try! Rob Mobile Cache Command
+konopapw Posted June 25, 2002 The 'DoS attack' you speak of was not intentional. The fact is that it was an improperly scheduled cron job that runs a Perl script to 'mine' geocaching information related to a couple dozen user profiles. The problem was caused by the job being executed every minute instead of every day. Because each job ran longer than one minute, each subsequent execution was contending for the same geocaching.com resource, causing deadlocks in the SQL Server database. (Note: the ISP does not allow monitoring cron jobs via telnet, so there was no good way to see the problem mounting.) Luckily, the geocaching.com network discovered the problem and shut off access to the IP address. The offending cron job was then removed and Admin@Groundspeak.com was contacted with an explanation of the situation. (Note: there has been no subsequent reply from Admin@Groundspeak.com.) The host ISP of the cron job is aware of the problem that occurred and knows it was a mistake, not an intentional 'attack'. This was a programming mistake by a single human being. It was not a hacker attack or a purposeful attempt to slow down or shut down geocaching.com (that would be sacrilegious!). This situation prompts discussion about data accessibility, either by third-party queries or more data retrieval features provided by geocaching.com. There are obviously other entities 'mining' data from the web pages, and I DON'T want to see that taken away (it's part of what makes the geocaching community strong).
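The overlap described here (a one-minute schedule firing jobs that each take longer than a minute) is a common cron pitfall. Below is a hedged sketch of a lock-file guard a mining script could use so a new run simply exits if the previous one is still going. The lock path is a placeholder, and this is not the script that caused the incident.

```python
import os
import sys

LOCK_PATH = "/tmp/gc_mine.lock"     # placeholder path for the example

def acquire_lock():
    """Create the lock file atomically; fail if a previous run still holds it."""
    try:
        fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return True
    except FileExistsError:
        return False                # another run is still in progress

def release_lock():
    os.remove(LOCK_PATH)

if __name__ == "__main__":
    if not acquire_lock():
        sys.exit(0)                 # skip this run instead of piling up concurrent jobs
    try:
        pass                        # ... do the actual mining here ...
    finally:
        release_lock()
```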
+Dekaner Posted June 25, 2002 It seems there are some very good sites (that I admit using) that mine information off of the geocaching.com site. To prevent situations like this in the future, would it be advantageous to set up some sort of association program where you fill out a form online asking for permission? There you could outline what you want to do, how you want to do it, etc. Perhaps some of these ideas could eventually be incorporated into the geocaching.com site? - Dekaner of Team KKF2A
+konopapw Posted June 25, 2002 The suggestion by 'robertlipe', an 'N'-second delay, would spread out the server load for the non-human processes hitting the site (regardless of how efficient they are). Even if miners didn't want to program it into their systems, the delay would at least serve as a 'mining' guideline. For those geocaching groups mining data, I'm sure they have spent a lot of time figuring out how to parse the HTML pages at geocaching.com and update their own databases. It would be so much more efficient to allow certain users to log into the database and run some queries to get the data straight from the source. However, I understand the value of their data, and if they aren't comfortable with the security of that resource, they aren't going to allow that to happen. (It would save a lot of time and resources, though!) My other suggestion would be some parameter-driven forms on geocaching.com that automated systems could post parameters to and get a report dump of the data they are looking for. Then they could mine the data from a single HTML page, instead of using convoluted logic to navigate the site as a human would. This way, they could provide relevant data on their group's web site with limited programming and limited resource use.
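To show how little client-side logic such a form would require, here is a hypothetical sketch of posting parameters and reading back a single report page. The endpoint, the parameter names, and the 'csv' format are all invented for illustration; no such form existed on geocaching.com at the time.

```python
import urllib.parse
import urllib.request

# Hypothetical client for a parameter-driven report form. The URL and the
# parameter names are invented for the example.
def fetch_report(lat, lon, radius_miles):
    params = urllib.parse.urlencode({
        "lat": lat,
        "lon": lon,
        "radius": radius_miles,
        "format": "csv",
    })
    url = "http://www.geocaching.com/report?" + params   # hypothetical endpoint
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()
```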
robertlipe Posted June 25, 2002 quote: Originally posted by konopapw: The suggestion by 'robertlipe', an 'N'-second delay, would spread out the server load for the non-human processes hitting the site (regardless of how efficient they are). It's certainly easy enough to do. OK, so fetching the caches for my trip takes two hours instead of two minutes. It'll still be there when I need it, and it didn't hammer geocaching.com in the process. Furthermore, caching the sheets is easy, but then you have to deal with potentially stale log data or descriptions/coords. Again, it's a familiar and solvable problem. quote: For those geocaching groups mining data, I'm sure they have spent a lot of time figuring out how to parse the HTML pages at geocaching.com and update their own databases. It would be so much more efficient to allow certain users to log into the database and run some queries to get the data straight from the source. As a programmer, I can speculate what most of those "miners" are doing with the data. I'll venture that most of them are spending more clock cycles undoing the real-time 'cooking' of the data that's presented as HTML than they are actually doing what they want to do with the resulting data. If there were either a "raw" version that was accessible or a precooked set that could be delivered by a static server (one big writev() on the socket instead of the ASP stuff), it'd be less stressful for everyone involved. For static pages, 100 requests a second can be served up by a really wimpy computer. quote: report dump of the data they are looking for. Then they could mine the data from a single HTML page, instead of using convoluted logic to navigate the site as a human would. There's a cruel irony that machine-reading the pages is made harder because they have to be read as a human would read them, from a computer that's working harder to write them for a human.
+parkrrrr Posted June 25, 2002 quote: Originally posted by konopapw: The 'DoS attack' you speak of was not intentional. Jeremy didn't say it was intentional; I get the impression that he knew it was a data-mining application run amok. He didn't say it was a DoS attack, just that he was handling it as one. quote: This situation prompts discussion about data accessibility, either by third-party queries or more data retrieval features provided by geocaching.com. There are obviously other entities 'mining' data from the web pages, and I DON'T want to see that taken away (it's part of what makes the geocaching community strong). Speaking as someone who automatically mines data from geocaching.com myself, I think that what this incident really prompts is discussion about the responsibilities of those of us who mine data from geocaching.com:
- When you change anything about your application, including its scheduling, test it with your own website first, before pointing it at someone else's site.
- Log everything. Watch the first few executions and make sure nothing unexpected happens. If it's a cron job, have it email you the logs and make sure you watch your email for a while.
- When you mine data from geocaching.com, make sure you are getting the bare minimum data you need. Don't download the entire cache page with full unencrypted logs if all you want is the 'tudes and the cache name. Learn to read the EasyGPS data instead.
- Make sure you comply with the geocaching.com terms and conditions. Among other things, that means not redistributing any data you harvested from geocaching.com, so your application should be for your personal use only.
- If, while debugging your application with a live connection to geocaching.com, you discover any problems with geocaching.com itself, make sure to notify the admins. For example, I found a potentially exploitable hole in a script while debugging my own mining application, and reported it immediately. As it turned out, the admins had already seen it themselves and were in the process of fixing it, but it's better to be safe than sorry.
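On the "read the EasyGPS data instead" point, here is a hedged sketch of pulling just the waypoint ID, name, and coordinates out of a downloaded .loc file. The element and attribute names (a waypoint element containing a name with an id attribute and a coord with lat/lon attributes) are assumptions based on the .loc files of the period; verify against an actual file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hedged sketch: read only the bare minimum (waypoint id, name, coordinates)
# from an EasyGPS .loc file instead of scraping whole cache pages.
# Element/attribute names are assumptions; check them against a real file.
def read_loc(path):
    waypoints = []
    root = ET.parse(path).getroot()          # assumed <loc> root element
    for wp in root.findall("waypoint"):
        name = wp.find("name")
        coord = wp.find("coord")
        waypoints.append({
            "id": name.get("id"),
            "name": (name.text or "").strip(),
            "lat": float(coord.get("lat")),
            "lon": float(coord.get("lon")),
        })
    return waypoints
```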
+mrcpu Posted June 25, 2002 "If I were Jeremy..." If we take a look at Microsoft, Novell, and many other large and successful companies, one thing we see is that they grow through partnerships, not by locking people out. Microsoft is a great example: they provide platforms and tools for development and sign people up to be partners, and in the end the majority of Windows software isn't made by Microsoft, yet because they own the platform, they reap the benefits indirectly. I have seen several "other" geocaching-related sites that have to work hard to scrape the pages to get information. A couple of things on geocaching.com have been duplicated effort AFTER these other sites have already done fairly well. IF geocaching.com were to create a partner program, they could lock partners into a license agreement. The partner agreement would give access to specific data in a direct format such as .NET (or a variant) or, at the extreme, read access via SQL directly. These partners would then provide all the functionality that Jeremy has had to sweat over. Instead of spending time fighting off people who scrape the pages and cause problems, Jeremy could forge alliances with them. To make an example, the stats pages could be a partner. The load on the geocaching server from providing a direct link for the stats page author to get the stats would be about 2% of what it probably is now with their program scraping the HTML. While Jeremy COULD create his own stats page, he would get it for FREE by cooperating with someone like this under an official Partner agreement. In addition, the bandwidth used by people viewing their stats would be offloaded to the partner site, saving geocaching.com again. I've thought I'd like to set up a geocaching teams site. If I had real access to the data it would be a whole lot easier, and in a partnership with Jeremy I would be open to linking back to the geocaching site wherever it made sense. Another example that has turned up in other forums is the need for localization. There are lots of countries besides the USA, and more people speak other languages than English. As a result there are regional sites out there that could seriously benefit from a partner system. One thing I'm definitely NOT suggesting is that Jeremy do anything that would cause him to lose control of the data. In order to create a GC, someone would still have to come to the geocaching.com site, and the partner agreement would legally prohibit pirating the data. (Let's face it, even IF someone did grab a copy of the geocaching.com data, it wouldn't be much use because everyone would still go to the geocaching.com site anyway!) Bottom line: if geocaching.com were to follow in the steps of some of the most successful companies in the world and create a partner system, it would increase international growth, promote the creation of additional features, and reduce workload and bandwidth usage for geocaching.com and Jeremy. Geocaching.com could become greater than it already is through partner sites. Rob Mobile Cache Command
+konopapw Posted June 25, 2002 Partnerships - what a great idea! Especially since it carries a win-win attitude. Perhaps the Groundspeak steering committee has already considered it. I think the statement 'Groundspeak provides the site and geocachers provide the data' will continue to be true; that, in itself, is a partnership. Geocaching groups are trying to provide localized, value-added information for their members. I totally respect Groundspeak's right to guard the data, but building a wall around it would limit the potential for everyone. This thread led with a burn from Jeremy about making this a members-only site. Hopefully, that statement was not meant to be an indication of the direction Groundspeak is taking the site.
+Dekaner Posted June 25, 2002 I think mrcpu read my mind. That's exactly what I was thinking! - Dekaner of Team KKF2A
+infosponge Posted June 26, 2002 I agree with you 100%, mrcpu... but I think we can all tell from past experience that this will never happen. OK, never say never, but it seems pretty unlikely.
Eric O'Connor Posted June 27, 2002 quote: Originally posted by konopapw: This thread led with a burn from Jeremy about making this a members-only site. Actually, what he said was "I was close to making the site registration-only after this one". Membership requires more than the email address that registration requires.
+Sun Chasers Posted July 10, 2002 Partnerships sound great, but is that logistical overkill for the problem at hand? How about a mirror site with FTP access? Mine it all you want. Just my 2 cents.
teamwsmf Posted July 12, 2002 quote: Originally posted by Legal Tender Cache: Partnerships sound great, but is that logistical overkill for the problem at hand? How about a mirror site with FTP access? Mine it all you want. Yep, and since it's READ ONLY (i.e. NO data can be defiled on the source GC db), it can even be checksummed or PGP-signed to show users of the mirror that it's authentic, straight from the motherlode GC data (once again, for any folks wanting to defile the purity of the GC data). Mirror sites have worked well for a number of other projects; consider the likes of Project Gutenberg, Tucows, the old Walnut Creek sites, game sites that mirror company patch files and info, etc. In the age of the net, one of the key lessons to learn is that there are resources and people you can use to help a project; i.e., playing nice with others pays off for everyone. -tom ---------------------------- TeamWSMF@wsmf.org
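As a concrete illustration of the checksum idea, here is a minimal sketch of verifying that a mirrored data file matches a digest published by the source site. The file name is a placeholder, SHA-256 is just one reasonable choice of hash, and a PGP signature check would serve the same purpose with stronger guarantees about who published the data.

```python
import hashlib

# Illustrative mirror integrity check: compare a file's SHA-256 digest with
# the digest published by the source site.
def verify_mirror(data_path, expected_hex_digest):
    h = hashlib.sha256()
    with open(data_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex_digest

# Example usage (file name and digest are made up for illustration):
# ok = verify_mirror("gc_mirror_dump.tar.gz", "ab12...")
```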
+infosponge Posted July 13, 2002 I think the stumbling block to providing a mirror site would be that anyone could then set up a competing geocaching site using the data that Team Jeremy has been so lovingly collecting and caretaking for the past few years. What I would like to see is a "data subscription" where someone could license access to the GC.com database through some API (SOAP, or .NET, or whatever the vogue is these days). This means Groundspeak could control who got the data, how much they were allowed to get, and what they did with it. This seems like a fair exchange for all involved: Groundspeak retains some semblance of control over the data and gets compensated, and the people who want a data feed for a legitimate, non-competitive purpose can get it. If this existed, we'd have had Palmable geocaching for everyone many months ago, because I would have gladly bought a license and made it work. I can see a lot of add-ons that people could develop that would work this way... setting up "team" statistics with competitive rankings, personalized maps, regional activity summaries, all sorts of great and useful things.
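For a sense of what a licensed data feed could look like from the consumer side, here is a hypothetical sketch. The endpoint, the license-key header, and the JSON response are all invented for illustration (and it is shown as a plain HTTP call rather than SOAP for brevity); no such service existed when this was written.

```python
import json
import urllib.request

API_KEY = "YOUR-LICENSE-KEY"   # hypothetical key issued under a data-subscription agreement

def fetch_caches_near(lat, lon, radius_miles):
    # Hypothetical endpoint and parameters, invented for the example.
    url = (f"https://api.example-gc-feed.test/caches"
           f"?lat={lat}&lon={lon}&radius={radius_miles}")
    req = urllib.request.Request(url, headers={"X-License-Key": API_KEY})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```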
+mrcpu Posted July 13, 2002 I have a friend who has been downloading .loc files in his area, converting them to another format, and maintaining this list on his website. He emails people once a week when he updates it. He is NOT competing with geocaching.com in any way, shape or form. When someone uses this list, they can import the data into a REAL mapping program with actual detail (not like the geocaching.com map applet). When they use this list, if their map program permits it, they can click on a waypoint and be taken to the geocaching.com page for that cache. Recently my friend received an email from geocaching.com telling him to stop providing this list! Now, my question is, in what way is he competing with geocaching.com? Is it because of the new members-only feature of bulk downloads? Does it not make sense to let him, a non-member, continue his efforts? In my mind, his usage is very legitimate and non-competitive! Rob Mobile Cache Command
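For what it's worth, the conversion being described is small: given waypoints already read from .loc files (id, name, lat, lon), write out a CSV a mapping program can import, with a URL back to each cache page so clicking a point leads to geocaching.com. A hedged sketch follows; the cache-page URL pattern is an assumption based on how the site linked cache detail pages at the time.

```python
import csv

# Hedged sketch: write parsed .loc waypoints out as CSV for a mapping program,
# keeping a link back to the geocaching.com page for each cache.
def waypoints_to_csv(waypoints, out_path):
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "lat", "lon", "url"])
        for wp in waypoints:
            # URL pattern assumed from how cache pages were linked at the time.
            url = "http://www.geocaching.com/seek/cache_details.aspx?wp=" + wp["id"]
            writer.writerow([wp["name"], wp["lat"], wp["lon"], url])
```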
+infosponge Posted July 13, 2002 I think the position is that if you're providing a feature/service that Geocaching.com *could* provide using the geocaching database, then it's competing, even if gc.com isn't providing that service or feature currently. Essentially, you can do whatever you want, you just can't mine any geocaching.com data to do it.
+Sun Chasers Posted July 13, 2002 To me it sounds like he was providing for *free* a service which GC.com provides to members, directly competing with a licensed service. Nonetheless, try as you may to protect your data with such contrivances, ultimately there is someone who can and will break in just because "it was there", and your final defense will rest in the courtroom, not at the firewall. Look at DVDs and DivX, CD copy protection [overridden with a magic marker], Windows NT [Back Orifice], and the list of failed data protection schemes goes on indefinitely. I'm not saying there is no way to secure your data, but the administrative cost of doing so for a database with (as of yet) so little commercial value is self-defeating, and the burden would be on us geocachers to fund it. If you protect your data with a well-written copyright, then you can protect yourself against any competition (except Microsoft, of course) stealing it with the least administrative overhead. I still support the old-school FTP mirror site, which has been a cost-effective means of data distribution for years. As long as you own the data and it doesn't stand to cost you millions if compromised, let the intellectual property laws handle it. The system's in place and our tax dollars are paying for it whether we use it or not, so USE IT!
+infosponge Posted July 14, 2002 Intellectual Property laws, like patents, are only as good as your willingness to defend your claim in court. If it isn't already, having to deal with lawyers on a daily basis would definitely put running geocaching.com into the "just not fun anymore" category.
+Sun Chasers Posted July 14, 2002 quote: Originally posted by infosponge: Intellectual Property laws, like patents, are only as good as your willingness to defend your claim in court. If it isn't already, having to deal with lawyers on a daily basis would definitely put running geocaching.com into the "just not fun anymore" category. Careful now, Karie's a paralegal