+ODragon Posted August 10, 2013 Hey Justin, The site has been nearly unusable for the last 36+ hours. How come status.geocaching.com shows 5 minutes of downtime during that whole time? I don't know how Pingdom works, but it seems like they must be pinging things (or however they do it) in an odd way if the community is seeing no ability to use the site but they are saying it's speeding along.
+ClimbGuy Posted August 10, 2013 Thank you for taking time early and out of your Saturday to look into this and resolve it for us. Thanks for being so transparent about what the problem was and updating everyone. Thanks for the fix. Nice job. I've reported the issue to our provider and have started waking people up. This isn't something we've seen before. Does anyone know if this started before the thread was created early yesterday morning? No, the problem started after this thread was posted. The OP looked into their crystal ball and knew there was going to be a problem. Thanks for at least acknowledging someone is paying attention. This has been going on for closer to 36 hours now and this is the first response from Groundspeak that I have seen. Hope it gets fixed soon. Server overload maybe? Thanks for not being too hard on me... I hadn't had my coffee yet. We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically fail over to the standby unit. After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future. We're keeping a close eye, and hope we haven't impacted your ability to log #10in31.
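The failover problem described above (a primary unit that still answers but serves errors, so the automatic failover never triggers) can be sketched roughly as follows. This is only an illustration, not F5 configuration: the post does not say what the health checks actually tested, and `UnitStats`, `shallow_check`, `error_rate_check`, `choose_active`, and the 5% threshold are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class UnitStats:
    """Hypothetical counters for one load balancer unit."""
    responds_to_probe: bool  # does the unit answer a basic liveness probe?
    requests: int            # requests handled in the sampling window
    errors: int              # e.g. connection resets or 502 responses


def shallow_check(stats: UnitStats) -> bool:
    """Liveness only: healthy as long as the unit answers at all."""
    return stats.responds_to_probe


def error_rate_check(stats: UnitStats, max_error_rate: float = 0.05) -> bool:
    """Healthy only if the unit answers AND its error rate is acceptable."""
    if not stats.responds_to_probe:
        return False
    if stats.requests == 0:
        return True
    return stats.errors / stats.requests <= max_error_rate


def choose_active(primary_healthy: bool) -> str:
    """Fail over to the standby unit when the primary is unhealthy."""
    return "primary" if primary_healthy else "standby"


# Assumed numbers: the primary still answers probes but 40% of requests fail.
degraded = UnitStats(responds_to_probe=True, requests=1000, errors=400)

print(choose_active(shallow_check(degraded)))     # primary
print(choose_active(error_rate_check(degraded)))  # standby
```

Under these assumptions, a liveness-only check keeps traffic on the degraded primary, while a check that also looks at the error rate would have moved it to the standby.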
+Understandblue Posted August 10, 2013 Thanks Justin!
+kunarion Posted August 10, 2013 The site seems perfectly lively now. Thank you!
+Yorkshire Yellow Posted August 10, 2013 It seems to be okay now, at least for me. Interesting to note that I have had some double or even triple logging on a few of my caches which have been found today.
+Dgwphotos Posted August 10, 2013 Hey Justin, The site has been nearly unusable for the last 36+ hours. How come status.geocaching.com shows 5 minutes of downtime during that whole time? I don't know how Pingdom works, but it seems like they must be pinging things (or however they do it) in an odd way if the community is seeing no ability to use the site but they are saying it's speeding along. I would not call the site completely unusable. Once I was able to get past the connection reset problem when I initially logged onto the site last night, I was able to visit any page I liked without any issues.
+ODragon Posted August 10, 2013 I would not call the site completely unusable. Once I was able to get past the connection reset problem when I initially logged onto the site last night, I was able to visit any page I liked without any issues. Lucky you. For me, every click was a Connection Reset requiring 3-5 reloads to get anything. I finally gave up and it was even worse this AM. Much better now, although a bit slower than usual.
+Don_J Posted August 10, 2013 Hey Justin, The site has been nearly unusable for the last 36+ hours. How come status.geocaching.com shows 5 minutes of downtime during that whole time? I don't know how Pingdom works, but it seems like they must be pinging things (or however they do it) in an odd way if the community is seeing no ability to use the site but they are saying it's speeding along. I'm not sure how Pingdom does it, but a typical ping utility will send a number of pings in succession and then average the response times. (The Windows ping utility defaults to 4.) If the utility gets a response from any of the pings, the host has responded, and that is probably what is being reported. The downtime that was actually reported this morning is obviously when they reset things.
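The effect described above, where a batch of probes counts as "up" if any single probe gets a reply, can be sketched in a few lines. This is a toy model only: it makes no claim about how Pingdom actually checks a site, the function names are hypothetical, and the batch of 4 simply mirrors the Windows ping default mentioned in the post.

```python
def host_reported_up(probe_results):
    """Report 'up' if at least one probe in the batch got a reply."""
    return any(probe_results)


def request_success_rate(probe_results):
    """Fraction of individual probes that actually succeeded."""
    return sum(probe_results) / len(probe_results)


# Assumed batch: True = reply received, False = connection reset.
batch = [False, False, True, False]

print(host_reported_up(batch))       # True
print(request_success_rate(batch))   # 0.25
```

In this made-up batch the monitor would log the host as up, even though three out of four individual requests failed, which matches the "3-5 reloads" experience reported earlier in the thread.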
+Don_J Posted August 10, 2013 I would not call the site completely unusable. Once I was able to get past the connection reset problem when I initially logged onto the site last night, I was able to visit any page I liked without any issues. Lucky you. For me, every click was a Connection Reset requiring 3-5 reloads to get anything. I finally gave up and it was even worse this AM. Much better now, although a bit slower than usual. You had to click your mouse five times, so that made the site completely unusable? Yes, it is frustrating, so much so that you gave up, but to say that it was completely unusable is an extreme exaggeration. The site was perfectly usable for those that had a bit of patience.
+GeoTrekker26 Posted August 10, 2013 Justin, thank you for catching the problem discussion in the forums and running with it.
+DanPan Posted August 10, 2013 Justin, Thx for the quick response, feedback and solving the issue. Greatly appreciated!
+AutisticMajor Posted August 10, 2013 Just want to note that the double logs problem occurred when posting from the c:geo app as well.
+Hynr Posted August 11, 2013 I too noticed that the status page was not representing the situation correctly for much longer than I would have thought normal. I wonder if the IT folks at Groundspeak might be helped with a bit more quantitative information about the specific incident this morning: I was not able to access the API, nor view any cache pages, for 45 to 60 minutes after Pingdom began reporting the site to be healthy again. I have marked the range on the timeline that shows green in which I could not get any geocaching.com web page to show nor API responses: [screenshot of the status timeline] In the time from the blue box to the next hashmark (~9 am) I was occasionally able to get the API to respond to a call without timing out. After that timeframe the API began to respond but it was bogged down. I imagined 10000 geocachers suddenly realizing it was back and all hitting the service at once. I would note that this screenshot was taken just now and the green bar is green all the way to now, and at this moment all the services that I typically use appear healthy. Yet the icon for Aug 10 (today) is red on this page as well as on the summary page. It would seem to me that a "status" page would highlight the instantaneous condition rather than an incident since midnight. So I see two issues: 1. I thought the page was a status page, and it is not for the purposes of this geocacher wanting to go caching. Now for the rest of the day the red icon will suggest to geocachers that the site is failing when it actually is working fine. 2. This tool seems to be useful only to diagnose if the problem was so severe that the IT hardware was not working; it really cannot be trusted with the present condition at even an hourly resolution. I do join in saying thanks to those who got up early this morning to resolve this issue. Edited August 11, 2013 by Hynr
+ngrrfan Posted August 11, 2013 Thanks for not being too hard on me... I hadn't had my coffee yet. This I understood. We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically fail over to the standby unit. After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future. We're keeping a close eye, and hope we haven't impacted your ability to log #10in31. Would you translate the bolded part from "techese" into English please. For instance... "The software messed up on the servers and we had to do a manual failover to the secondary servers. We are having the server provider look at it." If that is what happened.
+Dgwphotos Posted August 11, 2013 Thanks for not being too hard on me... I hadn't had my coffee yet. This I understood. We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically fail over to the standby unit. After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future. We're keeping a close eye, and hope we haven't impacted your ability to log #10in31. Would you translate the bolded part from "techese" into English please. For instance... "The software messed up on the servers and we had to do a manual failover to the secondary servers. We are having the server provider look at it." If that is what happened. One of the load balancers that distribute website traffic among the web servers, so the site can handle lots of visitors, was malfunctioning. Or to make it simpler, piece of equipment to make internet page function go boom... Edited August 11, 2013 by Dgwphotos
+ngrrfan Posted August 11, 2013 Or to make it simpler, piece of equipment to make internet page function go boom... Now THAT I understand.
+cheech gang Posted August 11, 2013 Or to make it simpler, piece of equipment to make internet page function go boom... Now THAT I understand. Go boom and make an ouchy.
+sloth96 Posted August 11, 2013 API server is still giving me issues with GSAK refreshing caches. This is at approximately 9AM ET on Sunday. (Time of this post.) Thanks.
+Eximius Posted August 11, 2013 API server is still giving me issues with GSAK refreshing caches. This is at approximately 9AM ET on Sunday. (Time of this post.) Thanks. Same issues here.
+Hynr Posted August 11, 2013 At this moment (3 hours later) I am having no trouble with GSAK using the API.
+cerberus1 Posted August 11, 2013 We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically fail over to the standby unit. After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future. We're keeping a close eye, and hope we haven't impacted your ability to log #10in31. I have no idea what you said, but appreciate that it's been fixed. Thanks!
Mr.Yuck Posted August 11, 2013 I've reported the issue to our provider and have started waking people up. This isn't something we've seen before. Does anyone know if this started before the thread was created early yesterday morning? No, the problem started after this thread was posted. The OP looked into their crystal ball and knew there was going to be a problem. Thanks for at least acknowledging someone is paying attention. This has been going on for closer to 36 hours now and this is the first response from Groundspeak that I have seen. Hope it gets fixed soon. Server overload maybe? Thanks for not being too hard on me... I hadn't had my coffee yet. We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically fail over to the standby unit. After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future. We're keeping a close eye, and hope we haven't impacted your ability to log #10in31. I was going to suggest a DoS attack by Ashnikes. Errrr, oops, I mean Ashnike's roommate who is trying to frame him. Sorry, just being goofy. Thanks for the quick weekend response, and fix of the issue.
+Dale n Barb Posted August 11, 2013 Yes, much appreciated to anyone involved. We as a community do appreciate this being fixed in a timely manner as well as on the weekend. Thanks for the explanation, though most of us have no idea what you said :-). All the same, it seems to be running smoothly now. Thanks
+SchmeitzM en KoolenL Posted August 11, 2013 I still have problems now with the API connections (GSAK and iCaching).
+Hynr Posted August 11, 2013 At this moment (3 hours later) I am having no trouble with GSAK using the API. Posted too soon. Upon inspection the results are basically showing

    <?xml version="1.0" encoding="utf-8"?>
    <GetGeocacheDataResponse xmlns="http://www.geocaching.com/Geocaching.Live/data" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
      <Status>
        <StatusCode>1</StatusCode>
        <StatusMessage>Fail</StatusMessage>
        <ExceptionDetails/>
        <Warnings/>
      </Status>
    etc

So, it seems that the API is responding but not delivering anything useful. I am also noting further down in the response that the API appears to have no clue as to what my limits are:

    <a:CachesLeft>2147483647</a:CachesLeft>
    <a:CurrentCacheCount>2147483647</a:CurrentCacheCount>
    <a:MaxCacheCount>2147483647</a:MaxCacheCount></CacheLimits>

Edited August 11, 2013 by Hynr
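For anyone wanting to check their own responses the same way, here is a minimal sketch that reads the Status block and flags the suspicious limit value. It rests on a few assumptions: the sample below is trimmed to the Status element from the post and closed so that it parses, the helper names `read_status` and `is_int32_max` are made up for illustration, and the only certain fact about 2147483647 is that it equals the largest signed 32-bit integer (2^31 - 1). Whether the API uses it as an "unset" placeholder is a guess.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the response quoted in the post.
NS = {"gc": "http://www.geocaching.com/Geocaching.Live/data"}

# Trimmed sample: only the Status block, with a closing root tag added.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<GetGeocacheDataResponse xmlns="http://www.geocaching.com/Geocaching.Live/data">
  <Status>
    <StatusCode>1</StatusCode>
    <StatusMessage>Fail</StatusMessage>
  </Status>
</GetGeocacheDataResponse>"""

INT32_MAX = 2**31 - 1  # 2147483647


def read_status(xml_text):
    """Return (status code, status message) from a response document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    code = int(root.findtext("gc:Status/gc:StatusCode", namespaces=NS))
    message = root.findtext("gc:Status/gc:StatusMessage", namespaces=NS)
    return code, message


def is_int32_max(value):
    """True if the value is the largest signed 32-bit integer."""
    return value == INT32_MAX


print(read_status(SAMPLE))        # (1, 'Fail')
print(is_int32_max(2147483647))   # True
```

All three limit fields in the quoted response carry that same maximum value, which is consistent with the observation that the API did not know the account's real limits.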
+BiHuPaWa2 Posted August 12, 2013 The Geocaching website hasn't been working efficiently for the past few days. It seems that this just started after they updated the site. Servers won't connect and you keep getting error messages when you try going to a different page or back. I'm sure the administration is aware of this problem and I sure hope it gets corrected soon. It takes all day to log caches when you've done a bunch.
+gerhard_s Posted August 13, 2013 I like geocaching, your web site and what Groundspeak generally does to support us very much. But in the last days and weeks it really seems to me that your web servers are not supported by hamsters as you stated but by slugs. Response times are in a range up to 30 seconds, independent of the device, operating system, browser or internet connection. I hope you will recover soon. My best wishes and kind regards Gerhard_S
+DanPan Posted August 17, 2013 I cannot connect to www.geocaching.com: The service is unavailable. Server is too busy. It seems the Server Error Connection issues are back again... or is it just me?
+Alte Lady Posted August 17, 2013 I cannot connect to www.geocaching.com: The service is unavailable. Server is too busy. It seems the Server Error Connection issues are back again... or is it just me? No, same problem. In our Facebook group, everyone reports the same. I hoped to get some information here at the forum.
+Alte Lady Posted August 17, 2013 I cannot connect to www.geocaching.com: The service is unavailable. Server is too busy. It seems the Server Error Connection issues are back again... or is it just me? No, same problem. In our Facebook group, everyone reports the same. I hoped to get some information here at the forum. Okay, my message seems to have made something "click". The server woke up ... ;-)
+drikolaus Posted August 18, 2013 I had the same problem. It could be that some static IPs are blocked. I changed my static IP, and now it works. For Kabel Deutschland customers: under Kundenzone (customer area), Einstellungen (settings), Interneteinstellungen (internet settings), switch bridge mode on and then off. --> new static IP.
+tschuse Posted January 24, 2014 Posted too soon. Upon inspection the results are basically showing

    <?xml version="1.0" encoding="utf-8"?>
    <GetGeocacheDataResponse xmlns="http://www.geocaching.com/Geocaching.Live/data" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
      <Status>
        <StatusCode>1</StatusCode>
        <StatusMessage>Fail</StatusMessage>
        <ExceptionDetails/>
        <Warnings/>
      </Status>
    etc

So, it seems that the API is responding but not delivering anything useful. I am also noting further down in the response that the API appears to have no clue as to what my limits are:

    <a:CachesLeft>2147483647</a:CachesLeft>
    <a:CurrentCacheCount>2147483647</a:CurrentCacheCount>
    <a:MaxCacheCount>2147483647</a:MaxCacheCount></CacheLimits>

I have the same problem. I can't change my IP address. What can I do?
+Dgwphotos Posted January 30, 2014 I've noticed that the site seems to have trouble maintaining lots of connections at one time. For example, if I'm looking at a number of different caches at the same time in browser tabs, the site will sometimes take quite a while to connect to a cache page after I open it, but others will open almost immediately. I suspect the load balancer is acting up again. Edited January 30, 2014 by Dgwphotos
+DanPan Posted January 30, 2014 I've noticed that the site seems to have trouble maintaining lots of connections at one time. For example, if I'm looking at a number of different caches at the same time in browser tabs, the site will sometimes take quite a while to connect to a cache page after I open it, but others will open almost immediately. I suspect the load balancer is acting up again. I reported this issue a few weeks ago and still have... no response from GS until now... http://forums.Ground...dpost&p=5337344 It seems this issue is back again. When I click on a geocaching link I get a brown background and have to wait... FYI: other websites are fast. Only access to geocaching.com is slow. Nobody else has "loading" problems? Today, I opened a new IE session to www.geocaching.com -> the page hangs after loading 60% of progress. I opened a new session in the same IE window to www.geocaching.com -> loading the page was as quick as 'greased lightning'... Edited January 30, 2014 by DanPan