ertyu

Server Error: Connection Reset


Hey Justin,

 

The site has been nearly unusable for the last 36+ hours. How come status.geocaching.com shows 5 minutes of downtime during that whole time? I don't know how Pingdom works, but it seems like they must be pinging things (or however they do it) in an odd way if the community is seeing no ability to use the site but they are saying it's speeding along.


Thank you for taking time early and out of your Saturday to look into this and resolve it for us. Thanks for being so transparent about what the problem was and updating everyone. Thanks for the fix. Nice job. :D

 

I've reported the issue to our provider and have started waking people up. This isn't something we've seen before.

 

Does anyone know if this started before the thread was created early yesterday morning?

 

No, the problem started after this thread was posted. The OP looked into their crystal ball and knew there was going to be a problem. :blink:

 

Thanks for at least acknowledging someone is paying attention. This has been going on for closer to 36 hours now and this is the first response from Groundspeak that I have seen. Hope it gets fixed soon. Server overload maybe?

 

:anicute: Thanks for not being too hard on me... I hadn't had my coffee yet.

 

We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically failover to the standby unit.

 

After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future.

 

We're keeping a close eye, and hope we haven't impacted your ability to log #10in31.
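
For anyone wondering how a unit can pass its health checks while still resetting connections: the post does not say what the checks actually tested, so the following is only an illustration of one possible gap. A check that merely asks "does the unit answer?" can stay green while a large share of real requests fail. A minimal sketch of the alternative idea is to track the error rate over a sliding window of recent requests and flag a failover once it crosses a threshold. This is not F5's configuration or API; the class name `ErrorRateMonitor`, the window size, and the 20% threshold are hypothetical choices made up for this example.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical health policy: judge a unit by its recent error rate,
    not just by whether it responds at all."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.2) -> None:
        # Keep only the most recent `window` request outcomes.
        self.results = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, ok: bool) -> None:
        """Record one request outcome (False = reset, 502, timeout, ...)."""
        self.results.append(ok)

    def error_rate(self) -> float:
        """Fraction of failed requests in the current window."""
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_fail_over(self) -> bool:
        """True once the recent error rate exceeds the allowed threshold."""
        return self.error_rate() > self.max_error_rate


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window=10, max_error_rate=0.2)
    # The unit still "answers", but 3 of the last 10 requests failed.
    for ok in [True] * 7 + [False] * 3:
        monitor.record(ok)
    print(monitor.error_rate(), monitor.should_fail_over())
```

The design choice here is that a degraded unit is treated as unhealthy even though it is reachable, which is the situation described above.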


It seems to be okay now, at least for me.

 

Interesting to note that I have had some double or even triple logging on a few of my caches which have been found today.


Hey Justin,

 

The site has been nearly unusable for the last 36+ hours. How come status.geocaching.com shows 5 minutes of downtime during that whole time? I don't know how Pingdom works, but it seems like they must be pinging things (or however they do it) in an odd way if the community is seeing no ability to use the site but they are saying it's speeding along.

I would not call the site completely unusable. Once I was able to get past the connection reset problem when I initially logged onto the site last night, I was able to visit any page I liked without any issues.

I would not call the site completely unusable. Once I was able to get past the connection reset problem when I initially logged onto the site last night, I was able to visit any page I liked without any issues.

Lucky you. For me, every click was a Connection Reset requiring 3-5 reloads to get anything. I finally gave up and it was even worse this AM. Much better now, although a bit slower than usual.


Hey Justin,

 

The site has been nearly unusable for the last 36+ hours. How come status.geocaching.com shows 5 minutes of downtime during that whole time? I don't know how Pingdom works, but it seems like they must be pinging things (or however they do it) in an odd way if the community is seeing no ability to use the site but they are saying it's speeding along.

 

I'm not sure how Pingdom does it, but a typical ping utility will send a number of pings in succession and then average the response times. (The Windows ping utility defaults to 4.) If the utility gets a response from any of the pings, the host has responded, and that is probably what is being reported. The downtime that was actually reported this morning is obviously when they reset things.
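
To make that concrete, here is a small simulation of the "any response counts as up" rule described above. It is only a sketch of that rule, not of how Pingdom actually works (which this thread does not establish); the function names `check_reports_up` and `summarize` and the request pattern are invented for the example.

```python
import itertools
from typing import Callable, Sequence, Tuple


def check_reports_up(probe: Callable[[], bool], attempts: int = 4) -> bool:
    """Any-success rule: the check passes if at least one probe succeeds."""
    return any(probe() for _ in range(attempts))


def summarize(pattern: Sequence[bool], checks: int = 10, attempts: int = 4) -> Tuple[int, float]:
    """Run `checks` monitoring checks against a simulated site whose
    requests succeed or fail following the repeating `pattern`.

    Returns (number of checks reported as up, true request failure rate).
    """
    outcomes = itertools.cycle(pattern)
    probe = lambda: next(outcomes)
    up_checks = sum(check_reports_up(probe, attempts) for _ in range(checks))
    failure_rate = list(pattern).count(False) / len(pattern)
    return up_checks, failure_rate


if __name__ == "__main__":
    # Three of every four requests fail, similar to needing several reloads per click.
    print(summarize([False, False, False, True]))  # prints (10, 0.75)
```

In this simulated case all 10 checks report the site as up even though 75% of requests fail, which would be consistent with a status page staying green while users see connection resets.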

I would not call the site completely unusable. Once I was able to get past the connection reset problem when I initially logged onto the site last night, I was able to visit any page I liked without any issues.

Lucky you. For me, every click was a Connection Reset requiring 3-5 reloads to get anything. I finally gave up and it was even worse this AM. Much better now, although a bit slower than usual.

 

You had to click your mouse five times so that made the site completely unusable? Yes it is frustrating, so much so that you gave up, but to say that it was completely unusable is an extreme exaggeration. The site was perfectly usable for those that had a bit of patience.


Justin,

 

Thx for the quick response, feedback and solving the issue. Greatly appreciated!


I too noticed that the status page was not representing the situation correctly for much longer than I would have thought normal. I wonder if the IT folks at Groundspeak might be helped with a bit more quantitative information about the specific incident this morning:

 

I was not able to access the API, nor view any cachepages, for 45 to 60 minutes after Pingdom began reporting the site to be healthy again.

 

I have marked the range on the timeline that shows green in which I could not get any geocaching.com web page to show nor API responses:

[Attached screenshot: p7j.gif]

 

In the time from the blue box to the next hashmark (~9 am) I was occasionally able to get the API to respond to a call without timing out.

 

After that timeframe the API began to respond but it was bogged down. I imagined 10000 geocachers suddenly realizing it was back and all hitting the service at once.

 

I would note that this screenshot was taken just now and the green bar is green all the way to now and at this moment all the services that I typically use appear healthy. Yet the icon for Aug 10 (today) is red on this page as well as on the summary page. It would seem to me that a "status" page would highlight the instantaneous condition rather than an incident since midnight.

 

So I see two issues:

1. I thought the page was a status page, and it is not for the purposes of this geocacher wanting to go caching. Now for the rest of the day the red icon will suggest to geocachers that the site is failing when it is actually working fine.

2. This tool seems to be useful only to diagnose whether the problem was so severe that the IT hardware was not working; it really cannot be trusted to show the present condition at even an hourly resolution.

 

I do join in saying thanks to those who got up early this morning to resolve this issue.

Edited by Hynr


:anicute: Thanks for not being too hard on me... I hadn't had my coffee yet.

This I understood.

 

We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically failover to the standby unit.

 

After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future.

 

We're keeping a close eye, and hope we haven't impacted your ability to log #10in31.

Would you translate the bolded part from "techese" into English please.

For instance... "The software messed up on the servers and we had to do a manual failover to the secondary servers. We are having the server provider look at it." If that is what happened. :)


:anicute: Thanks for not being too hard on me... I hadn't had my coffee yet.

This I understood.

 

We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically failover to the standby unit.

 

After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future.

 

We're keeping a close eye, and hope we haven't impacted your ability to log #10in31.

Would you translate the bolded part from "techese" into English please.

For instance... "The software messed up on the servers and we had to do a manual failover to the secondary servers. We are having the server provider look at it." If that is what happened. :)

One of the load balancers that distributes web site traffic amongst the web servers to handle lots of site visitors was malfunctioning.

 

Or to make it simpler, piece of equipment to make internet page function go boom... :lol:

Edited by Dgwphotos


Or to make it simpler, piece of equipment to make internet page function go boom... :lol:

Now THAT I understand. :lol:


Or to make it simpler, piece of equipment to make internet page function go boom... :lol:

Now THAT I understand. :lol:

 

Go boom and make an ouchy.


API server is still giving me issues with GSAK refreshing caches.

 

This is at approximately 9AM ET on Sunday. (Time of this post.)

 

Thanks.


API server is still giving me issues with GSAK refreshing caches.

 

This is at approximately 9AM ET on Sunday. (Time of this post.)

 

Thanks.

 

Same issues here.


At this moment (3 hours later) I am having no trouble with GSAK using the API.

We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically failover to the standby unit.

 

After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future.

 

We're keeping a close eye, and hope we haven't impacted your ability to log #10in31.

I have no idea what you said :laughing: , but appreciate that it's been fixed.

Thanks !


I've reported the issue to our provider and have started waking people up. This isn't something we've seen before.

 

Does anyone know if this started before the thread was created early yesterday morning?

 

No, the problem started after this thread was posted. The OP looked into their crystal ball and knew there was going to be a problem. :blink:

 

Thanks for at least acknowledging someone is paying attention. This has been going on for closer to 36 hours now and this is the first response from Groundspeak that I have seen. Hope it gets fixed soon. Server overload maybe?

 

:anicute: Thanks for not being too hard on me... I hadn't had my coffee yet.

 

We traced the issues to one of our F5 BigIP LTMs. As you likely noticed, it wasn't routing traffic properly and was resulting in connection resets and the 502 Bad Gateway errors. Unfortunately, the health checks on the system didn't recognize the sub-optimal state of the primary unit and did not automatically failover to the standby unit.

 

After manually failing over to the standby unit, the problems appear to be resolved. We're pulling logs and will be submitting them to F5 support for an investigation to determine why it was failing and what can be done to prevent it in the future.

 

We're keeping a close eye, and hope we haven't impacted your ability to log #10in31.

 

I was going to suggest a DOS attack by Ashnikes. Errrr, oops, I mean Ashnike's roommate who is trying to frame him. :laughing:

 

Sorry, just being goofy. Thanks for the quick weekend response, and fix of the issue.


Yes, much appreciated to anyone involved. We as a community do appreciate this being fixed in a timely manner as well as on the weekend. Thanks for the explanation though most of us have no idea what you said :-). All the same it seems to be running smoothly now. Thanks


At this moment (3 hours later) I am having no trouble with GSAK using the API.

Posted too soon. Upon inspection the results are basically showing

<?xml version="1.0" encoding="utf-8"?>
<GetGeocacheDataResponse xmlns="http://www.geocaching.com/Geocaching.Live/data" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Status>
 <StatusCode>1</StatusCode>
 <StatusMessage>Fail</StatusMessage>
 <ExceptionDetails/>
 <Warnings/>
</Status>
etc

So, it seems that the API is responding but not delivering anything useful.

I am also noting further down in the response that the API appears to have no clue as to what my limits are:

<a:CachesLeft>2147483647</a:CachesLeft>
<a:CurrentCacheCount>2147483647</a:CurrentCacheCount>
<a:MaxCacheCount>2147483647</a:MaxCacheCount></CacheLimits>
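
For anyone who wants to check their own responses for the same symptoms, a rough sketch follows. The value 2147483647 is the largest signed 32-bit integer, so it may well be an unset or default value rather than a real limit, though that is a guess. The sample XML below is a trimmed stand-in built from the fragments above: the `a:` namespace URI and the placement of `CacheLimits` are placeholders, since the full response is not shown here, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer

# Trimmed stand-in for the response; the "a" namespace URI is a placeholder.
SAMPLE = """<GetGeocacheDataResponse xmlns="http://www.geocaching.com/Geocaching.Live/data" xmlns:a="urn:example:placeholder">
<Status>
 <StatusCode>1</StatusCode>
 <StatusMessage>Fail</StatusMessage>
 <ExceptionDetails/>
 <Warnings/>
</Status>
<CacheLimits>
 <a:CachesLeft>2147483647</a:CachesLeft>
 <a:CurrentCacheCount>2147483647</a:CurrentCacheCount>
 <a:MaxCacheCount>2147483647</a:MaxCacheCount>
</CacheLimits>
</GetGeocacheDataResponse>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def inspect_response(xml_text: str) -> dict:
    """Pull out the status fields and list any limit fields equal to INT32_MAX."""
    root = ET.fromstring(xml_text)
    # Map each element's local name to its text, ignoring namespaces.
    fields = {local_name(el.tag): (el.text or "").strip() for el in root.iter()}
    limit_names = ("CachesLeft", "CurrentCacheCount", "MaxCacheCount")
    suspicious = [name for name in limit_names if fields.get(name) == str(INT32_MAX)]
    return {
        "status_code": fields.get("StatusCode"),
        "status_message": fields.get("StatusMessage"),
        "limits_at_int32_max": suspicious,
    }


if __name__ == "__main__":
    print(inspect_response(SAMPLE))
```

Matching on local names avoids depending on the exact namespace prefixes, which are not fully visible in the fragments above.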

Edited by Hynr


The Geocaching website hasn't been working efficiently for the past few days. It seems that this just started after they updated the site. Servers won't connect and you keep getting error messages when you try going to a different page or back. I'm sure the administration is aware of this problem and I sure hope it gets corrected soon. It takes all day to log caches when you've done a bunch.


I like geocaching, your web site and what Groundspeak generally does to support us very much. But in the last days and weeks it really seems to me that your web servers are not supported by hamsters, as you stated, but by slugs. Response times range up to 30 seconds, independent of the device, operating system, browser or internet connection. I hope you will recover soon. My best wishes and kind regards, Gerhard_S


I can not connect to www.geocaching.com: The service is unavailable. Server is too busy.

 

It seems the Server Error Connection issues are back again... or is it just me?

 

 


I can not connect to www.geocaching.com: The service is unavailable. Server is too busy.

 

It seems the Server Error Connection issues are back again... or is it just me?

 

 

No, same problem. In our Facebook group it's all the same. I hoped to get some information here at the forum.


I can not connect to www.geocaching.com: The service is unavailable. Server is too busy.

 

It seems the Server Error Connection issues are back again... or is it just me?

 

 

No, same problem. In our Facebook group it's all the same. I hoped to get some information here at the forum.

Okay, my message seems to have made it "click". The server woke up ... ;-)


I had the same problem.

It could be that some static IPs are blocked.

I changed my static IP, and now it works.

For Kabel Deutschland customers: Kundenzone (customer area), Einstellungen (settings), Interneteinstellungen (internet settings): switch bridge mode on and then off. --> new static IP.


Posted too soon. Upon inspection the results are basically showing

<?xml version="1.0" encoding="utf-8"?>
<GetGeocacheDataResponse xmlns="http://www.geocaching.com/Geocaching.Live/data" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Status>
 <StatusCode>1</StatusCode>
 <StatusMessage>Fail</StatusMessage>
 <ExceptionDetails/>
 <Warnings/>
</Status>
etc

So, it seems that the API is responding but not delivering anything useful.

I am also noting further down in the response that the API appears to have no clue as to what my limits are:

<a:CachesLeft>2147483647</a:CachesLeft>
<a:CurrentCacheCount>2147483647</a:CurrentCacheCount>
<a:MaxCacheCount>2147483647</a:MaxCacheCount></CacheLimits>

 

I have the same problem. I can't change my IP address. What can I do?


I've noticed that the site seems to have trouble maintaining lots of connections at one time. For example, if I'm looking at a number of different caches at the same time in browser tabs, the site will sometimes take quite a while to connect to a cache page after I open it, but others will open almost immediately. I suspect the load balancer is acting up again.

Edited by Dgwphotos


I've noticed that the site seems to have trouble maintaining lots of connections at one time. For example, if I'm looking at a number of different caches at the same time in browser tabs, the site will sometimes take quite a while to connect to a cache page after I open it, but others will open almost immediately. I suspect the load balancer is acting up again.

 

I reported this issue a few weeks ago and have still had no response from GS...

 

http://forums.Ground...dpost&p=5337344

 

It seems this issue is back again.

 

When I click on a geocaching link I get a brown background and have to wait...

 

fyi: other websites are fast. Only access to geocaching.com is slow.

 

Nobody else has "loading" problems?

 

Today, I opened a new IE session to www.geocaching.com -> the page hung after loading to 60% progress.

 

I opened a new session in the same IE window to www.geocaching.com -> the page loaded as quick as 'greased lightning'...

Edited by DanPan

