
Justin

Admin
  • Posts

    67
  • Joined

  • Last visited

Everything posted by Justin

  1. Starting around 10:30pm PST, the Geocaching.com website experienced authentication issues that broke sign-in for browser-based sessions and token-authorized apps (such as GSAK). Mobile apps continued to function during this time. Our engineers worked diligently to provide a hotfix and will continue to monitor closely. We apologize for the inconvenience.
  2. We've seen routing issues in the past, but I'm unable to determine if this is similar with the limited information provided. It's also possible that this will resolve on its own, or that it's related to maintenance somewhere along the route. First, I would verify that DNS is working properly. We've had issues in the past where the ISP-provided DNS had trouble resolving, and switching to OpenDNS (208.67.222.222) or Google DNS (8.8.8.8) was a viable solution. If this is an issue of KPN utilizing a sub-optimal route through one of our peers, it would be beneficial to provide traceroute data from both the 4G and KPN connections. If you're not familiar with traceroute, you can visit http://tracer01.Groundspeak.com/ and it will automatically hit our endpoint and log your traceroute as well. Feel free to post the output in this thread for others, but obfuscate the last couple octets of your IP if you don't want to share that publicly.
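For anyone unfamiliar with reading raw traceroute output, here is a minimal sketch of pulling per-hop latency out of a Linux-style traceroute line. The line format is an assumption based on common traceroute builds (Windows tracert and MTR format their output differently), and the hostname/IP in the sample is illustrative only:

```python
import re

# Matches a Linux-style traceroute line:
#   " 3  host.example.net (129.250.2.1)  12.4 ms  11.9 ms  13.2 ms"
# This format is assumed; adjust the pattern for tracert/MTR output.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)((?:\s+[\d.]+\s+ms)+)")

def parse_hop(line):
    """Return (hop_number, host, ip, [rtts_ms]) or None for timeout lines."""
    m = LINE_RE.match(line)
    if not m:
        return None  # e.g. "* * *" lines where the router didn't answer
    hop, host, ip, rtt_blob = m.groups()
    rtts = [float(x) for x in re.findall(r"([\d.]+)\s+ms", rtt_blob)]
    return int(hop), host, ip, rtts

sample = " 3  ae-1.r01.example.net (129.250.2.1)  12.4 ms  11.9 ms  13.2 ms"
print(parse_hop(sample))
# (3, 'ae-1.r01.example.net', '129.250.2.1', [12.4, 11.9, 13.2])
```

Collecting the per-hop RTT lists this way makes it easier to spot where along the path the latency jump begins.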
  3. Thank you. I submitted a copy of the message through Help Center, along with the full IP. I categorized it under Bug Reports. Appreciate the help! -Jason Unfortunately, we're only able to manipulate the outbound route from our infrastructure. If you're having issues reaching us, there might not be anything that we can do. We have a clear route outbound to your ISP, currently using our BGP peer NTT. I can't directly reach your modem, but that's likely intentional, with ICMP responses disabled. If you want to run MTR (or WinMTR) to www.geocaching.com (63.251.163.200), I can review it for you if you reply to the Help Center thread or post it here. Feel free to obfuscate the first couple hops. Have you tried using an alternative DNS? Sometimes we see issues like this, and switching from the default ISP-provided DNS to Google (8.8.8.8) or OpenDNS (208.67.222.222) will do the trick.
  4. Services should now be back online. I plan to visit the pet store later today for a longer term solution.
  5. An elevation given in meters. Hans That was the obvious answer, but "on 3200m" made me question if it was some cellular provider that I wasn't familiar with. Hence, why I asked for clarification.
  6. gc.com slowness shouldn't affect this since the photos are now hosted by a third party, no? Yes, that cloudfront URI is the caching service we use in front of the Amazon S3 buckets that host the images. His ISP has been having issues with GTT interchange bandwidth, so it's possible access to those endpoints is affected also.
  7. That's a different issue than what the others in the thread have experienced. Does this occur at all times during the day or is it isolated to a certain window? What is 3200m?
  8. I've submitted your CIDR (217.162.128.0/17) for the workaround to avoid GTT on the route from our infrastructure back to your client. Hopefully this will be in place in time to test later today.
  9. This has been noted, but it's only visible when browsing the site via SSL. The script provided by our job board software is referenced with a non-SSL http:// source, so the browser hides the mixed content. If the job board script is compatible with SSL, we'll get the reference updated shortly.
  10. I rebooted it right around noon PDT. I don't have much experience with supporting Wherigo, so I'm not sure that we'll get far assessing a root cause. Prior to this, it appeared that server had been operating without issue since at least late March.
  11. While not usually a problem on this site, there ARE times when the "Waiting for..." in the lower left corner indicates that it is not geocaching.com that is holding up the show, but rather a 3rd party site is creating the issue. Part of the way that you can help in avoiding this is organizing web pages such that all of your own primary content will be loaded before any 3rd party content that may, for whatever reason, have severe latency issues. There have been times where even the Google Analytics site has been so slow that I've relegated it to 127.0.0.1 in my hosts file just to get pages to load in a timely fashion. Since the analytics redesign, Google isn't nearly the sort of problem it once was, but it can still cause issues, and is only one of many sites that web designers seem to have to 'preload' before their content is completed. I'm not a web developer, but there might be some pagespeed prioritization tasks beneficial to the site. Waterfalls show Google Analytics content loading fairly late in the process from my observation, but perhaps it can be deferred until all content is loaded. Thanks for bringing this up, I'll share it with the appropriate team.
  12. I see these routing issues from KabelBW in Germany too - basically the whole of the Geocaching web service, including API calls from GSAK, is unusable after around 8pm CET. It means I have to make sure I have everything done in the morning that I need for evening caching trips, and is also forcing me to log in batches when I have a morning free :-( I can request a shunt for the 109.192.0.0/15 network, but can you please provide me with the output of http://tracer01.Groundspeak.com during a slow period? Feel free to remove your specific IP.
  13. Wow... it's 17:53 and everything's quite fast - didn't have this for weeks. Will keep monitoring... Good stuff! I suspect the issue is related to the work being done by GTT represented in this article: http://blog.streamingmedia.com/2015/06/isps-not-causing-network-slowdowns.html Our ISP's peering technology is failing to dynamically prioritize the optimal route, so in this instance we are manually avoiding ATT, Cogent and GTT. It's quite cumbersome to get these shunts in place, but I'll continue to do so as necessary.
  14. I don't believe your issue is related to what the others in this thread are experiencing. Images are hosted via Amazon S3 and backed by the Cloudfront CDN service to cache them at different edge locations. The Google Analytics code has been present for several years. I would suggest that you try using the latest Chrome and Firefox browsers to see if the problem persists. If it does, I would modify your DNS settings from the default that your ISP provides to either 8.8.8.8 (Google Public DNS) or 208.67.222.222 (OpenDNS). Give that a try, and if you continue to have problems, submit a ticket to the CM team and feel free to reference that I helped you in the forums. http://support.Groundspeak.com/index.php?pg=request
  15. Packet loss at a single hop in the trace with no loss after it is not indicative of a fault. The router is doing its job, passing packets back and forth. Responding to traceroutes is right at the bottom of its priority list. If it has better things to do (like, say, routing packets) then your traceroute gets ignored. The fact that there is zero loss after that hop confirms that there is not a problem there at all. The packets are getting through just fine. If the loss started at one hop and continued in multiple subsequent hops, then it would indicate a problem. EngPhil is correct. What you're seeing here is ICMP deprioritization, and it has caused a lot of confusion for users trying to troubleshoot the issue from their end. When "packet loss" is present at one hop and doesn't persist from source to destination, it means that the router had more important obligations and was saving its resources for its primary networking functions. We need to gather data from users during the slowdowns so we can identify which of our 7 available BGP peers is responsible for this problem. You can participate by providing traceroute data from our network back to your client by visiting http://tracer01.Groundspeak.com In the case of Deutsche Telekom users, we were able to identify poor experiences using ATT or GTT. I suspect we might find a similar pattern with users on your ISP, but we need to collect more samples to be sure. There are some recent publications that could possibly address what we're seeing: http://blog.streamingmedia.com/2015/06/isps-not-causing-network-slowdowns.html http://money.cnn.com/2015/06/25/technology/slow-internet/
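The rule of thumb above can be expressed as a small heuristic, sketched here for illustration (not a diagnostic tool): per-hop "loss" in a traceroute only suggests a real fault when it starts at some hop and then persists through every later hop to the destination.

```python
def loss_indicates_fault(loss_pct_per_hop):
    """Heuristic for traceroute per-hop loss percentages, source to destination.
    Isolated loss at one router with a clean path afterwards is just ICMP
    deprioritization; loss that begins mid-path and persists to the final hop
    suggests a genuine problem."""
    if not loss_pct_per_hop or loss_pct_per_hop[-1] == 0:
        return False  # destination reached loss-free: no end-to-end fault
    # Find the first hop showing loss; a fault pattern keeps losing from there on.
    first = next(i for i, p in enumerate(loss_pct_per_hop) if p > 0)
    return all(p > 0 for p in loss_pct_per_hop[first:])

# 40% "loss" at hop 3, clean afterwards -> router deprioritizing ICMP, not a fault.
print(loss_indicates_fault([0, 0, 40, 0, 0]))    # False
# Loss begins mid-path and persists to the destination -> likely a real fault.
print(loss_indicates_fault([0, 0, 30, 45, 60]))  # True
```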
  16. Tracing from your client to our infrastructure does not appear to be fruitful in this situation. Please run a trace from our network to your client IP by visiting http://tracer01.Groundspeak.com during a slow period. You can either reply with the output in this thread or if you prefer to not share your IP, submit it to the Customer Management team via http://support.Groundspeak.com/index.php?pg=request with ATTN: Justin. Once I've gathered more data, I'll request shunts for your network/ISP and you can determine if the situation has improved.
  17. I've requested a shunt to avoid ATT, Cogent and GTT on your CIDR. This should be active now, so please let me know if your experience has improved on your next attempt at browsing the site during a previous slow period. 178.82.0.0/16 will now only use NTT, Qwest, XO or Zayo. You can view your outbound route from our infrastructure by visiting http://tracer01.Groundspeak.com I'm preparing a post so we can start collecting more evidence of the slowdown and identify which peer(s) is responsible for those of you affiliated with UPC.
  18. The issue is not related to server or infrastructure load based on the metrics we capture. It is a tier-1 ISP routing issue. This can be observed when users with the issue modify their route using VPN services or utilities like ZenMate, and the site becomes snappy and responsive. When the problem first manifested, it was primarily among Deutsche Telekom users. We had asked users to submit their ping and traceroute data in hopes of isolating the network or hops responsible, and those client to Groundspeak traces seemed to indicate that NTT's network could be the issue. It wasn't until we were able to set up a test client on a Deutsche Telekom DSL network and run bi-directional traces on our own that we started seeing a pattern that eventually caught the attention of our provider. Our provider, Internap, uses a blended BGP solution for our internet access which consists of 7 tier-1 peers (ATT, GTT, NTT, XO, Zayo, Cogent and Qwest). A technology they employ called MIRO provides dynamic optimization of all outbound connectivity and constantly evaluates the best peer for a route. To test each peer individually, we asked Internap's network engineer to effectively disable MIRO for the IP range of the test client set up on DT, and then specify one peer network at a time. After establishing a baseline of normal performance on all 7 peers, we continued troubleshooting during a recent 18:00-20:00 CET poor performance window. These tests ultimately yielded evidence that connections routed across ATT or GTT were more prone to performance problems. So for Deutsche Telekom, that led to a workaround on our side to never use ATT and GTT for those client networks, and that is hopefully showing improvement for the vast majority of DT users. Specific IP details are available here. Within the last few days, we've seen a pattern of new ISPs and locations reported, and you are obviously included in that.
The list we have compiled includes Cablecom (Switzerland), UPC (Ireland) and Ziggo (Netherlands). However, after doing more research, it appears that Cablecom and Ziggo have affiliation with UPC, so it appears this new round of reports has a common link. Since we don't have the luxury of a test box on one of these networks, it might take a little more time to reach a solution, but this issue has been made a priority. Considering the similarities with the DT issue, ATT and GTT could very well be introducing the problems that you're having, and excluding those peers for your network range might resolve the problem. We will have to identify the possible CIDR network ranges in use by your provider to apply the workaround, as well as verify the peer in use when the problem is observed.
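The per-peer testing described above amounts to forcing traffic over one BGP peer at a time and comparing the RTT samples. A sketch of that comparison follows; the numbers are illustrative stand-ins, NOT the actual measurements referenced in the post:

```python
from statistics import median

# Hypothetical RTT samples (ms) collected while forcing one peer at a time
# during the slow window. Illustrative values only -- not real measurements.
samples_ms = {
    "NTT":    [41, 44, 43, 42],
    "Qwest":  [45, 47, 44, 46],
    "XO":     [48, 46, 47, 49],
    "Zayo":   [44, 45, 43, 44],
    "Cogent": [52, 55, 51, 54],
    "ATT":    [180, 240, 210, 195],
    "GTT":    [220, 260, 205, 230],
}

def flag_slow_peers(samples, factor=3.0):
    """Flag peers whose median RTT exceeds `factor` times the best peer's median."""
    medians = {peer: median(rtts) for peer, rtts in samples.items()}
    best = min(medians.values())
    return sorted(peer for peer, m in medians.items() if m > factor * best)

print(flag_slow_peers(samples_ms))  # ['ATT', 'GTT']
```

Using the median rather than the mean keeps one outlier probe from skewing a peer's score.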
  19. From what we've observed recently, the routes from our infrastructure to the client are actually more valuable for troubleshooting this issue. Outbound connections over NTT are typically quite good, and with regard to the Deutsche Telekom users previously having issues in this thread--we were able to determine that connections over GTT and ATT were problematic. A workaround has been put in place to avoid those peers for the DT network ranges we've observed complaints on, but it has to be applied via CIDR notation for each network specified. Right now we've applied the workaround for 8 of the largest CIDRs announced by DT, which covers ~25 million of the 34.3 million addresses announced via RIPEstat. Those ranges currently avoiding ATT and GTT are:
79.192.0.0/10 (79.192.0.0 - 79.255.255.255)
84.128.0.0/10 (84.128.0.0 - 84.191.255.255)
87.128.0.0/10 (87.128.0.0 - 87.191.255.255)
91.0.0.0/10 (91.0.0.0 - 91.63.255.255)
93.192.0.0/10 (93.192.0.0 - 93.255.255.255)
217.0.0.0/13 (217.0.0.0 - 217.7.255.255)
217.224.0.0/11 (217.224.0.0 - 217.255.255.255)
217.80.0.0/12 (217.80.0.0 - 217.95.255.255)
Are you observing this slowness around the window of 18:00-20:00 CET, or can you give an estimate? Do you know if your cablecom.ch IP is static or typically operates in the same CIDR? RIPEstats for your ISP do not appear complete, so I would have a hard time isolating possible networks for a workaround based on that information.
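The CIDR arithmetic above is easy to double-check with Python's standard ipaddress module, e.g. confirming the last address each prefix covers and how many addresses the workaround spans in total:

```python
import ipaddress

# The DT prefixes listed above that currently avoid ATT and GTT.
cidrs = [
    "79.192.0.0/10", "84.128.0.0/10", "87.128.0.0/10", "91.0.0.0/10",
    "93.192.0.0/10", "217.0.0.0/13", "217.224.0.0/11", "217.80.0.0/12",
]
networks = [ipaddress.ip_network(c) for c in cidrs]

# A /12 starting at 217.80.0.0 ends at 217.95.255.255 (16 /16s wide).
print(networks[-1].broadcast_address)          # 217.95.255.255

# Total addresses covered by the workaround: roughly 24.6 million.
print(sum(n.num_addresses for n in networks))  # 24641536

def has_workaround(ip):
    """Check whether a client IP falls inside one of the shunted ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

print(has_workaround("91.10.20.30"))  # True  (inside 91.0.0.0/10)
print(has_workaround("8.8.8.8"))      # False
```

A check like `has_workaround` is handy for telling a reporting user whether their address is already inside a shunted range before requesting a new one.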
  20. Thanks for the response. As a customer of theirs, would you be willing to open a ticket with Deutsche Telekom and see if they can help you work the issue on their end? I've had little success reaching out to ISPs that I am not a customer of. Also, Rhapsody hosts their music catalog in our datacenter, so it should traverse the same routing path. I haven't seen anyone respond to my request to browse their catalog and report whether it experiences similar slowness. That site is http://origin.rhapsody.com/browse/ It's clear that ZenMate and other VPN solutions provide a workaround, but we would like to mitigate that requirement.
  21. Thank you all for providing the situational data in this thread. I've compiled what's been submitted here, on Facebook, on Twitter and in conversations I've had with a few of you directly so I can effectively communicate the situation with our provider. I appreciate all of your time and effort, and am terribly sorry to hear that your experience accessing our resources has been degraded in recent weeks. Painting a picture from all of the information provided, I believe ecanderson and cezanne are on the right track with there being a tier 1 ISP routing issue affecting folks in Germany and possibly other parts of central Europe. This is based on posted traceroute (tracert) output of users experiencing dramatically increased latency on middle hops and even packet loss. In all the examples that I've seen so far, the dramatic latency increases are occurring after your provider hands the packet off to a tier 1 ISP, but before it reaches our provider. So neither of our providers' networks appears directly responsible, but some of the internet backbone peers that both of our providers rely on could be. Of the slow instances that I've reviewed, the traceroute output frequently shows routes that include tier 1 network providers Cogent and NTT. I found a relevant article on backbone/peering concerning Netflix that shows Cogent and NTT with higher latency than other providers and talks about the effect of slowing during peak times. Cogent also appears to have engaged in Quality of Service traffic deprioritization that might be worth noting. Likely the best and only course of action is to draw attention to the routing issue and hope it gets resolved soon. I've initiated a dialogue with the network operations center of our provider, Internap, which you'll see in the last couple of hops of the traceroute output that folks have posted. I know that we utilize several BGP peers, which appear to include Cogent and NTT based on network hop naming.
While many ISPs have agreements with these providers as BGP peers, it doesn't indicate whether that agreement is direct or third-party, but they might have some influence in getting this resolved. I have asked for guidance on this situation and what options are available to us. With all that said, I would still like to continue collecting more positive and negative experience data from end users in this thread. For those of you including times without specified time zones, I'm assuming those are UTC. Since some of you have reported slow load times on Geocaching as well as the Discussion Forums, I would anticipate that load times across all resources at our Seattle Internap facility would yield similar results. It would be helpful to know if the performance is the same on both sites while logged in or browsing as an unauthenticated guest. For those that mentioned accessing and tracing seattle.gov when Geocaching was slow to respond, might I suggest tracing origin.rhapsody.com and navigating through their music catalog. Rhapsody is hosted in our space and should have a route much closer to our own. Traceroute output from positive experiences like UK user MartyBartfast reported, or anyone with a trans-Atlantic connection, is also welcome. I'd also like to see output from those of you that have been using proxies or VPNs based out of different regions and are having success re-routing around some of these troubled hops/networks. Thanks again, Justin
  22. No problem. Sorry it took a while to sort out.
  23. OK, I think I figured it out based on the message you sent me. It was actually the Sphinx search engine that was failing. Please let me know if you feel like something is still broken regarding this issue.
  24. I'm looking into this issue at the moment, but I haven't been able to find anything out of the ordinary. Querying the database directly, the last_visit time for each user is updating properly and new posts are also being timestamped. They both use POSIX time and I believe those are the two important fields for determining new content. You mention a drop-down window to select a week of new content--how do I find that?
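The comparison described above reduces to comparing two POSIX epoch timestamps. A hypothetical sketch follows; the field names and data are illustrative, not the forum software's actual schema or query:

```python
# Hypothetical sketch of deriving "new content" from the two POSIX-time
# fields mentioned above. Field names are illustrative assumptions; this is
# not the forum software's real schema.
posts = [
    {"id": 101, "posted_at": 1_435_700_000},
    {"id": 102, "posted_at": 1_435_800_000},
    {"id": 103, "posted_at": 1_435_900_000},
]

def new_content_for(last_visit, posts):
    """Posts timestamped after the user's last_visit count as new content."""
    return [p["id"] for p in posts if p["posted_at"] > last_visit]

print(new_content_for(1_435_750_000, posts))  # [102, 103]
```

If both fields are updating correctly in the database, a bug in "new content" would have to live in a comparison like this one, or in the UI layer on top of it.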
  25. Can anyone give a better idea of when it last worked properly? I know it was good at about 2:30pm - 3pm US central time yesterday. It seems I also used it later - about 5pm, but I can't swear to it. Appreciate the prompt response.