
Justin

Admin

Posts posted by Justin

  1. The problem is back again. :mad: :mad: :mad:

     

    Starting around 10:30pm PST, the Geocaching.com website experienced authentication issues that severed access for browser-based and token-authorized apps (GSAK). Mobile apps continued to function during this time. Our engineers worked diligently to provide a hotfix and will continue to monitor closely. We apologize for the inconvenience.

  2. For the past couple of hours, Geocachers in the Netherlands have been unable to access the GC site through the provider Kpn.

    Fortunately, 4G mobile phones do not have the problem.

    The Kpn service desk does not know how to solve it.

    Please help

     

    We've seen routing issues in the past, but I'm unable to determine whether this is similar with the limited information provided. It's also possible that this will resolve on its own, or that it's related to maintenance somewhere along the route.

     

    First, I would verify that DNS is working properly. We've had issues in the past where the ISP-provided DNS had trouble resolving, and switching to OpenDNS (208.67.222.222) or Google DNS (8.8.8.8) was a viable solution.
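    For illustration, on a Linux or macOS client the switch can be as simple as pointing the resolver file at those public servers. This is just a sketch, not a permanent recommendation; Windows users would change the adapter's DNS settings instead.

```
# /etc/resolv.conf - temporarily use public resolvers instead of the ISP's
nameserver 208.67.222.222   # OpenDNS
nameserver 8.8.8.8          # Google Public DNS
```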

     

    If this is an issue of Kpn utilizing a sub-optimal route through one of our peers, it would be helpful to have traceroute data from both the 4G and Kpn connections. If you're not familiar with traceroute, you can visit http://tracer01.Groundspeak.com/ and it will automatically hit our endpoint and log your traceroute as well. Feel free to post the output in this thread for others, but obfuscate the last couple of octets of your IP if you don't want to share that publicly.

  3. If you submit a request through our Help Center, we might be able to help you figure it out. You'll need to include your full IP address in the request.

     

    Thank you. I submitted a copy of the message through Help Center, along with the full IP. I categorized it under Bug Reports. Appreciate the help!

     

    -Jason

     

    Unfortunately, we're only able to manipulate the outbound route from our infrastructure. If you're having issues reaching us, there might not be anything that we can do. We have a clear route outbound to your ISP, currently using our BGP peer NTT. I can't directly reach your modem, but that's likely intentional, with ICMP disabled.

     

    If you want to run MTR (or WinMTR) to www.geocaching.com (63.251.163.200), I can review it for you if you reply to the Help Center thread or want to post it here. Feel free to obfuscate the first couple hops.

     

    Have you tried to use alternative DNS? Sometimes we see issues like this and switching from the default ISP provided DNS to Google (8.8.8.8) or OpenDNS (208.67.222.222) will do the trick.

  4. Services should now be back online. I plan to visit the pet store later today for a longer term solution.

     

    I reported it to a higher-level email address. Thank you, and please continue to report issues in this thread. I've pinned it to this forum. If any of you know my phone number, please send me a text message any time and I'll forward the email to Groundspeak via that channel. I do so hope they begin recycling the web server's process so this issue crops up less frequently.

     

    I also included a joke. You remember those Groundspeak videos about how hamsters power the geocaching.com web site via hamster wheels? Perhaps the Wherigo server needs some longer-lasting gerbils. They live twice as long as hamsters. (Hmm... sounds like I might make this into an obscure running gag.) (For those of you wondering, that's the only thing I know about the differences between the two--and for even that, I had to do a web search. Feel free to make up your own hamster/gerbil joke the next time the Wherigo server has issues sending you a cartridge.)

  5. [...] What is 3200m?

     

    An elevation given in meters. :rolleyes:

     

    Hans

    That was the obvious answer, but "on 3200m" made me question if it was some cellular provider that I wasn't familiar with. Hence, why I asked for clarification.

  6. I experienced problems with photos

     

    gc.com slowness shouldn't affect this since the photos are now hosted by a third party, no?

    Yes, that cloudfront URI is the caching service we use in front of the Amazon S3 buckets that host the images. His ISP has been having issues with GTT interchange bandwidth, so it's possible access to those endpoints is affected also.

  7. Does not work, still endless loading pics. This weekend I was at 3200m and used my phone to load a pic in cgeo, and that was working.

    That's a different issue than what the others in the thread have experienced. Does this occur at all times during the day or is it isolated to a certain window? What is 3200m?

  8. I have also had problems for some weeks with the GC.com website in Switzerland. I cannot see the pictures from a cache description anymore (http://www.geocaching.com/geocache/GC5W72Y_back-to-school), just getting an error. How should I solve a riddle like that?! This happens with all caches with downloadable content.

     

    -----------------------

     

    Version 44.0.2403.125 m

    Google Chrome is up to date.

     

    ---------------------

     

    This webpage is not available.

     

    ERR_CONNECTION_TIMED_OUT


    Google Chrome could not load the webpage because d1u1p2xjjiahg3.cloudfront.net took too long to respond. The website may be down, or there may be a problem with your Internet connection.

     

    ----------------------

     

    PS C:\> tracert www.geocaching.com

     

    Tracing route to www.geocaching.com [63.251.163.200] over a maximum of 30 hops:

     

    1 <1 ms <1 ms <1 ms router[xxx]

    2 5 ms 9 ms 9 ms 217-162-204-1.dynamic.hispeed.ch [xxx]

    3 8 ms 8 ms 7 ms xxx.static.cablecom.ch [217.168.54.125]

    4 129 ms 130 ms 126 ms 84.116.200.237

    5 130 ms 128 ms 127 ms 84.116.138.141

    6 29 ms 30 ms 31 ms 84-116-130-57.aorta.net [84.116.130.57]

    7 129 ms 129 ms 129 ms us-was03a-rd1-xe-0-3-0.aorta.net [84.116.130.66]

    8 128 ms 129 ms 127 ms nl-ams04a-ri2-xe-9-2-0.aorta.net [84.116.130.174]

    9 201 ms 129 ms 128 ms dcp-brdr-03.inet.qwest.net [63.235.40.105]

    10 239 ms 228 ms 227 ms sea-edge-13.inet.qwest.net [67.14.41.66]

    11 206 ms 203 ms 205 ms 63-235-80-54.dia.static.qwest.net [63.235.80.54]

    12 193 ms 193 ms 193 ms border8.po2-40g-bbnet2.sef.pnap.net [63.251.160.82]

    13 192 ms 192 ms 193 ms www.geocaching.com [63.251.163.200]

     

    Trace complete.

    PS C:\>

     

    ---------------------

     

    and the website gets very slow after 20h local time; it takes 2 minutes to load the page.

     

    please fix that!

    I've submitted your CIDR (217.162.128.0/17) for the workaround to avoid GTT on the route from our infrastructure back to your client. Hopefully this will be in place in time to test later today.
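    For anyone curious how that CIDR relates to your connection: hop 2 in your tracert resolves to 217-162-204-1.dynamic.hispeed.ch, and a quick Python sketch (stdlib ipaddress, nothing Groundspeak-specific) confirms that the address sits inside the /17 submitted for the workaround:

```python
import ipaddress

# The CIDR submitted for the workaround, and the client-side hop
# taken from the tracert output posted above.
workaround_net = ipaddress.ip_network("217.162.128.0/17")
client_hop = ipaddress.ip_address("217.162.204.1")

# A /17 spans 2**15 = 32768 addresses: 217.162.128.0 - 217.162.255.255.
print(workaround_net.num_addresses)  # 32768
print(client_hop in workaround_net)  # True
```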

  9. On LinkedIn, broken code has gotten out into the wild, preventing the job description text from rendering on the posted UX Designer position.

     

    On the geocaching.com/jobs page, the current openings section is empty. I was expecting a list of open positions like the UX Designer role posted on LinkedIn, as it sounded very interesting to me. If all positions are filled or closed, it would be a better user experience if that were messaged below the title.

     

    repro steps:

     

    1. From LinkedIn, go to the UX Designer position: https://www.linkedin.com/jobs2/view/66361769

     

    2. Below the "listing info" heading this line of code appears:

     

    function copyToClipboard(text) { window.prompt("Copy to clipboard: Ctrl+C, Enter", text); }

    This has been noted, but it's only visible when browsing the site via SSL. The script provided by our job board software is called with a non-SSL http:// source reference, and the browser is blocking the non-SSL content. If the job board posting script is compatible, we'll get it updated shortly.

  10. Someone kicked the server. It's business as usual now.

    I rebooted it right around noon PDT. I don't have much experience with supporting Wherigo, so I'm not sure that we'll get far assessing a root cause. Prior to this, it appeared that server had been operating without issue since at least late March.

  11. The Google Analytics code has been present for several years.
    While not usually a problem on this site, there ARE times when the "Waiting for..." indicator in the lower left corner shows that it is not geocaching.com that is holding up the show, but rather a 3rd-party site creating the issue.

     

    Part of the way you can help avoid this is by organizing web pages such that all of your own primary content loads before any 3rd-party content that may, for whatever reason, have severe latency issues. There have been times when even the Google Analytics site has been so slow that I've relegated it to 127.0.0.1 in my hosts file just to get pages to load in a timely fashion. Since the analytics redesign, Google isn't nearly the sort of problem it once was, but it can still cause issues, and it is only one of many sites that web designers seem to have to 'preload' before their content is complete.

    I'm not a web developer, but there might be some pagespeed prioritization tasks that would benefit the site. From my observation, waterfall charts show Google Analytics content loading fairly late in the process, but perhaps it can be deferred until all content is loaded. Thanks for bringing this up; I'll share it with the appropriate team.

  12. The issue is not related to server or infrastructure load based on the metrics we capture. It is a tier-1 ISP routing issue. This can be observed when users with the issue modify their route using VPN services or utilities like ZenMate, and the site becomes snappy and responsive.

     

    When the problem first manifested, it was primarily among Deutsche Telekom users. We had asked users to submit their ping and traceroute data in hopes of isolating the network or hops responsible, and those client-to-Groundspeak traces seemed to indicate that NTT's network could be the issue. It wasn't until we were able to set up a test client on a Deutsche Telekom DSL network and run bi-directional traces on our own that we started seeing a pattern that eventually caught the attention of our provider.

     

    Our provider, Internap, uses a blended BGP solution for our internet access which consists of 7 tier-1 peers (ATT, GTT, NTT, XO, Zayo, Cogent and Qwest). A technology they employ called MIRO provides dynamic optimization of all outbound connectivity and constantly evaluates the best peer for a route. To test each peer individually, we asked Internap's network engineer to effectively disable MIRO for the IP range of the test client setup on DT, and then specify one peer network at a time. After establishing a baseline of normal performance on all 7 peers, we continued troubleshooting during a recent 18:00-20:00 CET poor performance window. These tests ultimately yielded evidence that connections routed across ATT or GTT were more prone to performance problems. So for Deutsche Telekom, that led to a workaround on our side to never use ATT and GTT for those client networks, and that is hopefully showing improvement for the vast majority of DT users. Specific IP details are available here.

     

    Within the last few days, we've seen a pattern of new ISPs and locations reported, and you are obviously included in that. The list we have compiled includes Cablecom (Switzerland), UPC (Ireland) and Ziggo (Netherlands). However, after doing more research, it appears that Cablecom and Ziggo have an affiliation with UPC, so it appears this new round of reports has a common link. Since we don't have the luxury of a test box on one of these networks, it might take a little more time to reach a solution, but this issue has been made a priority. Considering the similarities with the DT issue, ATT and GTT could very well be introducing the problems that you're having, and excluding those peers for your network range might resolve the problem. We will have to identify the possible CIDR network ranges in use by your provider to apply the workaround, as well as verify the peer in use when the problem is observed.

     

    For the last few weeks, the geocaching.com website has been very slow or not loading at all in the evening (e.g. 20:00h Amsterdam time, GMT+1)

     

    Can anybody tell what the problem is???

     

    Are too few servers available? Or are the cachers in the US of A just waking up?

     

    It's very annoying not being able to navigate the website properly and log or search caches.

     

    Anybody any ideas?

     

    I see these routing issues from KabelBW in Germany too - basically the whole of the Geocaching web service, including API calls from GSAK, is unusable after around 8pm CET. It means I have to make sure I have everything done in the morning that I need for evening caching trips, and it also forces me to log in batches when I have a morning free :-(

     

    I can request a shunt for the 109.192.0.0/15 network, but can you please provide me with the output of http://tracer01.Groundspeak.com during a slow period? Feel free to remove your specific IP.

  13. I've requested a shunt to avoid ATT, Cogent and GTT on your CIDR. This should be active now, so please let me know if your experience has improved on your next attempt at browsing the site during a previous slow period.

     

    178.82.0.0/16 will now only use NTT, Qwest, XO or Zayo. You can view your outbound route from our infrastructure by visiting http://tracer01.Groundspeak.com

     

    I'm preparing a post so we can start collecting more evidence of the slowdown and identify which peer(s) is responsible for those of you affiliated with UPC.

     

    Hi Justin

     

    I've got all IP-Blocks of the Swiss Cablecom ISP here - a total of 63 (!) ranges registered at RIPE with netname CABLECOMMAIN-NET.

    Since I don't want to blow up this thread, shall I send this to you by mail?

     

    Greetings

    Ralf

     

    Wow... it's 17:53 and everything's quite fast - didn't have this for weeks. :rolleyes: Will keep monitoring...

    Good stuff!

     

    I suspect the issue is related to the work being done by GTT represented in this article: http://blog.streamingmedia.com/2015/06/isps-not-causing-network-slowdowns.html

     

    Our ISP's peering technology is failing to dynamically prioritize the optimal route, so in this instance we are manually avoiding ATT, Cogent and GTT. It's quite cumbersome to get these shunts in place, but I'll continue to do so as necessary.

  14. I have noticed this issue for 1.5 years. My browser tells me that it is a 3rd-party site that cannot deliver content. AKA: cloudfront, googleanalytics.

     

    These sites are heavily used and will degrade the GC site.

     

    On a side note, images from the gallery sometimes will not display. I don't know where they are hosted now (probably cloudfront)

    I don't believe your issue is related to what the others in this thread are experiencing.

     

    Images are hosted via Amazon S3 and backed by the Cloudfront CDN service to cache them at different edge locations. The Google Analytics code has been present for several years.

     

    I would suggest trying the latest Chrome or Firefox browsers to see if the problem persists. If it does, I would change your DNS settings from your ISP's default to either 8.8.8.8 (Google Public DNS) or 208.67.222.222 (OpenDNS). Give that a try, and if you still continue to have problems, submit a ticket to the CM team and feel free to reference that I helped you in the forums. http://support.Groundspeak.com/index.php?pg=request

  15. We've independently come to the same conclusion as you guys, that a node halfway along the Tracert 'list' is to blame. I have been using Pingplotter to traceroute over time, and get an idea of the packetloss going on. The packetloss from the server at fault is usually between 20 and 50%, but in the evenings goes right up to 100%, and the site doesn't work.

    Packet loss at a single hop in the trace with no loss after it is not indicative of a fault.

     

    The router is doing its job, passing packets back and forth. Responding to traceroutes is right at the bottom of its priority list. If it has better things to do (like, say, routing packets) then your traceroute gets ignored.

     

    The fact that there is zero loss after that hop confirms that there is not a problem there at all. The packets are getting through just fine. If the loss started at one hop and continued in multiple subsequent hops, then it would indicate a problem.

    EngPhil is correct. What you're seeing here is ICMP deprioritization, and it has caused a lot of confusion for users trying to troubleshoot the issue from their end. When "packet loss" is present at one hop and doesn't persist from source to destination, it means that the router had more important obligations and was saving its resources for its primary networking functions.
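    To make that rule concrete, here is a small Python sketch of the heuristic (the function name and the 5% threshold are my own invention, not part of any tool): a hop is only suspect if loss appears there AND persists through every later hop to the destination.

```python
def suspect_hops(loss_by_hop, threshold=5.0):
    """Return indices of hops whose packet loss persists to the destination.

    A loss spike at a single hop followed by clean hops is just ICMP
    deprioritization: the router is answering probes at low priority
    while forwarding real traffic just fine.
    """
    return [
        i for i, loss in enumerate(loss_by_hop)
        if loss >= threshold
        and all(later >= threshold for later in loss_by_hop[i + 1:])
    ]

# A lone 30% spike mid-path is ignored; loss that continues to the end is flagged.
print(suspect_hops([0.0, 0.0, 30.0, 0.0, 0.0]))    # []
print(suspect_hops([0.0, 0.0, 10.0, 20.0, 25.0]))  # [2, 3, 4]
```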

     

    We need to gather data from users during the slowdowns so we can identify which of our 7 available BGP peers is responsible for this problem. You can participate by providing traceroute data from our network back to your client by visiting http://tracer01.Groundspeak.com

     

    In the case of Deutsche Telekom users, we were able to identify poor experiences using ATT or GTT. I suspect we might find a similar pattern with users on your ISP, but we need to collect more samples to be sure. There are some recent publications that could possibly address what we're seeing:

    http://blog.streamingmedia.com/2015/06/isps-not-causing-network-slowdowns.html

    http://money.cnn.com/2015/06/25/technology/slow-internet/

  16. Hmm, I think my last post might have gotten caught as spam. Let's try again.

     

    Is there any way to get help with the slowness I am experiencing? It's quite frustrating that I cannot use the site from 1700hrs UTC until the next day.

     

    3.|-- 109.255.254.29 0.0% 20 12.4 12.1 10.0 18.3 1.8

    4.|-- 84.116.238.70 0.0% 20 24.6 24.3 19.7 30.7 2.8

    5.|-- 84.116.137.74 0.0% 20 20.3 21.6 16.6 39.5 4.6

    6.|-- 84.116.133.18 0.0% 20 20.9 24.1 20.9 32.7 2.7

    7.|-- 195.66.236.138 0.0% 20 20.8 22.9 18.3 39.9 5.4

    8.|-- 129.250.4.85 0.0% 20 21.1 24.6 20.5 46.3 5.4

    9.|-- 129.250.3.126 0.0% 20 125.2 122.0 112.9 146.4 9.0

    10.|-- 129.250.4.13 20.0% 20 197.8 202.2 195.5 214.4 6.4

    11.|-- 129.250.5.45 0.0% 20 198.6 199.2 194.6 207.1 2.6

    12.|-- 129.250.201.18 0.0% 20 178.2 164.4 155.8 178.2 5.2

    13.|-- 63.251.160.82 5.0% 20 163.1 182.0 160.0 317.7 37.8

    14.|-- 63.251.163.200 5.0% 20 154.7 162.3 153.1 173.1 5.6

     

    ISP using at least this netblock: 176.61.64.0 - 176.61.95.255

    Tracing from your client to our infrastructure does not appear to be fruitful in this situation. Please run a trace from our network to your client IP by visiting http://tracer01.Groundspeak.com during a slow period. You can either reply with the output in this thread or if you prefer to not share your IP, submit it to the Customer Management team via http://support.Groundspeak.com/index.php?pg=request with ATTN: Justin.

     

    Once I've gathered more data, I'll request shunts for your network/ISP and you can determine if the situation has improved.
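    For reference, the netblock you listed (176.61.64.0 - 176.61.95.255) collapses to a single CIDR, which is the form I need when requesting a shunt. A quick Python sketch (stdlib ipaddress) shows the conversion:

```python
import ipaddress

# Convert the reported netblock 176.61.64.0 - 176.61.95.255 into CIDR form.
start = ipaddress.ip_address("176.61.64.0")
end = ipaddress.ip_address("176.61.95.255")
cidrs = [str(net) for net in ipaddress.summarize_address_range(start, end)]
print(cidrs)  # ['176.61.64.0/19']
```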

  17. Hi Justin

     

    I've got all IP-Blocks of the Swiss Cablecom ISP here - a total of 63 (!) ranges registered at RIPE with netname CABLECOMMAIN-NET.

    Since I don't want to blow up this thread, shall I send this to you by mail?

     

    Greetings

    Ralf

    I've requested a shunt to avoid ATT, Cogent and GTT on your CIDR. This should be active now, so please let me know if your experience has improved on your next attempt at browsing the site during a previous slow period.

     

    178.82.0.0/16 will now only use NTT, Qwest, XO or Zayo. You can view your outbound route from our infrastructure by visiting http://tracer01.Groundspeak.com

     

    I'm preparing a post so we can start collecting more evidence of the slowdown and identify which peer(s) is responsible for those of you affiliated with UPC.

  18. For the last few weeks, the geocaching.com website has been very slow or not loading at all in the evening (e.g. 20:00h Amsterdam time, GMT+1)

     

    Can anybody tell what the problem is???

     

    Are too few servers available? Or are the cachers in the US of A just waking up?

     

    It's very annoying not being able to navigate the website properly and log or search caches.

     

    Anybody any ideas?

     

    The issue is not related to server or infrastructure load based on the metrics we capture. It is a tier-1 ISP routing issue. This can be observed when users with the issue modify their route using VPN services or utilities like ZenMate, and the site becomes snappy and responsive.

     

    When the problem first manifested, it was primarily among Deutsche Telekom users. We had asked users to submit their ping and traceroute data in hopes of isolating the network or hops responsible, and those client-to-Groundspeak traces seemed to indicate that NTT's network could be the issue. It wasn't until we were able to set up a test client on a Deutsche Telekom DSL network and run bi-directional traces on our own that we started seeing a pattern that eventually caught the attention of our provider.

     

    Our provider, Internap, uses a blended BGP solution for our internet access which consists of 7 tier-1 peers (ATT, GTT, NTT, XO, Zayo, Cogent and Qwest). A technology they employ called MIRO provides dynamic optimization of all outbound connectivity and constantly evaluates the best peer for a route. To test each peer individually, we asked Internap's network engineer to effectively disable MIRO for the IP range of the test client setup on DT, and then specify one peer network at a time. After establishing a baseline of normal performance on all 7 peers, we continued troubleshooting during a recent 18:00-20:00 CET poor performance window. These tests ultimately yielded evidence that connections routed across ATT or GTT were more prone to performance problems. So for Deutsche Telekom, that led to a workaround on our side to never use ATT and GTT for those client networks, and that is hopefully showing improvement for the vast majority of DT users. Specific IP details are available here.

     

    Within the last few days, we've seen a pattern of new ISPs and locations reported, and you are obviously included in that. The list we have compiled includes Cablecom (Switzerland), UPC (Ireland) and Ziggo (Netherlands). However, after doing more research, it appears that Cablecom and Ziggo have an affiliation with UPC, so it appears this new round of reports has a common link. Since we don't have the luxury of a test box on one of these networks, it might take a little more time to reach a solution, but this issue has been made a priority. Considering the similarities with the DT issue, ATT and GTT could very well be introducing the problems that you're having, and excluding those peers for your network range might resolve the problem. We will have to identify the possible CIDR network ranges in use by your provider to apply the workaround, as well as verify the peer in use when the problem is observed.

  19. What I have seen is that the connection is extremely slow (unusable) as soon as "ntt.net" (Frankfurt->NYC->Seattle) appears in the TRACERT list. This always happens to me when connecting from my home PC (Cablecom Switzerland).

     

    ..snip..
    ae-10.r03.frnkge03.de.bb.gin.ntt.net
    ae-1.r21.frnkge03.de.bb.gin.ntt.net
    ae-3.r23.nycmny01.us.bb.gin.ntt.net
    Timeout
    Timeout
    ae-2.r04.sttlwa01.us.bb.gin.ntt.net
    ae-0.internap.sttlwa01.us.bb.gin.ntt
    border8.po1-40g-bbnet1.sef.pnap.net
    www.geocaching.com
    

     

    However, when using geocaching.com from my office (at the very same time via VPN) everything is quite fast and my route is using zayo.com for crossing the atlantic ocean - ending up at a different gateway of pnap.net.

     

    ..snip..
    xe-1-2-0.mpr1.fra4.de.above.net
    ae8.mpr1.fra3.de.zip.zayo.com
    ae4.cr1.ams5.nl.zip.zayo.com
    ae0.cr1.ams10.nl.zip.zayo.com
    v142.ae29.cr2.ord2.us.zip.zayo.com
    ae11.cr1.ord2.us.zip.zayo.com
    v11.ae29.mpr1.sea1.us.zip.zayo
    208.185.125.106.ipyx-072053-008
    border8.po2-40g-bbnet2.sef.pnap
    www.geocaching.com
    

     

    Seems that the guys at pnap.net should talk to their peering partner ntt.net...

    From what we've observed recently, the routes from our infrastructure to the client are actually more valuable for troubleshooting this issue. Outbound connections over NTT are typically quite good, and with regard to the Deutsche Telekom users previously having issues in this thread--we were able to determine that connections over GTT and ATT were problematic.

     

    A workaround has been put in place to avoid those peers for the DT network ranges we've observed complaints on, but it has to be applied via CIDR notation for each network specified. Right now we've applied the workaround for 7 of the largest CIDRs announced by DT, which covers ~25 million of the 34.3 million addresses announced via RIPEstat.

     

    Those ranges currently avoiding ATT and GTT are:

    79.192.0.0/10 (79.192.0.0 - 79.255.255.255)

    84.128.0.0/10 (84.128.0.0 - 84.191.255.255)

    87.128.0.0/10 (87.128.0.0 - 87.191.255.255)

    91.0.0.0/10 (91.0.0.0 - 91.63.255.255)

    93.192.0.0/10 (93.192.0.0 - 93.255.255.255)

    217.0.0.0/13 (217.0.0.0 - 217.7.255.255)

    217.224.0.0/11 (217.224.0.0 - 217.255.255.255)

    217.80.0.0/12 (217.80.0.0 - 217.95.255.255)
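    For anyone wanting to check the "~25 million" coverage figure, it can be reproduced by summing the address counts of those CIDRs with Python's stdlib ipaddress module (a quick sketch, not anything running in production):

```python
import ipaddress

# The DT ranges currently shunted away from ATT and GTT.
cidrs = [
    "79.192.0.0/10", "84.128.0.0/10", "87.128.0.0/10", "91.0.0.0/10",
    "93.192.0.0/10", "217.0.0.0/13", "217.224.0.0/11", "217.80.0.0/12",
]
covered = sum(ipaddress.ip_network(c).num_addresses for c in cidrs)
print(covered)  # 24641536 - roughly 25 million of the 34.3 million announced
```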

     

    Are you observing this slowness around the window of 18:00-20:00 CET or can you give an estimate? Do you know if your cablecom.ch IP is static or typically operates in the same CIDR? RIPEstats for your ISP do not appear complete, so I would have a hard time isolating possible networks for a workaround based on that information.

  20. Thanks for the response.

     

    As a customer of theirs, would you be willing to open a ticket with Deutsche Telekom and see if they can help you work the issue on their end? I've had little success reaching out to ISPs that I am not a customer of.

     

    Also, Rhapsody hosts their music catalog in our datacenter so it should traverse the same routing path. I haven't seen anyone respond to my request to browse their catalog and report if it experiences similar slowness. That site is http://origin.rhapsody.com/browse/

     

    It's clear that ZenMate and other VPN solutions provide a workaround, but we would like to mitigate that requirement.

     

    I am also located in Germany, connected to the Internet with 1&1 as provider, running over Deutsche Telekom VDSL (physical link). Using www.geocaching.com is often extremely slow. I have not yet tested whether this is in any way related to the fact that my Internet connection uses dual-stack (IPv4 and IPv6).

     

    BUT I have strong evidence that the "slowness" is somewhat related to Deutsche Telekom. The reason: I am able to switch to a totally different provider by using a VPN connection to the DFN ("Deutsches Forschungsnetz"). As soon as I am accessing www.geocaching.com via DFN through the VPN tunnel, the website is fast and everything "works as expected". When tearing down the VPN connection, www.geocaching.com is very slow (often nearly unusable) again.

     

    If someone in the IT department of Groundspeak is interested, please contact me for traceroutes...

     

    Maybe this helps to catch the problem...

    Regards,

    LaPalmaFan

  21. Thank you all for providing the situational data in this thread. I've compiled what's been submitted here, on Facebook, on Twitter and in conversations I've had with a few of you directly so I can effectively communicate the situation to our provider. I appreciate all of your time and effort, and am terribly sorry to hear that your experience accessing our resources has been degraded in recent weeks.

     

    Painting a picture from all of the information provided, I believe ecanderson and cezanne are on the right track: there is a tier-1 ISP routing issue affecting folks in Germany and possibly other parts of central Europe. This is based on posted traceroute (tracert) output from users experiencing dramatically increased latency on middle hops, and even packet loss.

     

    In all the examples that I've seen so far, the dramatic latency increases are occurring after your provider hands the packet off to a tier 1 ISP, but before it reaches our provider. So neither of our providers' networks appear directly responsible, but some of the internet backbone peers that both of our providers rely on could be. Of the slow instances that I've reviewed, the traceroute output frequently shows routes that include tier 1 network providers Cogent and NTT. I found a relevant article on backbone/peering concerning Netflix that shows Cogent and NTT with higher latency than other providers and talks about the effect of slowing during peak times. Cogent also appears to have engaged in Quality of Service traffic deprioritization that might be worth noting.

     

    Likely the best and only course of action is to draw attention to the routing issue and hope it gets resolved soon. I've initiated dialog with the network operations center of our provider, Internap, which you'll see in the last couple of hops of the traceroute output that folks have posted. I know that we utilize several BGP peers, which appear to include Cogent and NTT based on network hop naming. While many ISPs have agreements with these providers as BGP peers, it doesn't indicate if that agreement is direct or third-party, but they might have some influence in getting this resolved. I have asked for guidance on this situation and what options are available to us.

     

    With all that said, I would still like to continue collecting more positive and negative experience data from end users in this thread. For those of you including times without specified time zones, I'm assuming those are UTC. Since some of you have reported slow load times on Geocaching as well as the Discussion Forums, I would anticipate that load times across all resources at our Seattle Internap facility would yield similar results. It would be helpful to know if the performance is the same on both sites while logged in versus browsing as an unauthenticated guest. For those who mentioned accessing and tracing seattle.gov when Geocaching was slow to respond, might I suggest tracing origin.rhapsody.com and navigating through their music catalog. Rhapsody is hosted in our space and should have a route much closer to our own.

     

    Traceroute output from positive experiences like UK user MartyBartfast reported, or anyone with a trans-Atlantic connection is also welcome. I'd also like to see output from those of you that have been using proxies or VPNs based out of different regions and are having success re-routing around some of these troubled hops/networks.

     

    Thanks again,

    Justin

  22. Also broken: choosing "Today's active content" which in the past would show all posts made in the last 24 hours, returns 0 posts.

     

    OK, I think I figured it out based on the message you sent me. It was actually the Sphinx search engine that was failing. Please let me know if you feel like something is still broken regarding this issue.

  23. Normally if I check the forums every morning it will give me 4 or 5 new pages. The past two days it tells me no new content. If I go to the drop-down window and select 24 hours it tells me no new content, but if I choose week it will show me about 14 pages. Has anyone else run into this?

     

    I'm looking into this issue at the moment, but I haven't been able to find anything out of the ordinary. Querying the database directly, the last_visit time for each user is updating properly and new posts are also being timestamped. They both use POSIX time and I believe those are the two important fields for determining new content.

     

    You mention a drop-down window to select a week of new content--how do I find that?

  24. I've notified our IT team of this issue.

     

    Can anyone give a better idea of when it last worked properly?

     

    I know it was good at about 2:30pm - 3pm US central time yesterday. It seems I also used it later - about 5pm, but I can't swear to it.

     

    Appreciate the prompt response.
