
NCEP SERVER REALLY SLOW DURING GFS 12z runs


Ji


Anyone else experiencing this? It takes forever to load a page. What the hell did those guys do? It worked great last year.

You sure can be a turd sometimes. I'm guessing this is related to machine work that has been going on lately (node installations, etc.). There is/was a parallel production test running today on the backup machine, and it's possible that this caused some bandwidth issues (if things were inadvertently being sent from both the production and backup machines). I don't work for NCO, so this is just speculation on my part....

It's been like this all winter.

Hmmm... in that case I'm not sure. What page(s) are you referring to specifically (the maps from the MAG)? If there are legitimate issues, there are proper channels to get complaints in so that the people responsible are made aware.

Here's the response that I got. I guess I'll be sending a more detailed response later.

Do you have any specific times and dates when the slowdowns occurred?

It would help us tremendously to have specific target times and dates when looking through the log files for a cause.

Also, is this always on the same computer system? What type and speed connection to the Internet does it have?

Is it possible that this is a connectivity issue (the Internet connection between your computer and our server is saturated)?

Does this slowness exist for any other (non-cached) websites?

How long does it typically last?

Please pardon all the questions, but yours is our first report of a slowdown and I want to be sure we have enough information to track the problem.

Thanks,

---

Bradley Mabe

Systems Integration Branch

I have found it to be frustratingly slow for the last few weeks. You can send an email to the webmaster for ncep at http://www.ncep.noaa...mail_webmaster/ I'm going to send an email now. Sometimes it takes a full minute just to switch between map categories.

You should make a point to reply soon; it's hard to help folks without detailed info, especially with a random slowness complaint like yours at this point.

THIS.

This is also why I suggest to others: if you are experiencing issues or having problems, send a quick email. People can't fix a problem if they don't know about it.

I agree. I responded as quickly as I could with more details. Tomorrow I'll try to email as soon as I notice a problem. NCEP's communication with the public seems to be great!

I have had several issues as well. Today I was getting nothing for a while, so I jumped over to raleighwx's site.

Things I have seen:

Slowness on numerous occasions, and timeout screens.

While looking at the 850 temp/MSLP/precip maps, sometimes when I hit the next button I get a different map altogether, like the 850T & height maps.

My biggest issue: half the time, when I hit the back button and then pick a specific hour or another hour's map, it randomly jumps from 640x480 to 1024x768.

It was so much nicer when you could hit any hour of any parameter and get a direct map. Now you have to preload every parameter to get those maps, which is just a pain when you run into slow loading.

I will email them. Thanks for the info, dtk.

The people having problems... does it happen as the run is rolling out, or at almost any time? My guess is the former, as there's probably an upswing of users at those times, probably more so with the 12z run, when people are at their offices and probably having lunch. In the past they tried using a reverse proxy (Nginx), but it looks like it didn't work out, as I don't see any hint of it in the response headers. That's a shame, as I think it would cut down the performance issues a lot.
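A caching reverse proxy helps here because, right after a run comes in, many users request the same few maps. The Python sketch below shows only the core idea, a time-limited cache keyed by request path. It is an illustration under assumptions (the path, the 60-second lifetime, and the `render_map` stand-in are made up), not MAG's setup and not how Nginx is implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Keep each rendered response for `ttl` seconds, keyed by request path.

    A caching reverse proxy does roughly this in front of the application
    server, so repeat requests for the same map during a busy model run
    never reach the expensive renderer.
    """

    def __init__(self, ttl: float, clock: Optional[Callable[[], float]] = None):
        self.ttl = ttl
        self.clock = clock or time.monotonic  # injectable clock for testing
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str, render: Callable[[], bytes]) -> bytes:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh copy: the backend is never touched
        body = render()  # miss or expired: ask the backend once
        self._store[key] = (now, body)
        return body


# Hypothetical backend call standing in for the real map renderer.
render_calls = []


def render_map() -> bytes:
    render_calls.append(1)
    return b"<png bytes>"


cache = TTLCache(ttl=60)
for _ in range(100):  # 100 users asking for the same map within a minute
    cache.get("/gfs/12z/850_temp_mslp_precip/f024", render_map)
print(len(render_calls))  # 1
```

One render serves all 100 requests; after the lifetime expires, the next request triggers a fresh render.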

Issues like mixed-up maps remind me of the Plymouth site, where it happens a lot. That's probably due to non-thread-safe code, where you get data that another user requested. It happens a lot with Java if you are not careful, and only on concurrent requests, which increase in a busy environment.
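To make the thread-safety point concrete, here is a hypothetical Python sketch (not MAG's or Plymouth's code). One handler object is shared by all users, as Java servlets typically are, and the unsafe version stores the requested parameter on that shared object. The interleaving of two requests is written out by hand so the result is deterministic; with real threads it only happens occasionally, which fits the "only under load" pattern.

```python
class UnsafeMapHandler:
    """Shared by every request; keeps per-request state on `self` (the bug)."""

    def __init__(self) -> None:
        self.param = None

    def begin(self, param: str) -> None:
        self.param = param  # overwritten by any request that arrives mid-flight

    def finish(self) -> str:
        return f"map of {self.param}"


class SafeMapHandler:
    """Also shared, but nothing request-specific is ever stored on `self`."""

    def handle(self, param: str) -> str:
        return self._render(param)  # `param` is local to this call

    def _render(self, param: str) -> str:
        return f"map of {param}"


unsafe = UnsafeMapHandler()
unsafe.begin("850T/MSLP/precip")  # user A's request starts
unsafe.begin("850T & heights")    # user B's request arrives before A finishes
print(unsafe.finish())            # user A gets: map of 850T & heights

safe = SafeMapHandler()
print(safe.handle("850T/MSLP/precip"))  # map of 850T/MSLP/precip
```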

Another thing: it's obvious that design and coding are done by the same people. That's usually a no-no, and MAG is a prime example of why. I'm already used to how it works, but the site isn't just not pretty (which I couldn't care less about), it's functionally inefficient. You shouldn't need that many page loads/clicks to get a map. The options are always the same, so why not load every option on one page and use combo boxes or something like that?

Not bashing the guys in charge; I haven't had the issues you mention, or only have them infrequently. But if that many people are complaining, there's almost always something in there that should be checked.

Finally, to rule out any internet issues between you and the MAG server, do a traceroute when it's responding fast and save the results. When it's slow, run several traceroutes (3-4) for comparison. If there are significant differences (packets timing out or much higher response times), that would point to internet issues. The MAG server does not accept ICMP packets, I see, but at least you can rule out issues up to the last responding node.

Ex.

traceroute mag.ncep.noaa.gov

traceroute to mag-itc.woc.noaa.gov (140.90.200.71), 64 hops max, 40 byte packets
1  10.34.109.1 (10.34.109.1)  9.444 ms  8.858 ms  7.803 ms
2  10.1.90.3 (10.1.90.3)  10.639 ms  12.484 ms  12.329 ms
3  mmredes-207-248-54-93.multimedios.net (207.248.54.93)  13.381 ms  12.107 ms  12.136 ms
4  mmredes-207-248-54-69.multimedios.net (207.248.54.69)  11.029 ms  10.306 ms  10.008 ms
5  te3-4.ccr01.mfe01.atlas.cogentco.com (38.104.176.49)  13.619 ms  14.740 ms  14.046 ms
6  te3-4.ccr01.lrd01.atlas.cogentco.com (154.54.27.189)  17.341 ms te2-1.ccr01.lrd01.atlas.cogentco.com (154.54.80.213)  17.314 ms te7-2.ccr01.sat01.atlas.cogentco.com (154.54.29.225)  19.741 ms
7  te0-0-0-7.mpd21.iah01.atlas.cogentco.com (154.54.80.158)  29.259 ms te0-1-0-5.ccr21.iah01.atlas.cogentco.com (154.54.80.150)  26.565 ms te0-0-0-7.mpd21.iah01.atlas.cogentco.com (154.54.80.158)  28.308 ms
8  te0-3-0-5.ccr21.dfw01.atlas.cogentco.com (154.54.5.206)  30.531 ms te0-3-0-2.ccr21.dfw01.atlas.cogentco.com (154.54.2.206)  31.098 ms te0-2-0-6.ccr21.dfw01.atlas.cogentco.com (154.54.1.73)  31.787 ms
9  te2-1.mpd01.dfw03.atlas.cogentco.com (154.54.7.46)  32.999 ms te8-3.mpd01.dfw03.atlas.cogentco.com (66.28.4.174)  31.382 ms te3-3.mpd01.dfw03.atlas.cogentco.com (154.54.6.94)  32.276 ms
10  qwest.dfw03.atlas.cogentco.com (154.54.11.166)  60.137 ms  31.418 ms  31.943 ms
11  dca-edge-21.inet.qwest.net (67.14.6.66)  66.557 ms  65.272 ms  85.258 ms
12  65.123.192.198 (65.123.192.198)  65.729 ms  68.103 ms  65.938 ms
13  140.90.111.34 (140.90.111.34)  64.498 ms  64.897 ms  65.901 ms
14  140.90.75.6 (140.90.75.6)  71.527 ms  66.725 ms  65.608 ms
15  140.90.76.178 (140.90.76.178)  76.723 ms  66.343 ms  66.470 ms
16  * * *

That's a traceroute for a responsive site. Packets are dropped at hop 16, but that's probably due to a firewall, not internet connectivity problems. If you are getting a slow response and the traceroutes show, for example, that hop 3 is unresponsive or that packet latency skyrockets (say, 500 ms or more), then the problem lies on the internet, not the site. That data will be plenty useful for the guys in charge of the server, as they can't do this kind of diagnostics from their end.
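The comparison described above can be partly automated. This Python sketch is a hypothetical helper, not an official tool: it parses Unix-style traceroute output like the example, then flags hops that answered in the baseline but time out in a slow-period run, or whose worst round-trip time reaches 500 ms (the rule of thumb from this post). A final hop that never answers, like hop 16 here, is not flagged. The baseline lines are taken from the example above; the slow-run numbers are invented for illustration.

```python
import re
from typing import Dict, List

HOP = re.compile(r"^\s*(\d+)\s")         # hop number at the start of a line
RTT = re.compile(r"(\d+(?:\.\d+)?) ms")  # each round-trip time, e.g. "65.729 ms"


def parse_traceroute(text: str) -> Dict[int, List[float]]:
    """Map hop number -> RTTs in ms; an empty list means every probe timed out."""
    hops: Dict[int, List[float]] = {}
    for line in text.splitlines():
        m = HOP.match(line)
        if m:  # skips the "traceroute to ..." header line
            hops[int(m.group(1))] = [float(x) for x in RTT.findall(line)]
    return hops


def flag_hops(baseline: Dict[int, List[float]], slow: Dict[int, List[float]],
              spike_ms: float = 500.0) -> List[str]:
    """Compare a slow-period run against a baseline run and describe suspicious hops."""
    warnings = []
    for hop, rtts in sorted(slow.items()):
        before = baseline.get(hop, [])
        if not rtts and before:
            warnings.append(f"hop {hop}: answered in baseline, now times out")
        elif rtts and max(rtts) >= spike_ms:
            base = f"{max(before):.0f} ms" if before else "no baseline"
            warnings.append(f"hop {hop}: {max(rtts):.0f} ms (baseline {base})")
    return warnings


baseline = parse_traceroute("""\
traceroute to mag-itc.woc.noaa.gov (140.90.200.71), 64 hops max, 40 byte packets
 2  10.1.90.3 (10.1.90.3)  10.639 ms  12.484 ms  12.329 ms
 3  mmredes-207-248-54-93.multimedios.net (207.248.54.93)  13.381 ms  12.107 ms  12.136 ms
16  * * *""")

# Invented slow-period run, for illustration only.
slow = parse_traceroute("""\
 2  10.1.90.3 (10.1.90.3)  612.400 ms  587.913 ms  640.055 ms
 3  * * *
16  * * *""")

for warning in flag_hops(baseline, slow):
    print(warning)
# hop 2: 640 ms (baseline 12 ms)
# hop 3: answered in baseline, now times out
```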

One last thing: the command on Windows is tracert, but on most Unix-based systems (including Mac) it's traceroute.
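If you want to script the runs, the only OS difference mentioned here is the command name. A minimal Python sketch (the function names are hypothetical) picks tracert on Windows and traceroute elsewhere; on some Linux systems traceroute may need to be installed separately.

```python
import platform
import subprocess
from typing import List, Optional


def trace_command(host: str, system: Optional[str] = None) -> List[str]:
    """tracert on Windows, traceroute on Mac/Linux and other Unix-like systems."""
    system = system or platform.system()  # "Windows", "Darwin", "Linux", ...
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, host]


def run_trace(host: str) -> str:
    """Run the trace and return its text output (can take a minute or more)."""
    result = subprocess.run(trace_command(host), capture_output=True, text=True)
    return result.stdout


print(trace_command("mag.ncep.noaa.gov", "Windows"))  # ['tracert', 'mag.ncep.noaa.gov']
print(trace_command("mag.ncep.noaa.gov", "Darwin"))   # ['traceroute', 'mag.ncep.noaa.gov']
```

Saving the output of `run_trace("mag.ncep.noaa.gov")` once when the site is fast and again when it is slow gives you the baseline and comparison runs described above.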

This is great info. Including a traceroute can't hurt, or at least have the info ready if they ask for it. Most importantly, include the time (as exact as possible), exactly what you were trying to load, and what behavior you saw, including any error messages. Write to [email protected]. The more complaints and info they receive, the better they can remedy any issues... and it seems there are several.
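One way to capture the exact time, what you loaded, and what happened is a small logging helper. This is a hypothetical Python sketch, not something NCEP provides: it fetches a URL once and returns a report line with the UTC start time, elapsed seconds, and either the response size or the error message. The 120-second timeout is an arbitrary assumption.

```python
import time
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional


def default_fetch(url: str) -> bytes:
    """Download the page body; gives up after 120 s (arbitrary choice)."""
    with urllib.request.urlopen(url, timeout=120) as response:
        return response.read()


def timed_fetch(url: str, fetch: Optional[Callable[[str], bytes]] = None) -> str:
    """Fetch `url` once; return 'UTC time | seconds | URL | outcome' for a report."""
    fetch = fetch or default_fetch
    started = datetime.now(timezone.utc)
    t0 = time.monotonic()
    try:
        outcome = f"OK, {len(fetch(url))} bytes"
    except Exception as exc:  # timeouts and HTTP errors are exactly what to report
        outcome = f"ERROR: {exc}"
    elapsed = time.monotonic() - t0
    return f"{started:%Y-%m-%d %H:%M:%S} UTC | {elapsed:.1f} s | {url} | {outcome}"


# Offline demo with a stand-in fetcher; drop the second argument to hit the real site.
print(timed_fetch("http://mag.ncep.noaa.gov/", lambda url: b"x" * 2048))
```

Running it every few minutes during the 12z GFS rollout and pasting the lines into your email gives the log-file search the specific times and dates that were asked for.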

I personally liked the old design for its simplicity. The idea was to move to this century's technologies, stop making every image on the supercomputer, and instead make them "to order" on the server. With GEMPAK, this hasn't worked out as well as expected, since it's resource-intensive software. But with useful info from users who run into issues, I'm sure they can iron things out in the long run.
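Rendering "to order" with a resource-intensive tool gets expensive when many users request the same brand-new map at the same moment. A common mitigation (an assumption on my part; nothing in this thread says MAG does this) is request coalescing: the first request for a map runs the render, and concurrent requests for the same map wait for that result instead of starting their own. A hypothetical Python sketch, with a sleep standing in for the GEMPAK call:

```python
import threading
import time
from typing import Callable, Dict


class RenderOnce:
    """Render each map key once; concurrent requests for it share that render."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render
        self._lock = threading.Lock()
        self._done: Dict[str, bytes] = {}
        self._pending: Dict[str, threading.Event] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._done:
                return self._done[key]  # already rendered
            event = self._pending.get(key)
            leader = event is None  # first requester does the work
            if leader:
                event = self._pending[key] = threading.Event()
        if leader:
            try:
                body = self._render(key)
                with self._lock:
                    self._done[key] = body
                return body
            finally:
                with self._lock:
                    del self._pending[key]
                event.set()  # wake the waiting requests, even if the render failed
        event.wait()
        with self._lock:
            return self._done[key]  # KeyError here means the leader's render failed


render_calls = []


def slow_render(key: str) -> bytes:
    render_calls.append(key)
    time.sleep(0.2)  # stand-in for an expensive GEMPAK render
    return f"image for {key}".encode()


maps = RenderOnce(slow_render)
threads = [threading.Thread(target=maps.get, args=("gfs_12z_f024",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(render_calls))  # 1
```

A real deployment would also need to drop old entries when the next run arrives; this sketch keeps everything forever.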

It took me 2 minutes to get to the map I wanted, and I have an extremely fast connection here at work.

Try an app... Instant Weather Maps Pro and WeatherGeek Pro are both good... I'd give IWXM a solid 9/10 (only because he just has all the NCEP maps and none of the maps from his own site) and WeatherGeek Pro 5/10 (loads slower and doesn't have as many maps, but still works perfectly). Instant Weather Maps also has a Free version that only shows North American 00Z GFS maps and animations, if you are wondering whether to fork over $5 for the Pro version.

If you have an Android phone/tablet (including a rooted Kindle Fire... seriously, those things are screaming to be rooted) then WeatherGeek Pro is your only option for now.

I've been emailing back and forth and got this hopeful response this morning. Hopefully the problem will go away :thumbsup:

Ken,

Thanks for all your help. Yesterday, we were able to capture the slowdown event as it happened during the uptake of the GFS and GEFS data into our system. Our service provider is currently working to identify and fix the issue.

I would very much appreciate a note if you notice any further slowdowns while using the MAG site.

You may continue to use the helpdesk system, or send directly to my e-mail at [email protected].

Again, thank you for all your help and information. We appreciate your interest in the site.

Sincerely,

Bradley

---

Bradley Mabe

Development Lead

Systems Integration Branch
