Thursday, January 8, 2009

How I shot myself in the foot. Twice. During yoga - so my foot was on my forehead....

If you were expecting to see a real story here - indeed there is one. But it would be of no entertainment value if I did not have the joy of wearing both the "patient" and the "doctor" hats - as well as taking the role of the "independent 3rd party" at times to observe.

Here goes the true story of a long-winded "case" that happened to yours truly.

Step0: I have to admit, the phone socket that I mostly use (for DSL only), is a totally oxidized piece of junk, which I am perpetually too lazy to replace because everything works fine if I half-unplug the DSL connector from the wall - and then it becomes a classic "if it ain't broken, don't...".

Step1: my DSL is dead. Dead for a pretty long time, but I'm not much bothered about fixing it: most of the time I spend at home during this period is spent sleeping, and I can survive that without connectivity :-) Though I did do a fair share of the usual fiddling with the wall connector during the initial phases of troubleshooting. But the line is dead. Real dead(tm). So, especially since the initial failure happened to coincide with a potential DSL headend upgrade at the ISP (as they were widely marketing much higher speeds than before) - it all looks like the proverbial man who was patching the stuff downstairs and cut the wrong wire ;)

Step2: after some administrative wiggling, which is uninteresting for the purposes of this story, the DSL finally gets fixed - it comes up on friday while I am at work. Then I verify during the weekend that "yes it works" and signal my happiness and thanks to the ISP support folks. Well, it kinda works at half speed, but again, for connectivity mostly during sleep that's enough, so I do not complain. :-) And then again - a glass that is half full is much better than a totally empty one. :-)

[ here a week or so passes, meantime the reader is directed to click on the advertisements on /. as a pastime ]

Step3: friday - clean-up party at home. Things get moved, vacuum-cleaned, watered, polished, and a lot of other funny activities. To try to handle the trauma, I hide in the corner with connectivity. At some point the latter disappears. Trying to fiddle with the socket - no way.. Oh well, it's pretty late anyway, so why bother.. time to sleep.

Step4: saturday - some shopping time. No time to fiddle. But, as a reward for my good behaviour throughout the year, I get a shiny new Lenovo N500 laptop with Vista on it. Curiously I haven't touched Windows-based systems for anything beyond simple install for quite a while, so it's entertaining and will have its own post at some point. But the internet is still unstable. So some evening fiddling with socket + amplification settings in the config - so-so.

Step5: sunday - the moment of truth - I manage to get the full speed! And the Vista is really connected and flying! wow! Now it's time to install firefox, cygwin, and all the other command-line toys that I will for sure need in order for this laptop to be of any use. I leave the download of some gigabytes running, and we go for a walk for a few hours. When we arrive back home, everything is downloaded and installed, and I am a happy camper. There's this box that asks for a reboot; after some postponing, I reboot it, leave it to install the locally downloaded cygwin - and then go to watch some TV and hack on some stuff in the background.

Step6: I notice the internet is not too good again. Argh. The most interesting thing - the pings to my default gateway over wireless apparently work fine, and the download of the ubuntu image from the ISP-local mirror gives about 200 kbytes/sec instead of the 400+ that it should - which is bearable anyway, and can be attributed to the socket, which maybe I *should* replace now. The cool part, though, is that anything beyond the local ISP network is just plain broken. Horrible latencies of 1.5+ seconds (the ping to the ISP's DNS server is 22 milliseconds), and huge packet loss. Traceroute shows some oddities, so I conclude that they did not like me for that cygwin update, and that I ate the whole quota for the month, after which they normally rate-limit the traffic to 64K. Well, maybe for some reason this is done only for external traffic.. odd but believable.

Step7: today early morning, I go to use my desktop, which is an oldie gentoo box, and notice that *everything flies* !!! Wow, it must be that this socket has fixed itself again - let's check the Vista box... hmm - it still does not work! The timing can't be *that* precise, but let's check the two simultaneously - indeed it's Vista specific.
Not being fully awake, only the search reflex is functioning. So I search for "Vista internet slow". Which turns up lots of forums with voodoo-like manipulations, uninstallations of antivirus software, recommendations to remove the SIMMs one by one, and suggestions to uninstall updates.

Step8: *updates*!!! That reboot was insisted upon by the update! I totally forgot about this given the flurry of other non-computer events that happened this holiday weekend. Ok, now at least we know "what changed" (something that I use as an introductory joke - "because I know nothing changed, but it's interesting to try to find out whether there's any occurrence where asking this question explicitly will be of use" :) Now, we can also think "what's different between these two boxes". Hmm... linux-vista... nope, I *know* vista was working for sure, just fine :-) wired-wireless... good point, let's try wired... and it works fine! Using wireless... still dialup-like or worse performance. So, it's got to be bound to wireless... Let's check - maybe my good ol' access point went kaput? No, the wife can use her wireless with no problem (though she has an oldie iBook which survived a surgical hard drive replacement by yours truly :)

Step9: The most plausible theory, that so far explains almost all the observations:

  • the vista update and the lenovo wireless drivers did not like each other
  • the packet loss which is the effect of this bug is somehow proportional to RTT (????) - which is a bold statement in itself, but at least it explains why the throughput to the ISP mirror was halved (small latency), why the throughput to anything on the internet was almost nil (bigger latency), and why I did not see anything when testing with the default gateway (pretty much no latency)
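
As a rough sanity check of the RTT theory (a back-of-the-envelope sketch, not a measurement): the well-known Mathis approximation puts the ceiling of a single TCP flow at about (MSS / RTT) * sqrt(1.5 / loss). If the loss really grows linearly with RTT, that ceiling falls off as RTT to the power of -1.5, so a far-away host suffers much more than the local mirror. In the snippet below only the 22 ms figure comes from the pings above; the gateway and internet RTTs, the MSS and the constant K are invented purely for illustration.

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss):
    """Rough TCP throughput ceiling in bytes/s (Mathis approximation)."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss)

def loss_for_rtt(rtt_s, k):
    """Hypothesis from the text: loss probability grows linearly with RTT."""
    return min(1.0, k * rtt_s)

MSS = 1460  # bytes; a typical Ethernet-sized segment (assumed)
K = 0.5     # loss per second of RTT; invented, not measured

# Only the 22 ms value is from the story; the other two RTTs are assumptions.
paths = [
    ("default gateway", 0.001),
    ("ISP mirror", 0.022),
    ("internet host", 0.150),
]

results = {}
for name, rtt in paths:
    p = loss_for_rtt(rtt, K)
    results[name] = (p, mathis_throughput(MSS, rtt, p))
    ceiling_kib = results[name][1] / 1024
    print(f"{name:16s} rtt={rtt * 1000:6.1f} ms  loss={p:.4f}  ceiling={ceiling_kib:10.1f} KiB/s")
```

The absolute numbers mean nothing with a made-up K; the point is only the ordering: next to no loss at the gateway, a noticeable dent at the mirror, and a collapse beyond the ISP.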

Of course, the jury is still out on what's the matter with wireless (and whether to install the 38 more updates that Windows Update is proposing...). Half of them are named "Windows Update #KB-number", and going over all of them by manually retyping the KB#s into the browser is a pretty boring way to spend time. (Oops, Vista will have its own post, I promised.)

Anyway, after such a long preamble, a few observations-reminders both to myself, and to you, if you made it this far :-)

1) Life is too short to keep notes on "what changed". That's why no one ever "admits" it - not because of some magic stubbornness, contrary to what some BOFHs think, but just because there are better things to do.

2) One persistent issue will psychologically monopolize the attention, even if there is really another one with a slightly different symptom - I clearly was blaming the latent L1 instability a bit too readily.

3) The combination of several smaller issues amplifies the effect and makes the resulting problem look much more "interesting".

All three are things I have known for quite a few years already, but they packed themselves so nicely into a "case study" - which, since it happened to me, I could freely share.


