The cost of naive scheduling

The other day one of my coworkers expressed alarm at the condition of one of his servers: it was running at 100% CPU usage! Obviously, he said, something was wrong. I resisted the urge to say that what was wrong was that it was a Windows machine. He's a longtime Windows user, after all. And like a lot of Windows users, he thinks of a busy processor as a bad thing, a sign of a machine about to fail. On any other platform, it's a normal situation. Of course there's work to do!

But my coworker was right. Full processor usage is a bad sign on a Windows machine, because it performs remarkably badly under load. Where a loaded Unix machine simply feels slower, a loaded Windows machine is unpredictably unresponsive. Why?

I understand the Windows scheduler is pretty simpleminded - it simply runs the highest-priority available process, and doesn't try very hard to schedule fairly or prevent starvation. The strange thing is that Windows has had this problem for years, even though there are many well-known solutions. Is there some other goal that conflicts with good scheduling?
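To make the starvation argument concrete, here is a toy simulation. It is only a sketch, not a model of the real Windows scheduler: the task names, the priorities, and the `aging` parameter are invented for illustration. It compares a strict "highest priority always wins" policy with priority aging, one of the well-known remedies, in which a task gains priority the longer it waits.

```python
def simulate(tasks, ticks, aging=0):
    """Run a toy scheduler for `ticks` time slices.

    tasks: dict mapping task name -> base priority (higher runs first).
    aging: priority bonus a task gains per tick spent waiting
           (0 means strict priority scheduling).
    Returns a dict mapping task name -> number of ticks it actually ran.
    """
    ran = {name: 0 for name in tasks}
    waited = {name: 0 for name in tasks}
    for _ in range(ticks):
        # Pick the task with the highest effective priority.
        # Iterating in sorted order makes tie-breaking deterministic.
        chosen = max(sorted(tasks), key=lambda n: tasks[n] + aging * waited[n])
        ran[chosen] += 1
        for name in tasks:
            # The task that ran loses its bonus; every other task ages.
            waited[name] = 0 if name == chosen else waited[name] + 1
    return ran


# Hypothetical workload: a CPU-hungry service and a lower-priority session.
tasks = {"busy_service": 10, "remote_desktop": 5}

strict = simulate(tasks, 100)         # strict priority: no aging
aged = simulate(tasks, 100, aging=1)  # waiting tasks slowly gain priority

print("strict:", strict)
print("aged:  ", aged)
```

Under the strict policy the lower-priority task never runs at all, which is the "unpredictably unresponsive" behavior described above. With aging it gets a small but guaranteed share, so the machine merely feels slower.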

There seems to be a similar problem with I/O. One process reading a lot of files can starve all the others - even of virtual memory, because their page faults are not given higher priority than ordinary I/O. As it happens, one of the programs I maintain at work is memory-intensive and I/O-bound, so when it's running, other processes' memory gets paged out, and they can wait a long time for it to be paged back in. The result is to make my 2 GHz machine feel slower than my pocket calculator.
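The same idea can be sketched for I/O. The snippet below is a deliberately simplified queue, not how any real kernel orders disk requests: the request kinds, the owner names, and the `prioritize_faults` switch are assumptions made up for the example. It shows how one page fault fares when it lands behind a bulk reader's requests, with and without priority for page faults.

```python
import heapq
from itertools import count

# Request kinds; the lower value is served first when priorities are enabled.
PAGE_FAULT, ORDINARY = 0, 1


def service_order(requests, prioritize_faults):
    """Return the owners of queued I/O requests in the order they are served.

    requests: list of (owner, kind) tuples in arrival order.
    prioritize_faults: if False, every request gets the same rank, so the
                       queue degenerates to plain first-come, first-served.
    """
    seq = count()  # arrival counter keeps equal-rank requests in FIFO order
    heap = []
    for owner, kind in requests:
        rank = kind if prioritize_faults else ORDINARY
        heapq.heappush(heap, (rank, next(seq), owner))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical queue: a bulk reader floods the disk, then an editor page-faults.
requests = (
    [("bulk_reader", ORDINARY)] * 3
    + [("editor", PAGE_FAULT)]
    + [("bulk_reader", ORDINARY)]
)

fifo = service_order(requests, prioritize_faults=False)
prio = service_order(requests, prioritize_faults=True)

print("no priority:  editor served at position", fifo.index("editor"))
print("with priority: editor served at position", prio.index("editor"))
```

Without priorities the faulting process waits behind everything the bulk reader has already queued. With them, its page comes back first and the bulk reader barely notices the difference.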

My coworker felt the same way about his server. He tried to stop the offending services, but the machine was so unresponsive that his Remote Desktop session was repeatedly disconnected before he could do so. I think he eventually gave up and walked over to the server room to hit the power switch. But as it turned out, there was nothing wrong with the machine. Only with the Windows scheduler.

2 comments:

  1. Speak for yourself! I've never used a Linux distribution that felt snappier than Windows running on the same box.

    Re I/O priority: that wasn't fixed in e.g. Linux-land until recently, with ionice etc.; similarly, it wasn't fixed in Windows until Vista.

  2. I agree about Linux GUIs not being snappy, but they don't have the annoying unpredictability of Windows. They (and Mac OS X) degrade gracefully under load - everything is slower, but I haven't seen any random long delays, even when running the same I/O-heavy program that cripples my Windows machine. (Although I usually use SSH instead of a GUI on Linux, so maybe I just haven't had a chance to notice.)

    I wonder if what's really going on is that the Windows scheduler is trying to make things snappy by raising the priority of interactive processes, but is identifying the wrong process as interactive.

