
Self-Service Linux®: Mastering the Art of Problem Determination

I've been a Linux sysadmin for a long time, and I have a certain knack for tracking down system problems (so I'm told). In the midst of perhaps the worst set of hardware/software problems I've ever faced, I stumbled across this book, which describes exactly what I do all day long, codifies it, and tells me how I can get better at it. This is awesome.

My dead-tree copy is in the mail; in the meantime, I'm reading the free electronic copy from linux-books.us, and it's awesome.

It takes a lot of study to really grasp the whole relationship among the physical hardware, kernel, memory, disk, virtual memory, threading, processors, instructions, system calls, networking layers, and web server software. Even then, textbook study alone will not give you a working knowledge; it takes sitting down at the terminal and examining the system to make the concepts real.

Look, not to brag: I know a LOT about Linux systems and computers in general, but compared to someone like Linus Torvalds I'm an idiot. I have a decent working concept of an operating system kernel, enough to figure out why my software is behaving oddly, exhibiting odd performance characteristics, or failing fantastically, but I couldn't sit down and draw you a picture of the Linux kernel's structure. I do know enough, though, to be effective, and to read kernel driver sources to figure out where an error originates, which is critical when tracking down a potential hardware issue.

I can't wait to improve my own "haphazard but effective" problem determination methods.

Why do we need all this complexity? In practice, it works well. Should we come up with something better? Absolutely, but that's a much deeper topic, left for a bright sunny day when I can escape from my server room.