Given that it is a good idea to keep an eye on performance all the time, there are lots of companies that only allow you pay periodic attention to performance. They focus on it when there is a problem, or before the annual peak, but the rest of the year they give you other tasks to work on.
This is a lot like my old job in Professional Services – A customer has a problem, I fly in, find the trouble, and then don’t see them until the next problem crops up.
To do that job I relied on three tools that I created for myself and that you might start building to help you work on periodic performance problems.
List All – The first tool would dig through the system and list all the things that could be known about the system: config options, OS release, IO, network, number of processes, what files were open, etc. The output was useful by itself as now I had looked in every corner of the system and knew what I was working on. Several times it saved me days of work as the customer had initially logged me into the wrong system. It always made my work easier as I had all the data I needed, in one place, conveniently organized, and in a familiar order.
Changes – If I’d been to this customer before, this tool allowed me to compare the state of the system with the previous state. It just read through the output of the List All I’d just done and compared it with the data I collected on my last visit. Boy, was this useful as I could quickly check the customer’s assurance that “Nothing had changed since my last visit.” I remember the shocked look on the customer’s face when I asked: “Why did you downgrade memory?”
Odd Things – Most performance limiting, or availability threatening, behavior is easy to spot. But for any OS, and any application, there are some things that can really hurt performance that you have to dig for in odd places with obscure meters. These are a pain to look for and are rare, so nobody looks for them. Through the years as I discovered each odd thing, I would write a little tool to help me detect the problem and then I’d add that tool to the end of my odd things tool. I’d run this tool on every customer system I looked at and, on occasion, I would find something that surprised everyone: “You haven’t backed up this system in over a year.” or solved a performance problem by noticing a
foolish less than optimal configuration choice.
With most everything happening on servers somewhere in the net/cloud these days, knowing exactly where you are and what you’ve got to work with is important. Being able to quickly gather that data in a matter of minutes allows you to focus on the problem at hand confident that you’ve done a through job.
All three of these tools were built slowly over time. Get started with a few simple things. The output of all three is just text – no fancy GUI interface or pretty plots are required. When you have time, write the code to gather the next most useful bit of information and add that to your tool.
Just like the old folk story of stone soup, your tools will get built over time with the contributions of others. Remember to thank each contributor for the gifts they give you and share what you have freely with others.