Three Tools You Should Build

Given that it is a good idea to keep an eye on performance all the time, there are lots of companies that only allow you pay periodic attention to performance. They focus on it when there is a problem, or before the annual peak, but the rest of the year they give you other tasks to work on.

toolsThis is a lot like my old job in Professional Services – A customer has a problem, I fly in, find the trouble, and then don’t see them until the next problem crops up.

To do that job I relied on three tools that I created for myself and that you might start building to help you work on periodic performance problems.

Three Tools

List All – The first tool would dig through the system and list all the things that could be known about the system: config options, OS release, IO, network, number of processes, what files were open, etc. The output was useful by itself as now I had looked in every corner of the system and knew what I was working on. Several times it saved me days of work as the customer had initially logged me into the wrong system. It always made my work easier as I had all the data I needed, in one place, conveniently organized, and in a familiar order.

Changes – If I’d been to this customer before, this tool allowed me to compare the state of the system with the previous state. It just read through the output of the List All I’d just done and compared it with the data I collected on my last visit. Boy, was this useful as I could quickly check the customer’s assurance that “Nothing had changed since my last visit.” I remember the shocked look on the customer’s face when I asked: “Why did you downgrade memory?”

Odd Things – Most performance limiting, or availability threatening, behavior is easy to spot. But for any OS, and any application, there are some things that can really hurt performance that you have to dig for in odd places with obscure meters. These are a pain to look for and are rare, so nobody looks for them. Through the years as I discovered each odd thing, I would write a little tool to help me detect the problem and then I’d add that tool to the end of my odd things tool.  I’d run this tool on every customer system I looked at and, on occasion, I would find something that surprised everyone: “You haven’t backed up this system in over a year.” or solved a performance problem by noticing a foolish less than optimal configuration choice.

With most everything happening on servers somewhere in the net/cloud these days, knowing exactly where you are and what you’ve got to work with is important. Being able to quickly gather that data in a matter of minutes allows you to focus on the problem at hand confident that you’ve done a through job.

Stone_SoupAll three of these tools were built slowly over time. Get started with a few simple things. The output of all three is just text – no fancy GUI interface or pretty plots are required. When you have time, write the code to gather the next most useful bit of information and add that to your tool.

Just like the old folk story of stone soup, your tools will get built over time with the contributions of others. Remember to thank each contributor for the gifts they give you and share what you have freely with others.


Other useful hints can be found in: The Every Computer Performance Book which is available at Amazon, B&N, or Powell’s Books. The e-book is on iTunes.


 

 

The Five Minute Rule

I used to travel to companies to work on their performance problems. Once I arrived, and we had gone through the initial pleasantries, I would ask them a simple question:

How busy is your system right now?

If the person I asked had a ballpark estimate that they quickly confirmed with a meter, I’d know that whatever problem they called me to solve would be an obscure one that would take some digging to find. If they had no idea, or only knew the answer to a resolution of an entire day, then I was pretty sure that there would be plenty of performance-related ugliness to discover. This little question became the basis for a rule of thumb that I used my entire career.  five

The less a company knows about the work their system did in the last five minutes, the more deeply screwed up they are.

Your job is likely different than mine was, but there is a general truth here. If the staff can easily access data that tells them in the last few minutes how much work is flowing through the system (transactions, page views, etc.) and how the system is reacting to that work (key utilization numbers, queue depths, error counts, etc.), then they have the data to isolate and solve most common performance problems. Awareness is curative. The company will solve more of it own performance problems, see future performance problems coming sooner, and spend less money on outside performance experts.

 


Other rules of thumb can be found in: The Every Computer Performance Book which is available at Amazon, B&N, or Powell’s Books. The e-book is on iTunes.


 

Deconstructing Response Time

The overall response time is what most people care about. It is the average amount of time it takes for a job (a.k.a. request, transaction, etc.) to get processed.  The two big contributors to response time (ignoring transmission time for the moment) are the service time: the time to do the work and the wait time: the time you waited for your turn to be serviced.  Here is the formula: ResponseTime = WaitTime + ServiceTime

service center 3

If you know the wait time, you can show how much faster things will flow if your company spends the money to fix the problem(s) you’ve discovered. If you know the service time then you know the max throughput as MaxThroughput ≤ 1 / AverageServiceTime 

For example: A key process in your transaction path with an average service time of 0.1 seconds has a maximum throughput of: 1 / 0.1 = 10 per second.

Sadly, response time is the only number that most meters are likely to give you. So how do you find the wait and the service time if there are no meters for them? The service time can be determined by metering the response time under a very light load when there are plenty of resources available. Specifically, when:

  • Transactions are coming in slowly with no overlap
  • There have been a few minutes of warm-up transactions
  • The machines are almost idle

Under these conditions, the response time will equal the service time, as the wait time is approximately zero.

ServiceTime + WaitTime = ResponseTime
ServiceTime + 0 = ResponseTime
ServiceTime = ResponseTime          

The wait time can be calculated under any load by simply subtracting the average service time from the average response time.

WaitTime = ResponseTime – ServiceTime

Performance work is all about time and money. When you’ve found a problem, a question like: “How much better will things be when you fix this?” it is a very reasonable thing for the managers to ask. These simple calculations can help you answer that question.


Other helpful hints can be found in: The Every Computer Performance Book which is available at Amazon, B&N, or Powell’s Books. The e-book is on iTunes.


How To Become A Performance Guru

Performance work is a great career as everything change over time and with each change comes new performance challenges. There are always things to do and things to learn. Good performance work can save the company and put your kids though college. Yay!

Bad News… The Path Is Not Easy

This is a hard skill to learn as the knowledge required is diffused throughout many different sources. Let me explain…

First, performance books… Some are built on very difficult math that most people can’t do and most problems don’t require. They unnecessarily discourage many people. Many books focus on a specific product version, but you don’t have that version in your computing world. There is often no performance book for a key part of your transaction path.

Turning to manuals… Almost all manuals focus on a specific version of a technology and were written under tremendous time pressure at about the same time the engineering was being completed; thus the engineers had little time to talk to the writers. The manuals ship with the product. The result is that these books document, but they don’t illuminate. They explain the what, but not the why. They cover the surface, but don’t show the deep connections.

Should you accept the bad news, stop reading here and give up?  I don’t think so. There is hope. Hear me out…

First Of All, Don’t Worry About The Math

For 99% of the performance work out there you don’t need to use complex performance math equations.  The most complex formula I used in 25+ years of performance work is the one that approximately predicts how the response time will change as the utilization of a resource increases:
                    R = S / (1 – U)
If you can replace S with the number 2 and U with the number 0.5 and calculate that R is equal to (spoiler alert) 4, then you have all the math you need for a long career in performance.

Mine Low Grade Information for Gold…

rand_472

The UltraBogus 3000 features fully-puffed marketing literature, a backwards-compatible front door, and a Stooge-enabled, three-idiot architecture that processes transactions with a minimum of efficiency. Its two-bit bus runs conveniently between your office and the repair facility every Tuesday. The steering wheel was added because the marketing VP thought it needed more chrome.

RTFM (Read The ‘Fine’ Manual)

If your company just bought an UltraBogus 3000 (see picture to left) to handle your peak load then read the manuals cover-to-cover. You’ll be surprised what you find.

Sometimes what you find is a limit that is better discovered now than when you blindly hit it at the seasonal peak. Sometimes it is a question you never thought to ask. Sometimes it is a way to make your job vastly easier – even the worst product has some good features. You have to mine a ton of ore to find an ounce of gold.

You’ll (hopefully) be doing this job for years, take 15 minutes a day and chew your way though the manuals.

Lastly, reading the manuals teaches you the vocabulary you need to use when you call tech support. If you want to talk to their wizards, first you need to convince the people who initially take the call that you’re not an idiot.

readRead Performance Books

I wrote one that I think is generally useful and there are many others that will illuminate particular problems and show different ways of solving them.

They all have their strengths and weaknesses, but there is good stuff to be found there. Especially if the company is buying, try reading the ones with scary looking equations. Push yourself into unfamiliar territory. Even if you can’t understand it, having it on your bookshelf will intimidate your enemies. 😉

If you want to be a performance guru then be all you can be. Read.

Search and Connect

Search engines are your friend. If you have a problem with X-technology, then it is highly likely that someone else has too. Ask simple questions and see what comes up. A lot of it is low-grade information, but sometimes you find just the hint you need.

LinkedIn has groups that are focused on every conceivable technology. Join a few a see if you can find a rich vein of information. There is also CMG and websites focused on performance like PracticalPerformanceAanalyst or PerfBytes to explore.

Now comes the tough part…

After you explore the sources above there are still many things of great importance you still won’t know. Performance work is in many ways a skill you teach yourself with the help of others. You have to dive in, like an explorer on a new planet, and try to make sense of the computing world you stand upon. startI’m often asked: Where do I begin?  My answer is to pick a small performance-related thing that interests you and deeply explore it. As you explore, you’ll find other mysteries. Don’t worry about them, just put them on the list. Once you master the first thing, go for the next thing on the list. Over time you’ll have more and more helpful things to contribute and your job will mutate into a performance job. Most performance people start as something else (like a programmer or a sys admin) and slowly move into the performance field. You don’t have to know it all day one. Actually, you never know it all and that is what makes the work interesting.


This blog is based on: The Every Computer Performance Book which is available at AmazonPowell’s Books, and on iTunes.


 

 

When You Care Enough To Do Less

doctor_groucho
It’s the oldest joke in the book…

      Patient: When I do this it hurts.

      Doctor: Well don’t do that.

Sometimes performance work is not about adding hardware or tuning applications, its about doing less and doing it smarter. Send a kilobyte not a megabyte, don’t lock all the records when you don’t need to, etc.

For example, what you put in the files served by your website has a huge impact on performance that no amount of server-side hardware can overcome because you don’t control all the computers/networks between you and the enduser. Many times the only way to fix website response time problems is to send less stuff in a smarter way.

I recently ran across Zoompf.com which has a nice tool to analyze your website and make helpful recommendations to speed it up. They do a good job of explaining why the changes they recommend are important and further provide helpful references to more information about each recommendation.

To avoid mistakes you haven’t made yet, you might also want to read a wonderful little book called High Performance Web Sites by Steve Souders. It points out a lot of small changes that can make a big difference in website performance.

Mostly companies prefer to throw hardware at performance problems, rather than adjust applications, algorithms, or outputs, because it is seen as the low-risk path. Sometimes that works, but sometimes the right thing to recommend is: “don’t do that.”


After you read Steve’s book, try mine: The Every Computer Performance Book at  AmazonPowell’s Books, and on iTunes.


Standards

The brilliant comic XKCD perfectly describes my feelings about standards in performance work with this simple cartoon:

standardsThere are many paths to the top of the performance mountain. If, whatever you believe is the right way to do performance work, gets you to the top then go with that.

K2_south_routes

But, while climbing the mountain, keep your eyes open and notice what the other climbers are doing. If you see some cool climbing trick, learn it. Whenever possible, share what you know and help those that are struggling. If you are struggling, that is often natures way of telling you to open your mind other options, opinions, pathways, and climbing techniques.

It is not about the dogma, its about getting the work done and keeping the users happy.