Deconstructing Response Time

The overall response time is what most people care about. It is the average amount of time it takes for a job (a.k.a. request, transaction, etc.) to get processed. The two big contributors to response time (ignoring transmission time for the moment) are the service time (the time to do the work) and the wait time (the time you waited for your turn to be serviced). Here is the formula: ResponseTime = WaitTime + ServiceTime

If you know the wait time, you can show how much faster things will flow if your company spends the money to fix the problem(s) you’ve discovered. If you know the service time, then you know the maximum throughput, because MaxThroughput ≤ 1 / AverageServiceTime

For example: A key process in your transaction path with an average service time of 0.1 seconds has a maximum throughput of: 1 / 0.1 = 10 per second.
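
The bound above is easy to check in a couple of lines. This is only a minimal sketch; the function name max_throughput is made up for illustration and is not from any library.

```python
def max_throughput(avg_service_time_s: float) -> float:
    """Upper bound on jobs per second for a single service center."""
    if avg_service_time_s <= 0:
        raise ValueError("service time must be positive")
    return 1.0 / avg_service_time_s


# The example from the text: 0.1 seconds of service time per job.
print(max_throughput(0.1))  # 10.0
```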

Sadly, response time is the only number that most meters are likely to give you. So how do you find the wait and the service time if there are no meters for them? The service time can be determined by metering the response time under a very light load when there are plenty of resources available. Specifically, when:

  • Transactions are coming in slowly with no overlap
  • There have been a few minutes of warm-up transactions
  • The machines are almost idle

Under these conditions, the response time will equal the service time, as the wait time is approximately zero.

ServiceTime + WaitTime = ResponseTime
ServiceTime + 0 = ResponseTime
ServiceTime = ResponseTime          

The wait time can be calculated under any load by simply subtracting the average service time from the average response time.

WaitTime = ResponseTime - ServiceTime
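
As a rough sketch of that subtraction, assume you metered the service time under light load and the response time at peak. The numbers below are hypothetical, and flooring the result at zero is my own choice to absorb measurement noise.

```python
def wait_time(avg_response_time_s: float, avg_service_time_s: float) -> float:
    """WaitTime = ResponseTime - ServiceTime, never reported below zero."""
    return max(0.0, avg_response_time_s - avg_service_time_s)


# Hypothetical measurements, not from the text:
service = 0.10        # response time metered on an almost idle system
peak_response = 0.35  # response time metered at peak load
print(f"wait at peak: {wait_time(peak_response, service):.2f} s")
```

With these made-up numbers, most of the peak response time is waiting, which is the part that adding capacity can shrink.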

Performance work is all about time and money. When you’ve found a problem, a question like “How much better will things be when you fix this?” is a very reasonable thing for the managers to ask. These simple calculations can help you answer that question.

Other helpful hints can be found in: The Every Computer Performance Book which is available at Amazon, B&N, or Powell’s Books. The e-book is on iTunes.

Capacity Planning Guarantees You Can’t Make

Capacity planning should never be sold as a guarantee that all will be well at the next peak. No matter how good a performance person you are, you can’t offer that guarantee. Why? You cannot prove a negative. For example, you can’t prove there are no monsters waiting to get you while you sleep because no matter how carefully you check, you might overlook some spot (like the closet) where they are hiding.

Liebig’s Law clearly shows that even a small and obscure part of the transaction path can become a major bottleneck if given enough work to do.

Capacity planning is more like a pre-trip checklist to ensure you have what you need, and all systems on this list are good-to-go. Invariably, you will go on that trip, and somewhere along the way you’ll discover you forgot X, don’t have enough Y, and for the first time ever you need Z. That’s all bad news, but remember that your capacity planning effort found bottlenecks that would have limited your throughput even more.

Even if you use load testing to add to your confidence, no load test is perfect, so you still can’t honestly guarantee a trouble-free peak.

So, do capacity planning to the best of your ability with the things you do know about, load test to the best of your abilities, but make no absolute promises. If you get caught short on some resource, take the time before the next big peak to learn about that resource and to do a more complete plan next time. Unless this is the last peak before you retire, you need to think long-term.

A Cry For Help

As the incoming work pushes your computing world beyond its limits, this is the throughput graph that you’ll most likely see. Learn to recognize it as a cry for help.


On an idle system, when work shows up, it gets processed right away and exits. ResponseTime = ServiceTime. So early on, as the workload starts to build for the day, the arrival rate of work equals the throughput. This happy circumstance continues until some part of the transaction path can no longer keep up with the arriving work. Now the following things start happening:

  • Work has to wait as the computing resource is busy.
  • Response time climbs as it now includes significant wait time.
  • The throughput stops matching the arrival rate of new work.

At some point the throughput stops going up and can actually go down as algorithms are pushed beyond their design limits and become dysfunctional.

When you see the throughput of the system flatten out like this, somewhere in the transaction path a resource is 100% busy. This is a cry for help. Learn to recognize it.
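
The flattening described above can be sketched with a toy model in which throughput follows the arrival rate until the busiest resource hits its limit. This is a simplification under my own assumptions: it ignores the case where throughput actually drops past the limit, and the function names and the 5% tolerance are hypothetical.

```python
def throughput(arrival_rate: float, max_rate: float) -> float:
    """Completions per second: tracks arrivals until the resource is 100% busy."""
    return min(arrival_rate, max_rate)


def is_saturated(arrival_rate: float, completed_rate: float,
                 tolerance: float = 0.05) -> bool:
    """True when completions lag arrivals by more than the tolerance."""
    if arrival_rate <= 0:
        return False
    return (arrival_rate - completed_rate) / arrival_rate > tolerance


# Hypothetical resource that tops out at 10 jobs per second.
for arrivals in (2, 6, 10, 14, 18):
    done = throughput(arrivals, max_rate=10)
    print(arrivals, done, is_saturated(arrivals, done))
```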

There is more to learn about this and other subjects in my book:
The Every Computer Performance Book


Practical Insights From Queueing Theory

Queuing theory provides a way to predict the average delay when work builds up at a busy device. The calculations are complex, but luckily we can often ignore the math and focus on the key insights this branch of mathematics can bring to things that are busy. First, let’s define a few terms:

Service Center and Service Time

A service center is where the work gets done. To accomplish a given task, it is generally assumed that it takes a service center a fixed amount of time – the service time. In reality this assumption is usually technically false, but still useful.  If work arrives faster than it can be processed a queue builds and the average response time grows because work has to wait in a queue to be serviced.

Queuing Theory

As a service center gets busy, it becomes more likely that a newly arriving job will have to wait because there are jobs ahead of it. An approximate formula that describes this relationship is: ResponseTime = ServiceTime / (1 - Utilization)

The real insight comes from looking at the graph of this function below, as the utilization goes from 0% to 90%.


Notice that response time starts out as 1x at idle. At idle the response time always equals the service time as there is nothing to wait for.

Notice that the response time doubles when the service center gets to 50% utilization. At this point, arriving jobs sometimes find the service center idle and sometimes find it with several jobs already waiting, but the effect on the average job is to double the response time as compared to an idle service center. The response time doubles again to 4x when the service center is at 75% utilization and doubles again to 8x at around 87% utilization. Assuming you kept pushing more work at the service center, the response time doublings keep getting closer and closer (16x at 94% utilization, 32x at 97% utilization) as the curve turns skyward. All these doublings are created by the fact that the service center is busy, and thus there will often be many jobs waiting ahead of you in the queue.
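
The doublings can be reproduced directly from the approximate formula ResponseTime = ServiceTime / (1 - Utilization). A minimal sketch, with response_multiplier as an illustrative name of my own:

```python
def response_multiplier(utilization: float) -> float:
    """Response time as a multiple of service time, for one service center."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Each step halves the remaining idle capacity, so the multiplier doubles.
for u in (0.0, 0.5, 0.75, 0.875, 0.9375, 0.96875):
    print(f"{u:8.3%} busy -> {response_multiplier(u):5.1f}x")
```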

Insight #1:

The slower the service center, the lower the maximum utilization you should plan for at peak load. The slowest computer resource is going to contribute the most to overall transaction response time increases. Unless you have a paper-tape reader as part of your transaction path, the slowest part of any computer in the early part of the twenty-first century is the rotating, mechanical magnetic disk. At the time of this writing, on an average machine, fetching a 64-bit word from memory was ~50,000x faster than getting it off disk.

The first doubling of response time comes at 50% busy, and that is why conventional wisdom shoots for the spinning magnetic disks to be no more than 50% busy at peak load. Think about it this way: at 50% busy you are doubling the response time of the slowest part of your transaction path, and that has got to hurt. If the boss insists that you run the disk up to 90% busy, then the average response time for a disk read is about 10x larger than if the drive was idle. Ouch!

Insight #2:

It’s very hard to use the last 15% of anything. As the service center gets close to 100% utilization the response time will get so bad for the average transaction that nobody will be having any fun. The graph below is exactly the same situation as the previous graph except this graph is plotted to 99% utilization. At 85% utilization the response time is about 7x and it just gets worse from there.


Insight #3:

The closer you are to the edge, the higher the price for being wrong. Imagine your plan called for a peak of 90% CPU utilization on the peak hour of your peak day but the users didn’t read the plan. They worked the machine 10% harder than anticipated and drove the single CPU to 99% utilization. Your average response time for that service center was planned to be 10x the service time; instead it is 100x. Ouch! This is a key reason that you want to build a safety cushion into any capacity plan.
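
One way to size that cushion, as a sketch only: pick the worst response time multiplier you can live with and the overshoot you want to survive, then solve the same approximate formula for the planned utilization. The function names and the rule itself are my own illustration, not something from the original plan.

```python
def response_multiplier(utilization: float) -> float:
    """ResponseTime / ServiceTime for a single service center."""
    return 1.0 / (1.0 - utilization)


def max_planned_utilization(worst_multiplier: float, overshoot: float) -> float:
    """Highest planned utilization that stays within worst_multiplier
    even if the real load is (1 + overshoot) times the plan."""
    ceiling = 1.0 - 1.0 / worst_multiplier  # utilization that gives the limit
    return ceiling / (1.0 + overshoot)


plan = max_planned_utilization(worst_multiplier=10.0, overshoot=0.10)
print(f"plan for {plan:.1%} busy")
print(f"if users push 10% harder: {response_multiplier(plan * 1.10):.1f}x")
```

Under these assumptions, tolerating a 10% surprise while staying at or below 10x means planning for roughly 82% busy rather than 90%.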

Insight #4:

Response time increases are limited by the number that can wait. Mathematically, the queuing theory calculations predict that at 100% utilization you will see close to an infinite response time. That is clearly ridiculous in the real world as there are not an infinite number of users to send in work.

The max response time for any service center is limited by the total number of possible incoming requests. If, at worst case, there can only be 20 requests in need of service, then the maximum possible response time is 20x the service time. If you are the only process using a service center, no matter how much work you send it, there will be no queuing-based increase in response time because no one is ever ahead of you in line.
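
A small sketch of that ceiling: take the queuing estimate and cap it at the number of requests that can possibly be in line. Combining the two with a simple min() is my own simplification, and capped_multiplier is a hypothetical name.

```python
def capped_multiplier(utilization: float, max_requests: int) -> float:
    """Response time multiple, limited by how many requests can be waiting."""
    if max_requests < 1:
        raise ValueError("need at least one possible request")
    if utilization >= 1.0:
        return float(max_requests)  # formula blows up; the cap is all that is left
    return min(1.0 / (1.0 - utilization), float(max_requests))


print(capped_multiplier(0.99, max_requests=20))  # 20.0, not 100x
print(capped_multiplier(0.50, max_requests=20))  # 2.0, the cap does not matter yet
print(capped_multiplier(0.99, max_requests=1))   # 1.0, nobody is ever ahead of you
```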

Insight #5:

Remember this is an average, not a maximum. If a single service center is at 75% utilization, then the average response time will be 4x the service time. Now a specific job might arrive when the service center is idle (no wait time) or it might arrive when there are dozens of jobs ahead of it to be processed (huge wait time).

The higher the utilization of the service center the more likely you are to see really ugly wait times and have trouble meeting your service level agreements. This is especially true if your service level agreements are written to specify that no transaction will take longer than X seconds.

Insight #6:

There is a human denial effect in multiple service centers. If there are multiple service centers that can handle the incoming work, then, as you push the utilization higher, the response time stays lower longer. Eventually the curve has to turn and when it does so the turn is sudden and sharp!


This effect makes sense if you think about buying groceries in a MegaMart. If at check out time seven cashiers are busy and three are idle, you go to the idle cashier. Even though the checkout service center is 70% busy overall, your wait time is often zero, and your response time is equal to the service time. Life is good.

If you have a computer with eight available CPUs, the response time will stay close to the service time as the CPU busy climbs to around 90%. At this point the response time curve turns sharply. At 95% busy, the system becomes a world of response time pain. So, for resources with multiple service centers, you can run them hotter than single service center resources, but you have to be prepared to add capacity quickly or suffer horrendous jumps in response time. Most companies are much better at understanding real pain they are experiencing now, as opposed to future pain they may experience if they don’t spend lots of money now.
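
To see the shape of this effect in numbers, here is a sketch using the standard M/M/c (Erlang C) queuing model. That model is my choice for illustration and is not named in this post; it assumes random arrivals, exponentially distributed service times, and one shared queue, so treat the output as a rough guide rather than a prediction for your system.

```python
def erlang_c(servers: int, utilization: float) -> float:
    """Probability that an arriving job has to wait (M/M/c model)."""
    offered = servers * utilization
    blocking = 1.0  # Erlang B value for zero servers
    for n in range(1, servers + 1):
        blocking = offered * blocking / (n + offered * blocking)
    return blocking / (1.0 - utilization * (1.0 - blocking))


def response_multiplier(servers: int, utilization: float) -> float:
    """Average response time as a multiple of the service time."""
    if servers < 1 or not 0.0 <= utilization < 1.0:
        raise ValueError("need servers >= 1 and utilization in [0, 1)")
    wait = erlang_c(servers, utilization) / (servers * (1.0 - utilization))
    return 1.0 + wait


for u in (0.50, 0.70, 0.90, 0.95, 0.99):
    one = response_multiplier(1, u)
    eight = response_multiplier(8, u)
    print(f"{u:.0%} busy: 1 CPU {one:6.1f}x   8 CPUs {eight:6.1f}x")
```

With one server the model reduces to ServiceTime / (1 - Utilization). With eight servers it stays below 2x at 90% busy, where the single server is already at 10x, and then climbs steeply from there.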

For More Information:

To begin to explore the mathematical underpinnings of this post you can begin here and here, but it is a long way down those rabbit holes.

There are more easy performance-related mathematical insights in my book:
The Every Computer Performance Book

A short, occasionally funny, book on how to solve and avoid application and/or computer performance problems


The Law of The Minimum

Long before computers, two chemists named Sprengel and Liebig were working in the area of agricultural chemistry. Among their many accomplishments, Sprengel pioneered and Liebig popularized the Law of the Minimum, which states that growth is limited by the least available resource. If a plant is starving for nitrogen, then only additional nitrogen will get things growing again. Everything else you give it doesn’t help. Oddly enough the Law of the Minimum applies to computers, too.

When you are out of X, adding anything else but X won’t help at all. Extra resources may help new transactions move through the system quicker, but all that really does is hurry them to the X bottleneck, where they will find themselves at the end of a very long line of transactions waiting their turn. If you screw up and add the wrong resource, the queue of waiting tasks at the least available resource may just get longer and there will be no positive effect on throughput or response time.

In the biological sciences too much of something (water, warmth, etc.) can kill you just as easily as too little.  In computers, if you have too much of some resource there is no negative effect on the overall capacity of the system to get things done. Every so often I run into people who believe the myth that a system has too much of a given resource and that somehow hurts performance. It takes considerable time to talk them out of this belief. Be patient with them. It is the case that when something suddenly and rapidly dumps work into the system (e.g. a system coming back online after a comm failure) then that can cause a performance disruption. But the right fix for that is to put a bit of flow control in so these rare events don’t drown the system in transactions and kill performance.

In business, money is often the most limited resource. If the system has too much of some resource, you are wasting money. The trick is to always have just enough resources in place to handle the peak plus a bit more as a margin of safety. Any fool can solve bottlenecks with limitless money.

The Hidden Bottleneck

When you run out of some resource, that resource becomes a bottleneck. All the transactions race through the system only to find a huge queue because of that resource limitation.

The double-necked hourglass illustration shows a bottleneck at point A. Beyond that bottleneck life is easy for the rest of the system as, no matter how many transactions arrive, the workload is throttled by the upstream bottleneck.


If you “fix” bottleneck A then performance will be really good for only about 45 milliseconds until that great load of transactions hits bottleneck B with a sickening “WHUMP!”  The throughput of this system will hardly change at all, and you will have some explaining to do in the boardroom.
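
The hourglass can be sketched as a chain of stages where the system moves no faster than its narrowest point. The stage names and capacities below are hypothetical numbers chosen only to show bottleneck B appearing once A is fixed.

```python
def system_throughput(capacities: dict) -> float:
    """A chain of stages can go no faster than its slowest stage."""
    return min(capacities.values())


# Hypothetical capacities in transactions per second.
stages = {"A": 100.0, "B": 110.0, "everything else": 500.0}
before = system_throughput(stages)

stages["A"] = 1000.0  # spend the money and "fix" bottleneck A
after = system_throughput(stages)

print(before, after)  # 100.0 110.0
```

A tenfold improvement at A buys only a 10% gain overall, because B was hiding right behind it.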

When capacity planning, it is important to explain this drawing to the decision makers so they comprehend how one bottleneck can hide a downstream bottleneck. It is also key for you to meter all the resources you can deplete, not just the one that is the obvious bottleneck.

If your response time is growing, then there is a bottleneck somewhere. If the meters you are looking at (for bottleneck B) are showing lots of capacity, then you are looking in the wrong place. However, it is important to look at them as they can tell you about your future. If you need this hourglass to do 10x the work it is doing now, and the meters for bottleneck B are showing that part of the system is 50% busy, then there is no way that part of the system can do 10x the work. Your current problem may lie elsewhere, but you’d better put bottleneck B on your to-do list.
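
That headroom check can be sketched as follows, assuming (my assumption) that utilization scales linearly with the amount of work. The meter readings and function names are hypothetical.

```python
def max_growth(utilization: float) -> float:
    """Largest multiple of today's work before this resource is 100% busy."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


def resources_short(meters: dict, growth: float) -> list:
    """Resources that would be at or past 100% busy at the planned growth."""
    return [name for name, busy in meters.items() if busy * growth >= 1.0]


# Hypothetical meter readings (fraction busy) at today's load.
meters = {"A": 0.95, "B": 0.50, "C": 0.05}
print(max_growth(meters["B"]))              # 2.0, so B cannot do 10x
print(resources_short(meters, growth=10))   # ['A', 'B']
```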

The hourglass could easily have been drawn with many more bottlenecks, but I’ve never seen a performance problem where there were more than two bottlenecks that had to be cleared up to get the needed throughput. If you are working on the fourth bottleneck for this given problem, then perhaps you should spend some time thinking about a new career – because you are most likely deep in the weeds.

For more information on finding, fixing, and avoiding bottlenecks, as well as capacity planning, I’d suggest you read my book The Every Computer Performance Book

For more details on Liebig’s Law of the Minimum see: