You build a performance model to answer a question that capacity planning or load testing can’t answer. Since you can’t test the future, you have to meter what you’ve got in clever ways and then mathematically predict for the future reality.
Performance Model Questions
Let’s start with a couple of performance model questions: “Will upgrading to the new UltraBogus 3000 computer (pictured above) solve our performance problem?” or “If we move just the X-type transactions to the XYZ system, will it be able to handle the load?” You can’t meter or load test a computer you don’t have and the performance meters typically show the net result of all transactions, not just X-type transactions.
To answer these kind of questions, you have to know the picky detail of how work flows through your system and what resources are consumed when doing a specific type of transaction. Here are a couple of ways I’ve done this.
Finding Specific Transaction Resource Requirements
You can use load testing to explore the resource consumption and transaction path of a given transaction, by gently loading the system with that kind of transaction during a relatively quiet time. Send enough transactions so the load is clear in the meters, but not so many as the response time start degrading. Any process or resource that shows a dramatic jump in activity is part of the transaction path. Imagine you got this data on a key process before, during, and after a five-minute load test on just transaction X:
Now do a bit of simple math to figure out the resource consumption for a given X transaction. For reads it looks as though the number per second increased by about 100/sec during the test, so it’s reasonable to estimate that for every X transaction this key process does one read. The writes didn’t really change, so no writes are done for the X transaction. First notice if the before and after consumption of resources is about the same. If so, it is a good sign that during the load test there was no big change in the user-supplied load. I’d recommend you repeat this test a couple of times. If you get about the same results each time, your confidence will grow, and the possibility of error is reduced.
For CPU, the consumption during the load test jumped by about 600 milliseconds per second and therefore each X transaction requires 600 / 100 = 6ms of CPU. Overall, this test tells us that the X transaction passes through this key process, does one read, does no writes, and burns six milliseconds of CPU.
Please note, the math will never be precise due to sampling errors and random variations. This methodology can give you a value that is about right. If you need a value that is exactly right, then you need to run this test on an otherwise idle system, perhaps at the end of a scheduled downtime.
Finding The Transaction Path
You can use a network sniffer to examine the back and forth network traffic to record the exact nature of the interprocess communication. For political, technical, and often legal reasons, this is usually only possible on a test system. Send one transaction in at a time separated by several seconds, or with a ping, so there is no overlap and each unique interaction is clear.. Study multiple examples of each transaction type.
The following timeline was created from the network sniffer data that showed the time, origin, and destination of each packet as these processes talked to each other. During this time there were two instances of the X transaction, with a ping added so the transaction boundaries were completely clear. We can learn many things from this timeline.
First we know the transaction path for the X transaction (A to B to C to B to A), which can be very useful. Please note, a sniffer will only show you communication over the network. It won’t show all the dependencies with other processes and system resources that happen through other interprocess communication facilities. This give you a good place to start, not the whole map of every dependency.
We can also see exactly how long each process works on each thing it receives. Process C worked on each X transaction for about 200 milliseconds before responding. Little’s law (and common sense) show if you know the service time you can find the maximum throughput because:
MaxThroughput ≤ 1 / AverageServiceTime
Since this was an idle system, the wait time was zero, so the response time was the service time. If we do the calculation and find that 1/.2 = 5, then we know that Process C can only handle five X transactions per second.
If the question for the model was: “If we don’t reengineer Process C, can we get to 500 X transactions/sec?” The answer is no. If the question for the model was: “How many copies of Process C do we need to handle 500 X transactions a second?” The answer is going to be at least 500/5 = 100, plus more because you never want to plan for anything to be at, or near, 100% busy at peak.
When it’s time to build a model, often you’ll need to know things that can be difficult to discover. Share the problem widely as others might have a clue, a meter, or a spark of inspiration that will help you get the data. Work to understand exactly what question the model is trying to solve as that will give you insight as to what alternative data or alternative approach might work.
While you’re exploring also keep an eye on the big picture as well. Any model of your computing world has to deal with the odd things that happen once in a while. Note things like backups, end of day/week/month processing, fail-over and disaster recovery plans.