Many performance meters in your computing world will tell you how busy things are. That’s nice, but to make sense of that data, you also need to know how much work the system is being asked to handle.
With workload data you can see the performance meters with fresh eyes: now you can evaluate how the system is responding to that workload and, with proper capacity planning or modeling, predict how it will respond to a future peak workload.
To find or create a workload meter, you and your coworkers have to agree on what the workload is and how to measure it. This will take some time, as you've got to choose the transactions that will represent your workload from the many unique transactions users send in. Every company settles on a different scheme, and there is no perfect solution. Here are a few common ones I’ve encountered:
- Treat all incoming transactions the same. Simply count them and you have your workload number.
- Notice that the vast majority of your incoming transactions do a similar thing, so count just those as your workload number and ignore the others.
- Only count the transaction that was at the center of your last performance catastrophe. This may be an unwise choice, as always swinging at the pitch you missed last time will not improve your batting average.
- Use the amount of money flowing into the company as the workload meter: at $10K/min the CPU is 35% busy.
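The simplest scheme above, counting every incoming transaction, can be sketched in a few lines. This is a minimal sketch under an assumed log format (each line begins with an ISO-8601 timestamp); the log lines and field layout are hypothetical, not from any particular application.

```python
# Sketch of the simplest workload meter: count every incoming
# transaction per minute, whatever it is.
# Assumption: each log line starts with an ISO-8601 timestamp.
from collections import Counter

log_lines = [
    "2013-04-02T19:49:03 GET /xyz",
    "2013-04-02T19:49:41 POST /abc",
    "2013-04-02T19:50:12 GET /xyz",
]

# Truncating to the first 16 characters keeps the date plus HH:MM,
# so each distinct minute becomes one bucket.
per_minute = Counter(line[:16] for line in log_lines)

for minute, count in sorted(per_minute.items()):
    print(minute, count)
```

The same bucketing trick works for any interval; truncate to 13 characters for hourly counts.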
Whatever you decide to do will work fine as long as it passes the following simple test: Changes in workload should show proportional changes in the meters of key resources. What you are looking for is data like you see below.
Clearly as the measured workload increases the utilization of Resource X follows along with it. It is just fine that the lines don’t perfectly overlap. They never will. It is the overall shape that is important. You are looking for these values to move in synchrony – workload changes cause a proportional change in utilization. Now, let’s look at Resource Y below.
Resource Y is not experiencing any changes in utilization as the workload changes. This resource is not part of the transaction path for this workload. If it is supposed to be, perhaps you accidentally metered the wrong resource, or the right resource on the wrong system. I’ve made both of those mistakes.
Now, let’s look at Resource Z below.
The utilization of Resource Z mostly tracks the workload meter (they rise and fall together) except for about an hour starting at 19:49 and ending at 20:39. Here you need to use some common sense, as either:
- The utilization spike could have been caused by something unrelated to the normal workload, such as a backup, a software upgrade, or the side effect of hitting some bug. In that case you can ignore the spike as you evaluate this workload meter.
- The utilization spike reflected a dramatic increase in the real workload that your proposed workload meter did not see. If your proposed meter missed a dramatic and sustained increase like the one here, then you need to search for a better workload meter.
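The "move in synchrony" test above can be made a bit more rigorous with a correlation check. Here is a minimal sketch; the sample numbers are made up to mimic Resource X (tracks the workload) and Resource Y (flat), and a hand-rolled Pearson correlation is used to keep it dependency-free.

```python
# Sketch of the proportionality test: does resource utilization
# move in synchrony with the measured workload?
# All sample values are illustrative, not from a real system.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

workload   = [100, 150, 220, 300, 280, 190, 120]  # transactions/min
resource_x = [20, 29, 41, 55, 52, 36, 24]         # % busy, tracks workload
resource_y = [35, 34, 36, 35, 34, 36, 35]         # % busy, flat

# A value near 1.0 means the meters move together; a value near
# zero means the resource is not in this transaction's path.
print(round(pearson(workload, resource_x), 2))
print(round(pearson(workload, resource_y), 2))
```

Perfect correlation is not the goal, just as the lines in the charts never perfectly overlap; you are looking for clearly-high versus clearly-low, not a precise threshold.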
Once you’ve decided what will serve as the workload meter, how do you get the data you need? It would be lovely if the application gave you an easy-to-access meter for that, but that rarely happens. Usually, you have to look in odd places. If XYZ transactions are going to be your workload indicator, then you need to find some part of the XYZ transaction path where there is something to meter that uniquely serves that transaction. Here are some of the things I’ve done in the past to ferret out this key information:
- For every XYZ transaction, process Q does two reads to a given file. Take the number of reads in the last interval and divide them by two to get the transaction rate.
- For every 500 XYZ transactions, process Q burns one second of CPU. Take the number of CPU seconds consumed during the interval and multiply it by 500 to get the transaction rate.
- For every XYZ transaction file Z grows by 1200 bytes. Take the change in the file size during the interval and divide that by 1200 to get the transaction rate.
- For every XYZ transaction two packets are sent. Divide the packet count by two to get the transaction rate.
This list goes on, but the basic trick remains the same. First find some meter that closely follows the type of transaction you want to use as a workload meter. Then figure out how to adjust it mathematically so you get a transaction count.
That mathematical adjustment usually requires you to get multiple days’ worth of data and then, using whatever data you can get on your transaction, come up with the appropriate adjustment.
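One way to sketch that calibration, assuming you can get trusted transaction totals for a few days (say, from the application's own end-of-day reports), is a least-squares fit through the origin: the slope is the per-transaction cost of the proxy meter, and dividing new meter deltas by that slope gives a transaction rate. The numbers below are made up and loosely follow the file-growth example above.

```python
# Sketch: calibrate a proxy meter (here, growth of file Z in bytes)
# against known daily transaction counts, then use the fitted
# per-transaction cost to turn meter deltas into transaction rates.
# All numbers are illustrative.

def fit_per_txn_cost(meter_deltas, txn_counts):
    """Least-squares slope through the origin:
    cost = sum(d * t) / sum(t * t)."""
    num = sum(d * t for d, t in zip(meter_deltas, txn_counts))
    den = sum(t * t for t in txn_counts)
    return num / den

# Calibration data from several days: file growth vs. known txns.
daily_growth_bytes = [1_210_000, 2_390_000, 1_805_000]
daily_txn_counts   = [1_000, 2_000, 1_500]

bytes_per_txn = fit_per_txn_cost(daily_growth_bytes, daily_txn_counts)
# Comes out near 1200 bytes per transaction, as in the example above.

# Now convert a 5-minute meter delta into a transaction rate.
delta_bytes = 60_000          # file growth over the interval
interval_minutes = 5
rate = delta_bytes / bytes_per_txn / interval_minutes

print(f"{bytes_per_txn:.0f} bytes/txn, {rate:.0f} txns/min")
```

Fitting over several days, rather than dividing one day's numbers, averages out the noise from all the other activity touching that file.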
Reasonable people can argue that it is impossible to summarize a complex workload into one number. That may be true, but you can still do wildly useful things if you find a workload meter that tracks the utilization of key components reasonably well. Every evening on the news they quote a major stock index (like the Dow, the FTSE, or the Nikkei), and we find that a useful gauge of the overall economy. When selecting the workload meter, don’t go for perfect, go for close enough.
There is more to learn about performance work in this blog and in my book
The Every Computer Performance Book, which you can find on Amazon and iTunes.
Of course, it’s not how much data you collect but what you do with it that counts – and there needs to be some level of automation built into this process to deal with the ‘big’ data. There also need to be some constraints, because not all workload metrics need to be correlated against all resource utilization metrics – so applying a service model can be a huge labor (or compute) saver.
What’s interesting to consider is when there are multiple workloads against a single application. In these circumstances it’s also important to correlate the workload metrics against each other to determine whether certain transactions are related, and thus derive a relationship (how many orders do we see for every ‘add to basket’?).
I’m not technically sure whether this domain is venturing into the ‘big data’ category, but there’s certainly a lot of data processing required to derive beneficial insight.
You are absolutely right. It’s what you do with the data that counts. In the beginning, when I know little about a new situation, I like to collect every meter I can find for a few days and then look for patterns. As the load goes up and down, what moves with it, and what doesn’t move with it, tells me a lot about where activity is happening and what I can ignore. Once I’ve got the big picture, then I can focus and filter down to the few key meters that are clearly more important.
In my career, I was a performance gypsy – moving from customer to customer. So I would always start with a very wide net. I also had tools that would cherry-pick the key numbers out of the voluminous metering output and put them in a CSV-formatted file. I’d pull that into Excel and the patterns would quickly jump right out at me. That’s how I solved the ‘big data’ problem.
My apologies to my readers that believe that ‘big data’ starts at the petabyte level. I’m just a kid from Vermont trying to boil a gallon of performance data into an ounce of understanding. 😉