Every load test tool is different, but the goal is always the same: to create an artificial load that you believe will demonstrate that your computing world can handle the work.
First, you validate that your load test brings a load that fairly represents a normal, not-too-busy day. If you are testing the live system, run the test at a time when the real user load is very low. Adjust the load test so that you are bringing enough work into the system to emulate a moderate load on a not-too-busy day. If all the performance, response-time, and throughput meters you have look remarkably similar to what you’d expect from a real, live user load at that intensity, then you’ve validated your load test.
Below is an example where a load test was run at a moderate load of 500 TX/min, late at night when few real users were on the system. The load is turned on and off three times so that we can clearly see the background load that the sleep-deprived late-night users were adding, and so we can see if our results are repeatable. The chart below shows the overall CPU busy for a key machine.
In the first two tests, the transaction load and the CPU utilization moved together nicely. In the third test, the CPU busy started moving upwards before the transactions were sent, so something else was asking this system to do work. A short investigation might show that some unusual, but explainable, activity caused this. We can ignore that third test and then judge whether, at 500 TX/min, this was a normal amount of CPU consumption compared to the live load. The meters captured during the validation test and when the live user load is on the system do not have to match perfectly. There will always be some noise in the data.
Looking at CPU consumption is a good place to start your validation work, but check your other performance meters as well. See how all the meters match up during the validation test. If some of the meters don’t seem to make sense, there are several things to check, improve, and adjust:
- Make sure all the transactions you are sending in are getting valid responses, not error messages.
- Perhaps there is not enough variability in a given transaction. Searching for the word “cow” a million times in a row is not the same as doing a million searches from a randomly chosen list of a thousand different words.
- If you can’t seem to drive the transaction rate high enough, perhaps you are having a locking issue. For example, simultaneously updating the same user record from many virtual users can prevent the test from scaling up as everyone is waiting for everyone else.
- For load tests with a complex transaction mix, you might want to test each transaction type separately, so you know each one is working as expected.
- Perhaps you need to script additional transactions and add them to the workload.
- Perhaps you need to adjust the ratio of the different transaction types in the workload mix.
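The variability point above can be sketched in a few lines. This is a hypothetical example (the `SEARCH_TERMS` list and `next_search_term` helper are illustrative, not part of any particular tool): each virtual user draws its search term from a word list instead of hammering the same word, so caches get exercised the way a real user population would exercise them.

```python
import random

# Hypothetical word list; in practice you'd load ~1000 terms from a file.
SEARCH_TERMS = ["cow", "horse", "tractor", "barn", "silo", "pasture"]

def next_search_term(rng=random):
    """Pick a random term for each simulated search, rather than
    searching for the same word (e.g. "cow") a million times in a row."""
    return rng.choice(SEARCH_TERMS)

# Each virtual user calls next_search_term() per request, so the backend
# sees a realistic spread of queries instead of one cache-friendly value.
terms = [next_search_term() for _ in range(5)]
```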
Keep testing and refining your load until it looks right, runs without error, and returns results that are close enough to what you see on the live system. This takes time, but over time you build confidence in your load test.
Analyzing Load Tests
With your test validated, now you are ready to try a full-power load test. This is a big event, as you are going to push your computing world hard. Things might break, automated performance alerts will go out, dashboards will turn unhappy colors, and, if this is a test on the live system, user performance will suffer. Be sure that everyone is informed well before the test and has an easy way to communicate during the test.
Start the load test at the level you’ve previously validated. Run at that level long enough so that you have time to check in with the key players to see if their meters are running and everything is as expected. If all is good, then ramp up your load over time until you get to your goal and run at the goal transaction rate long enough to get multiple samples of the internal meters.
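The ramp plan described above can be sketched as a simple step schedule. This is a minimal illustration, not any tool's API: the `ramp_schedule` function and the specific rates are assumptions for the example (500 TX/min validated, 1600 TX/min goal).

```python
def ramp_schedule(validated_rate, goal_rate, steps, hold_minutes):
    """Build a list of (rate, minutes) stages: hold at the previously
    validated level first, then ramp in equal steps up to the goal rate,
    holding each stage long enough to get multiple meter samples."""
    stages = [(validated_rate, hold_minutes)]
    step_size = (goal_rate - validated_rate) / steps
    for i in range(1, steps + 1):
        stages.append((validated_rate + i * step_size, hold_minutes))
    return stages

# Example: validated at 500 TX/min, goal of 1600 TX/min, four ramp steps.
schedule = ramp_schedule(500, 1600, 4, 10)
for rate, minutes in schedule:
    print(f"run at {rate:.0f} TX/min for {minutes} minutes")
```

Holding each stage for a fixed interval (rather than ramping continuously) gives the internal meters time to take several samples at a known, steady load.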
This is a happy load test, as the average response time stayed nice and level except for one data point at 11:27. Since everything else looks good, you might choose to ignore that and declare success. However, when you present your results, it is very probable that someone in the meeting will be fascinated by this odd result. You could rerun the test and hope the event was a one-time thing, or you could do the right thing and figure out what caused that spike in response time.
Below is an unhappy load test, as the more work you brought to the system, the worse the response time got. At 100 virtual users, the average response time was 0.7 seconds. Once we got to 150 virtual users, the response time started climbing. That is never good.
When setting goals for the load test, you should have well-defined upper limits for response time and number of errors. Typically the response-time goals are a very small multiple of normal low-load response times. So if normal here is 0.75 seconds, then you might have a peak-load goal of no worse than 2X that number, or 1.5 seconds. Your boss will give you guidance on the ceiling for this number.
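The arithmetic behind that goal is trivial, but it is worth encoding so the pass/fail check is unambiguous. This is a minimal sketch with assumed numbers from the text (0.75-second low-load baseline, a 2X multiplier chosen by management):

```python
def response_time_goal(low_load_rt, multiplier=2.0):
    """Peak-load response-time ceiling: a small multiple of the
    normal low-load response time. The multiplier comes from
    management guidance, not from the tool."""
    return low_load_rt * multiplier

def meets_goal(measured_rt, low_load_rt, multiplier=2.0):
    """True if the measured peak-load response time is within the goal."""
    return measured_rt <= response_time_goal(low_load_rt, multiplier)

print(response_time_goal(0.75))      # 1.5 seconds
print(meets_goal(1.2, 0.75))         # True  -- within the 2X ceiling
print(meets_goal(1.8, 0.75))         # False -- over the ceiling
```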
Oddly enough, response time can go down under increasing load. This is never a good thing, because it only happens when something is failing. It is often faster to fail than it is to do all the work the transaction requires.
Below is a graph of a bad test where the response time climbs and crashes over and over. This pattern is an artifact of this particular load generation tool. Once this tool hits a certain error threshold, it will reduce the load until the errors subside and then it will ramp up the load again. Regardless of the particular pattern, if you see response time dramatically improving as the load climbs, something is wrong. Start looking for what is failing.
Below is a different chart from the same bad test that shows transactions completed and average response time for those transactions. Notice the lines crossing again and again. The low points are where transactions are failing quickly.
Every load test tool you ever build or buy will have unique ways of presenting the results and of detecting and dealing with errors. Before your first real load test, you should explore the tool, see how it behaves, and understand exactly what each meter is telling you. For example, if the tool gives you a number labeled “transactions,” are they started or completed transactions? If the tool gives you a number for virtual users, is that the target number or the actual number? Is that number the average during the sample period or the total at the beginning of the minute? These fine distinctions make a big difference in how much you can deduce from any meter.
After The Load Test
Once the load test finds a point where the response time gets bad and lots of errors are happening, stop and study the results. Let’s say your computing world ran out of gas at about 1000 TX/min, and your goal was to get to 1600 TX/min. It can be helpful to run an additional test where you approach the failure point in stages, pausing at each stage long enough to give the internal meters time to get several good samples. So your test might run with an unchanging load for five minutes at 800, then jump to 900, then jump to 1000 TX/min.
Your internal meters at 800 TX/min are approximately half as busy as they will be at your target transaction rate of 1600 TX/min. This is a good opportunity to do some quick capacity planning and see what you are likely to run out of. Doing this means you get more information out of each load test, and that means lower costs and fewer 3 AM load test runs to wake up for. Running the test at 900 and 1000 TX/min allows you to double-check your work. I’ve seen instances where a resource that looked like it was going to be the bottleneck did not increase in proportion to the additional load.
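That quick capacity planning can be sketched as a simple linear extrapolation. The meter readings below are hypothetical, and the `project_utilization` helper is an illustration under the assumption that the resource scales roughly linearly with load; the point of having three staged samples is to check whether that assumption holds.

```python
def project_utilization(measured_util, measured_rate, target_rate):
    """Linearly extrapolate a resource's utilization (e.g. CPU busy %)
    from a measured load level to the target transaction rate.
    A rough first-cut estimate, valid only if the resource scales
    linearly with load."""
    return measured_util * (target_rate / measured_rate)

# Hypothetical CPU-busy readings from the staged test: TX/min -> % busy.
samples = {800: 42.0, 900: 47.5, 1000: 52.0}

# Project each sample to the 1600 TX/min goal. If the three projections
# roughly agree, the linear assumption looks sound; if they diverge,
# the resource is not scaling in proportion to the load.
for rate, util in samples.items():
    projected = project_utilization(util, rate, 1600)
    print(f"{rate} TX/min at {util}% busy -> ~{projected:.0f}% at 1600 TX/min")
```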