Load Testing: Creating and Validating The Load

To create a good load test, you have to figure out not only which specific tasks the users are asking your system to do, but also the right mix of tasks (e.g. ten withdraw transactions for every deposit transaction) and the rate at which to deliver them to emulate the peak load.

All of this will take some creative application performance metering and some discussions about which transactions to emulate. Let’s tackle these problems one at a time, but first a brief word about abandoning your quest for perfection.

Good Enough Is Just Fine

Give up on the idea of perfection.  There is no “perfect” in load testing, as the users are always changing their behavior and you will never emulate all the different transactions with all the possible user choices.  Why?

Unlike capacity planning, where you can do your work by yourself with a spreadsheet and some metering data, load testing costs money, often requires the help of others, and can be disruptive. Also, load testing is usually done at an hour that is inconvenient for everyone involved. All of this tends to push back hard against perfection.

You are looking for “good enough” to get the job done.   There are many choices you’ll have to make and guesses you’ll have to take, so how do you know the load test you’ve designed is good enough?

Test Validation

When you’ve designed and built your load test, run it at a normal everyday load and see if it works without errors and returns performance meter values that are similar to the ones you get on an average day.  This is known as test validation.

For example, at noon on a pre-peak day your computing world is handling 50 TX/sec, and a key machine is around 20% busy. At midnight that machine is almost idle. You are planning for an upcoming peak that is four times (4X) the noon load. If your load test reasonably emulates the real user load, then when you run your load test at midnight, sending in 50 TX/sec, the key machine should show 20% busy, and the other key meters in your computing world should also resemble their noontime values.  If half your computing world is idle under this test, clearly you’ve got more work to do. If the numbers are close enough, then your test is good to go.  But what’s close enough?

Begin with the peak in mind.  The peak you are planning for is 4X the normal noon load of 20% busy.  Any differences in the meters between the real user load and the midnight load test will be 4x greater at the peak, and that means little unimportant differences can become big significant differences. Let’s look at the numbers on three key meters…

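[Table: percent-busy readings for meters X, Y, and Z under the real noon load and under the midnight load test]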

Meter X matches up nicely. If we were emulating the real user load perfectly we’d expect meter Y and meter Z to match up nicely as well, but they don’t.

Meter Y is much busier at noon than during the midnight load test. Here it makes a difference, as we expect the peak to be 4X the measured day. So, just using basic capacity planning math, two things are clear:

  1. The resource watched over by Meter Y will bottleneck at peak as: 30% * 4 = 120%
  2. This is a difference that makes a difference, as Meter Y during the load test only showed a utilization of 20% busy and that works out to 20% * 4 = 80% at peak.  Meter Y tells us we have more work to do on this load test.

Meter Z is a little off between the noon and the midnight numbers, but this is a difference that you could live with because, even when you scale up the larger sample (10% * 4 = 40%), it’s clear this is not going to be a bottleneck.
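The close-enough check above can be sketched in a few lines of Python. This is only an illustration, not a definitive rule: the function names are made up, the Meter X readings and the midnight reading for Meter Z are assumed values (only Meter Y's 30% and 20% and Meter Z's 10% come from the discussion above), and "a difference that makes a difference" is simplified to "the two projections disagree about whether the resource saturates at peak."

```python
PEAK_MULTIPLIER = 4   # the planned peak is 4X the noon load
SATURATION = 100.0    # percent busy at which a resource bottlenecks


def projected_peak(percent_busy, multiplier=PEAK_MULTIPLIER):
    """Scale a measured utilization up to the planned peak."""
    return percent_busy * multiplier


def needs_more_work(noon_busy, test_busy,
                    multiplier=PEAK_MULTIPLIER, limit=SATURATION):
    """True if real load and load test disagree about a bottleneck at peak."""
    noon_saturates = projected_peak(noon_busy, multiplier) >= limit
    test_saturates = projected_peak(test_busy, multiplier) >= limit
    return noon_saturates != test_saturates


# (noon reading, midnight load-test reading), in percent busy.
readings = {
    "X": (20.0, 20.0),  # assumed values: the text only says X matches nicely
    "Y": (30.0, 20.0),  # from the text
    "Z": (10.0, 8.0),   # 10% is from the text; 8% is an assumed value
}

for name, (noon, test) in readings.items():
    verdict = "more work needed" if needs_more_work(noon, test) else "close enough"
    print(f"Meter {name}: noon {projected_peak(noon):.0f}% vs "
          f"test {projected_peak(test):.0f}% at peak -> {verdict}")
```

Only Meter Y is flagged: the real load projects to 120% at peak while the load test projects to 80%, so the test would hide a bottleneck. A stricter check might also flag large gaps that stay under the limit; where you draw that line is a judgment call.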

You can also do this checking with any meter that counts things of importance to you, like packets, I/Os, thingamajigs, whatever. Once the results are close enough, you can trust that your load test will do a good job pushing your computing world as hard as the users will at peak. Now, let’s design a load test.

Selecting Transactions To Emulate

Your computing world handles many different types of transactions, but the bulk of the workload, and/or the bulk of the revenue, comes from just a few of them. It is also the case that many transactions have important differences for users but are computationally identical for your computing world – the bits flow through the same processes and consume the same resources.

Start building a list of transactions to emulate. First add the ones that make up the bulk of your workload. Then add any transactions that bring serious money into the company even if they are not all that numerous.

To a corporation, nothing is more important than money. Follow the money.
– Bob’s Seventh Rule of Performance Work

Then add any transactions that have recently caused you trouble and are thus politically sensitive at this time.  Now look that list over and, if it makes things simpler, you can combine transactions that are computationally similar into a generic transaction – the buyX, buyY and buyZ transactions get grouped together into the generic buyStuff transaction.  This will typically leave you with a short list of transactions.
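Here is a rough Python sketch of that selection process. Everything in it is hypothetical: the transaction names (apart from buyX, buyY, buyZ and buyStuff), the daily counts and revenue figures, and the cutoffs for "bulk of the workload" (80%) and "serious money" (25% of revenue) are all assumptions chosen for illustration. Your own metering data and your own judgment set the real numbers.

```python
# Hypothetical daily stats per transaction: (count, revenue).
stats = {
    "browse":        (9000, 0.0),
    "buyX":          (4000, 40000.0),
    "buyY":          (3000, 30000.0),
    "buyZ":          (500, 5000.0),
    "updateProfile": (200, 0.0),
    "wireTransfer":  (50, 90000.0),
    "exportReport":  (30, 0.0),
}
recently_troublesome = {"exportReport"}  # politically sensitive right now
generic_name = {"buyX": "buyStuff", "buyY": "buyStuff", "buyZ": "buyStuff"}


def select_transactions(stats, troublesome,
                        volume_share=0.8, revenue_share=0.25):
    """Pick transactions by volume, then by money, then by recent trouble."""
    total_count = sum(count for count, _ in stats.values())
    total_revenue = sum(revenue for _, revenue in stats.values())
    selected = set()

    # 1. The bulk of the workload: most numerous first, until covered.
    covered = 0
    by_count = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (count, _) in by_count:
        if covered >= volume_share * total_count:
            break
        selected.add(name)
        covered += count

    # 2. Follow the money: big earners, even if they are not numerous.
    for name, (_, revenue) in stats.items():
        if total_revenue > 0 and revenue >= revenue_share * total_revenue:
            selected.add(name)

    # 3. Anything that recently caused trouble.
    selected.update(name for name in troublesome if name in stats)
    return selected


chosen = select_transactions(stats, recently_troublesome)
# Combine computationally similar transactions into one generic transaction.
short_list = sorted({generic_name.get(name, name) for name in chosen})
print(short_list)
```

With these made-up numbers, seven transaction types shrink to four scripts to write: browse, buyStuff, exportReport, and wireTransfer.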

Scripting Transactions

Users are not identical robots typing the same things over and over at machine-like speeds.

Users are unique, they pause to consider and to choose, and they do different things.  There are constraints (logical, legal, and practical) as to what your users can do. You can’t login simultaneously from two different cities, you can’t withdraw with a zero balance, and you can’t put a trillion things in your shopping cart. Whatever generates the load for your load test has to be able to handle that. Look for load generation tools that have a way to:

  • Record a script of actions for each type of transaction you decide to emulate
  • React to information presented during the transaction, such as an item out of stock
  • Determine if the response indicates success or failure (“Deposit accepted” vs. “D603 Database error…”)
  • Add variability to that script by doing different things on each visit
  • Authenticate themselves so your security software will allow these transactions
  • Build think time between steps of a transaction to emulate the behavior of real users as they pause and consider

Scripting transactions takes work.  First you get it to work once, then you add variability into it (different users doing different things), and then you find security or reasonableness checks you have to either deal with or work around. For multi-step transactions (e.g., login, shop, buy), you need to test at each step to see if it was successful and to “report & abort” the transaction if it was not.
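To make the “report & abort” idea concrete, here is a minimal Python sketch of one scripted multi-step transaction. It does not use any real load generation tool: fake_system is a hypothetical stand-in for the system under test, the step and item names are invented, and the success check (looking for the word “accepted”) is an assumption modeled on the “Deposit accepted” example above. Real tools have their own ways of doing each of these things.

```python
import random
import time


class StepFailed(Exception):
    """Raised when a step's response does not indicate success."""


def fake_system(step, user, item=None):
    """Hypothetical stand-in for the system under test."""
    if step == "buy" and item == "broken-item":
        return "D603 Database error"
    return f"{step} accepted"


def check(step, response):
    """Decide success or failure from the response text."""
    if "accepted" not in response:
        raise StepFailed(f"{step}: {response}")


def run_transaction(user, items, send=fake_system, rng=random,
                    sleep=time.sleep, think_range=(1.0, 5.0)):
    """Run login -> shop -> buy, testing each step; report & abort on failure."""
    item = rng.choice(items)  # variability: a different choice on each visit
    steps_done = []
    try:
        for step in ("login", "shop", "buy"):
            if steps_done:
                # Think time: real users pause and consider between steps.
                sleep(rng.uniform(*think_range))
            response = send(step, user, None if step == "login" else item)
            check(step, response)
            steps_done.append(step)
    except StepFailed as err:
        print(f"ABORT user={user} completed={steps_done} error={err}")
        return False
    return True


def no_wait(seconds):
    """Skip think time so this demo finishes instantly."""


print(run_transaction("user1", ["book", "lamp"], sleep=no_wait))
print(run_transaction("user2", ["broken-item"], sleep=no_wait))
```

The first call completes all three steps; the second gets through login and shop, then reports the database error and aborts at buy. Passing the sender, the random source, and the sleep function in as parameters is a design choice that lets you test the script itself without waiting and without touching the real system.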

Typically, you work on one transaction at a time, testing it over and over until you are satisfied. Then you create and test the next transaction.  When you’ve got all the transactions in your load test working and tested individually, you test them together, typically at low power, to convince yourself that they all work together smoothly. Look for the results you see in the meters to match up nicely with the meters you get from the normal load generated by real users.

You Can Do This

All this work might sound overwhelming, but lots of regular people, no smarter than you, do it every day. The key is to begin somewhere, with some load generating tool and some transaction, and build from there. At Ben & Jerry’s scoop shops they sell something called a Vermonster: a sundae with 20 scoops of ice cream, hot fudge, bananas, cookies, brownies, and all of your favorite toppings. How do you eat such a thing? One spoonful at a time, with the help of others.
