As the user load increases from light to crazy-busy, there are three things I’ve noticed that tend to hold true on every customer system I’ve ever worked on.
Resource Usage Tends to Scale Linearly
Once you have a trickle of work flowing through the system (which gets programs loaded into memory, buffers initialized and caches filled), it has been my experience that if you give a system X% more work to do, it will burn X% more resources doing that work. It’s that simple. If the transaction load doubles, expect to burn twice the CPU, do twice the disk IO, etc.
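This linear rule makes back-of-the-envelope capacity projections easy. A minimal sketch (the function name and all numbers are mine, purely illustrative):

```python
def projected_utilization(current_util, current_load, future_load):
    """Project a resource's utilization at a future load, assuming
    resource usage scales linearly with the offered load."""
    return current_util * (future_load / current_load)

# Illustrative numbers: a system burning 40% CPU at 200 transactions/sec
# should burn about 80% CPU at 400 transactions/sec.
print(projected_utilization(40.0, 200.0, 400.0))  # prints 80.0
```

The same arithmetic works for disk IO, network traffic, or any other resource that follows the linear rule.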
There is often talk of algorithmic performance optimizations that will kick in at higher load levels, and in theory that sounds good. Sadly, most development projects are late, the pressure is high, and the good intentions of the programmers are often left on the drawing board. Once the application works, it ships, and the proposed optimizations are typically forgotten.
Performance Does Not Scale Linearly
Independent parallel processing is a wonderful thing. Imagine two service centers with their own resources doing their own thing and getting the work done. Now twice as much work is coming, so we add two more service centers. You’d expect the response time to stay the same and the throughput to double. Wouldn’t that be nice?
The problem is that, at some point in any application, all roads lead to a place where key data has to be protected from simultaneous updates or a key resource is shared. At that point more resources don’t help, and the throughput is limited. This is the bad news buried in Amdahl’s Law, which is something you should read more about.
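Amdahl’s Law puts a hard ceiling on what adding service centers can buy you. A minimal sketch (the serial fractions are illustrative, not measurements from any real system):

```python
def amdahl_speedup(serial_fraction, n_centers):
    """Maximum speedup from n_centers parallel service centers when
    serial_fraction of the work must pass through the shared,
    serialized part of the system."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_centers)

# With no serialized work, two centers really do double throughput...
print(amdahl_speedup(0.0, 2))       # prints 2.0
# ...but if even 5% of the work is serialized, no amount of added
# hardware can push the speedup past 1 / 0.05 = 20x.
print(amdahl_speedup(0.05, 1024))   # just under 20, despite 1024 centers
```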
When All Hell Breaks Loose Weird Things Happen
At very high transaction levels many applications can suffer algorithmic breakdown when utterly swamped with work. For example, a simple list of active transactions works well under normal load, when there are usually fewer than ten things on the list, but it becomes a performance nightmare once there are 100,000 active transactions on it. The sort algorithm used to maintain the list was not designed to handle that load. That’s when you see the throughput curve turn down.
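The exact structure in that story was a sorted list, but the general failure mode is any O(n)-per-operation data structure meeting an n it was never designed for. Here is a stand-in sketch of my own (an active-transaction table kept as a list versus a dict; the names are illustrative):

```python
def complete_with_list(active, txn_id):
    # Linear scan plus shift: fine with ten active transactions,
    # a performance nightmare with 100,000 of them.
    active.remove(txn_id)        # O(n) per completion

def complete_with_dict(active, txn_id):
    # Hash-based removal stays cheap no matter how big the backlog gets.
    del active[txn_id]           # O(1) on average

# Same workload, two data structures.
ids = list(range(1000))
as_list = list(ids)
as_dict = {i: "txn" for i in ids}
for i in ids:
    complete_with_list(as_list, i)   # 1000 completions cost ~n^2 work
for i in ids:
    complete_with_dict(as_dict, i)   # 1000 completions cost ~n work
```

At ten transactions the difference is invisible; at 100,000 the list version is doing on the order of ten billion element shifts.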
This can happen when a source of incoming transactions loses connectivity to you and, while disconnected, buffers up transactions to be processed later. When the problem is fixed, the transaction source typically sends the delayed transactions at a relentless pace, and your system is swamped until it chews through that backlog. Plans need to be made to account for these tsunami-like events.
For example, I’ve seen this at banks processing ATM transactions, where normally the overall load changes gradually throughout the day. Then a subsidiary loses communication and reconnects after a few hours. That subsidiary typically dumps all the stored ATM transactions into the system as fast as the comm lines will move them. Building in some buffering, so that all the pending transactions can’t hit the system at once, can be a smart thing to do here.
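One common way to build in that buffering is a token bucket that admits transactions at a sustainable rate and queues the rest. This is a minimal sketch of the idea, not code from any bank; the rate and capacity values are whatever your capacity planning says the system can absorb:

```python
import time

class TokenBucket:
    """Admit at most `rate` transactions per second (with bursts up to
    `capacity`), so a reconnecting source can't dump its whole backlog
    on the system at once."""

    def __init__(self, rate, capacity):
        self.rate = rate            # sustained admissions per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_admit(self):
        now = time.monotonic()
        # Replenish tokens for the time that has passed, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True             # admit this transaction now
        return False                # queue it and retry later
```

A transaction that is refused goes back on a queue; the backlog still gets processed, just at a pace the system can survive.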
Sometimes the tsunami-like load comes as part of disaster recovery, where all the load is suddenly sent to the remaining machine(s). Here your company needs to decide how much money they want to spend to make these rare events tolerable.
Oddly enough, there is a case where throughput can go up while response time goes down under heavy load. This happens when something is failing, and it is never a good thing. How does this happen? It is often faster to fail than it is to do all the work the transaction requires. It is much faster to say “I give up” than it is to actually climb the mountain.
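You can see why the numbers move that way with a little arithmetic. A minimal sketch, where the service times are illustrative rather than measured:

```python
# Illustrative per-transaction service times, in seconds.
WORK_TIME = 0.200   # a transaction that actually does the work
FAIL_TIME = 0.005   # a transaction that fails fast ("I give up")

def completions_per_minute(failure_rate):
    """Single-threaded transactions completed per minute when
    failure_rate of them fail fast instead of doing the real work."""
    avg_time = failure_rate * FAIL_TIME + (1.0 - failure_rate) * WORK_TIME
    return 60.0 / avg_time

print(completions_per_minute(0.0))   # healthy system: prints 300.0
print(completions_per_minute(0.9))   # mostly failing: roughly 2449, and
                                     # the average response time has
                                     # dropped from 200ms toward 5ms
```

Throughput soars and response time plummets, yet 90% of your users just got an error instead of an answer.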
In the graph above, you see the system under heavy load. When throughput (transactions completed) suddenly increases while the average response time drops, start looking for problems. This is a cry for help. The users here are not happy with the results they are receiving.
For More Information:
There are more insights, hints, tricks, and truisms that I gathered over my 25+ year career in performance work in my book: The Every Computer Performance Book