The whole reason companies build applications is to handle the work with a reasonable response time. To do that well you need to monitor both internal and external response time for the transactions you care about.
The systems and other technology your computing world is built from have no built-in response time meters, because the builders of that equipment don’t know how you define a transaction. You are going to have to select the transactions you care about (e.g., the SHOP transaction is 98% of your computing workload, the BUY transaction brings in 100% of the money) and find response time meters for yourself. To do that, you need to meter both internal and external response times.
Meter Internal Response Time
Usually, you only have control over some of the computers that the transactions you process flow through. In the example below, system B is your responsibility and you are very interested in how responsive it is.
Understanding the internal response time (when A gives you work, how rapidly do you deliver the response) helps you in three ways by giving you an alibi, an insight, and a head start.
- Alibi: When users are having response time problems, if you can show the response times are fine within your world, then the problem lies elsewhere.
- Insight: If you see response times increasing internally, but there is nothing very interesting showing up in any of the performance meters you currently collect, then you have an undiscovered problem and you need to do some more exploring.
- Head start: Sometimes response time problems take a while (hours, days, or even weeks) to be noticed and reported back to you or your boss. If you are paying attention to your internal response time meters, and the problem is located in your world, then you can be already working to find and fix the root cause of the problem when the boss knocks on your door.
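Here is a minimal sketch of what an internal response time meter could look like if system B is code you can change. The class name `ResponseTimeMeter` and its methods are made up for illustration, and reporting the count, mean, and 95th percentile is my assumption about what is useful, not a standard.

```python
import time
from collections import defaultdict
from statistics import quantiles


class ResponseTimeMeter:
    """Hypothetical helper: records elapsed seconds per transaction name."""

    def __init__(self):
        self.samples = defaultdict(list)  # transaction name -> list of seconds

    def record(self, name, seconds):
        """Store one response time sample for the named transaction."""
        self.samples[name].append(seconds)

    def time_call(self, name, func, *args, **kwargs):
        """Run func, record how long it took (even if it raises), return its result."""
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            self.record(name, time.perf_counter() - start)

    def summary(self, name):
        """Return count, mean, and 95th percentile, or None if nothing was recorded."""
        data = self.samples.get(name, [])
        if not data:
            return None
        # quantiles() needs at least two samples; with one, use the sample itself.
        p95 = quantiles(data, n=20)[-1] if len(data) >= 2 else data[0]
        return {"count": len(data), "mean": sum(data) / len(data), "p95": p95}


meter = ResponseTimeMeter()
# Wrap the work system B does for each request from A (a stand-in function here).
meter.time_call("BUY", lambda order: sum(order), [1, 2, 3])
print(meter.summary("BUY"))
```

If you save a summary like this every few minutes, you have the history you need for the alibi, and a trend line to watch for the insight and the head start.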
Meter End User Response Time
It is also a good thing to test the response time as close to the user as possible. In this day and age that usually means testing outside your company across the Internet.
When testing across the Internet, the first thing you need to realize is that distance matters. If your users are spread out geographically, then those the farthest away will have the worst response times. The speed of light is not infinite, and the more distance you put between you and the customer the more delay they will experience.
You might reasonably point out that the few extra milliseconds don’t matter on a human scale, but you need to remember that, at many levels, there are back and forth conversations going on like so:
Can I have GrandCanyon.jpg? >—>
<—< Here is part #1, let me know when you get it.
Thanks, I got part #1. >—>
<—< Here is part #2, let me know when you get it.
Thanks, I got part #2. >—>
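To put rough numbers on a conversation like the one above, here is a back-of-the-envelope sketch. It assumes light in optical fiber covers about 200 km per millisecond (roughly two thirds of its speed in a vacuum) and ignores routers, queuing, and indirect routes, so real delays will be higher. The function names and the example distances are mine, picked only for illustration.

```python
# Assumption: light in fiber travels about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km):
    """Best-case round-trip propagation delay over a straight fiber path."""
    return 2 * distance_km / FIBER_KM_PER_MS


def conversation_delay_ms(distance_km, round_trips):
    """Total propagation delay when each exchange waits for the previous reply."""
    return round_trips * round_trip_ms(distance_km)


# Illustrative distances and a made-up count of 50 back-and-forth exchanges.
for km in (100, 4000, 12000):
    total = conversation_delay_ms(km, round_trips=50)
    print(f"{km:>6} km: {round_trip_ms(km):6.1f} ms per round trip, "
          f"{total:7.1f} ms for 50 round trips")
```

At 4,000 km the best-case round trip is 40 ms, which nobody notices once. Fifty of them in a row add up to two full seconds.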
Each back and forth pays the price of geographically induced delays. There was an interesting experiment the ACM did in 2009 where they repeatedly copied four gigabytes of data across the Internet, and the only thing they varied was the distance between the source and the destination computers. Below are their results that clearly show distance matters.
In testing response time close to the user, you also need to pay attention to using last mile connections that resemble what the users are using. There is a big difference in throughput, network latency, and error rates among dial-up, DSL, satellite, cable modems, fiber optic, and of course mobile cellular connections. You should look at your current user base and test your response time using those networks.
A small change in the amount of data you are sending to the user can have a big impact on response time if their network has a restricted throughput. For example, a user with a 40 kilobit/second dial-up connection will see an additional second of response time when the amount of data sent increases by just 5000 bytes. Depending on the current conditions, mobile connections can also have surprisingly bad throughput and high error rates, which further slows things down.
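The dial-up arithmetic is simple enough to check yourself. This sketch assumes the link is the only bottleneck and ignores protocol overhead and retransmissions. Apart from the 40 kilobit/second dial-up figure, the link speeds are round numbers I chose for comparison, not measurements.

```python
def added_seconds(extra_bytes, link_bits_per_second):
    """Extra transfer time from sending extra_bytes over a link of the given speed."""
    return extra_bytes * 8 / link_bits_per_second


# 5000 bytes is 40,000 bits, which takes one second at 40,000 bits/second.
print(added_seconds(5000, 40_000))

# Illustrative link speeds (assumed, not measured) for the same 5000 bytes.
links = {
    "dial-up, 40 kbit/s": 40_000,
    "slow mobile, 1 Mbit/s": 1_000_000,
    "corporate net, 100 Mbit/s": 100_000_000,
}
for name, bps in links.items():
    print(f"{name}: {added_seconds(5000, bps) * 1000:.1f} ms")
```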
Sometimes a big increase in the size of what you deliver to users happens to make some internal group happy, and no one notices the cost because they all download the bits over the wicked-fast, low-latency corporate net. Then the change is rolled out to the general public and suffering ensues.
The Internet is a network of networks that are owned by different companies that sometimes act like petulant little children who refuse to play nice with each other. Most of that trouble is intermittent, of short duration, and is completely outside of your control, but you do select a network when you choose your company’s ISP.
It can be a good thing to test response time using multiple ISPs from a given key geography. Imagine your customer service department starts getting complaints, but your internal meters all look good. Then you notice your response time tests that connect to the Internet via the Level3 network are all having troubles, but the tests connecting through ISPs using the AT&T and MCI networks are all doing fine. Clearly, this is not your problem to fix, but it is wildly useful to know what is happening.
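If you collect those test results yourself, spotting the odd network out can be as simple as comparing medians. The sketch below is one possible approach, not a standard method: the function name `slow_isps`, the factor of 2, and every response time in the sample data are invented for illustration.

```python
from collections import defaultdict
from statistics import median


def slow_isps(results, factor=2.0):
    """Return ISPs whose median response time exceeds `factor` times the
    median of the other ISPs' medians.

    results: iterable of (isp_name, response_seconds) pairs.
    """
    by_isp = defaultdict(list)
    for isp, seconds in results:
        by_isp[isp].append(seconds)
    medians = {isp: median(times) for isp, times in by_isp.items()}

    flagged = []
    for isp, med in medians.items():
        others = [m for name, m in medians.items() if name != isp]
        if others and med > factor * median(others):
            flagged.append(isp)
    return sorted(flagged)


# Made-up sample measurements, in seconds, from one geography.
sample = [
    ("Level3", 4.1), ("Level3", 3.8),
    ("AT&T", 0.9), ("AT&T", 1.1),
    ("MCI", 1.0), ("MCI", 1.2),
]
print(slow_isps(sample))
```

With this invented data only Level3 is flagged, which matches the story above: your internal meters look fine, and the trouble follows one network.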
At this point you are probably thinking that all this testing sounds impossible to set up. Fear not: there are companies whose business is running vast arrays of test machines all over the world, doing this testing, and providing detailed results to you. When selecting a company to do this testing, be sure that they can test:
- From where your users are located
- The types of last mile connections your users use
- The major ISPs your users use
- From inside your company (extra credit)
If you are thinking “I can’t do all this,” please stop worrying. Nobody does all of this. As Teddy Roosevelt once said: “Do what you can, with what you have, where you are.”
There is more to learn about performance work in this blog and in my book, The Every Computer Performance Book, which you can find on Amazon and iTunes.