My Worst Day At Work

A key customer was having a performance problem and our company had assembled a team to fly down and work on it. I was part of that team and this was going to turn into my worst day at work – ever.

This customer had developed their application on our proprietary operating system and it worked great. However, we had recently shipped a UNIX OS on that same hardware and the customer had ported their application to it and it ran as slow as mud.
The way they saw it: Same hardware, same application, different OS, bad performance. This must be the vendor’s fault. The way we saw it: Applications ported to new operating systems often have performance problems in the same way a human from Earth would have to work extra hard to make it on a different planet.

So we arrived onsite and the initial one-hour meeting with Mr. Big Cheese and his henchmen was deeply unpleasant and accusatory. Then we spent the rest of the day in a conference room, logged in to the test system, looking for some solution. At no time were we ever left alone. There was always at least one of his henchmen there asking us what we were doing and often unhelpfully commenting on our efforts: “We already tried that,” “Anyone can see that this is not the problem,” and “You’re wasting time.”

As the day drew to a close we had a final meeting with Mr. Big Cheese and he was not interested in what little performance-enhancing crumbs we had found. He used that hour to imply we were all idiots and demand that our company send down a real “UNIX kernel hacker”. Over and over, he made it clear that only a “UNIX kernel hacker” could solve this problem. We were not that, so he told us to go.

This company had made two key mistakes with us:

One: They were jerks to the people who had come to help them. So we did what we were required to do, but not what we could have done if we wanted to call in favors, bend rules, and do heroic things. Be nice to the people who come to help you, even if they are idiots. Why? Because they go back to that company and advocate for you and spread the word that you are worthy of heroic, rule-bending efforts.

Two: They never left us alone. We were never free to work as a group for fear that we’d say or do something stupid. If experts fly in, give them private space to call other experts at headquarters and talk amongst themselves. They will get to any possible solution much faster.

We left at the end of the day as a group and walked to a nearby restaurant. We ordered a round of beers. When they came, I picked up my glass and chugged the whole beer in a few seconds, something I’d not done since my college days.

Everyone was a little shocked. I set the glass down and slyly said: “It doesn’t do you any f#@king good in the glass.” That broke the ice and to this day we laugh, and laugh, and laugh about what a really horrible day that was and the fact that we are not, and never will be, “UNIX kernel hackers”.

Alcohol is not a solution to man’s problems, but laughter is.

Firefighting Addiction

For those who firefight recurring performance problems, there is tremendous satisfaction in their heroic efforts. There is an adrenalin rush and a freedom of action that stands in stark contrast to the boring calm of an ordinary day. They have to act quickly and decisively without the usual 500 meetings it takes to decide anything. It’s fun and addictive, and sadly I have seen many key staff members become so addicted to the rush that they do this all the time. It seems like hardly a day, or a peak, can go by without their personal intervention.

However, I’ve noticed something about this firefighting: when I examine their efforts, in most cases they are not having much of an impact on measures like throughput or response time. One person I worked with sat at his desk for the first half hour of market open and, with the dexterity of a master organist, adjusted the priorities of processes. He thought it made a big difference; so did everyone else. He was a big wheel at that company, but in this case, he was completely wasting his time.

If someone at your shop has a bad case of firefighting addiction, it is tough to wean them away from it because you (with proper performance analysis and capacity planning) are taking away one of their most “valuable” contributions to the organization. Expect resistance.

Please do not get me wrong: I love those people who have what it takes to come up with just the right fix at a critical moment and have the courage to save the day. But those moments should be rare. If they happen daily, then what is really needed is some serious performance work to get at the root of the problem.

Being Ready For a Performance Emergency

Saving the day when performance unexpectedly suffers takes two things: the courage to act and preparation. The courage comes from within, but being prepared is easy; it can happen daily as part of your normal work. Here are some hints:

Have a phone list

The saddest thing is to watch someone wasting time looking up a phone number in a crisis. Create a phone list, and periodically call every number on it to make sure that the number is current and the person is still responsible for the thing you think they are.

Have a checklist

Make a list of things to do and check in a crisis. In the first rush of the problem you might not use it, but when you are stumped, it is a good thing to have.

Create fast analysis tools

Most of the time your meters are monitoring your computing world at a leisurely pace. When bad things happen, waiting for these meters is agony. Create a fast analysis tool that meters and reports in less than a minute.
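Here is a minimal sketch of what such a tool might look like in Python. It is only an illustration: the names `quick_sample` and `load_average_1min` are made up for this example, and the load average is just a stand-in for whatever meter matters in your shop. The idea is to take a handful of closely spaced samples and summarize them right away, instead of waiting for the regular collection cycle.

```python
import os
import statistics
import time
from typing import Callable, Dict, List


def quick_sample(read_meter: Callable[[], float],
                 samples: int = 10,
                 interval_s: float = 1.0) -> Dict[str, float]:
    """Read one meter `samples` times, `interval_s` seconds apart, and summarize."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    values: List[float] = []
    for i in range(samples):
        values.append(float(read_meter()))
        # Sleep between samples, but not after the last one.
        if i < samples - 1:
            time.sleep(interval_s)
    return {
        "samples": len(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


def load_average_1min() -> float:
    """Example meter (an assumption): 1-minute load average, Unix-like systems only."""
    return os.getloadavg()[0]
```

Calling `quick_sample(load_average_1min, samples=10, interval_s=1.0)` would report in roughly ten seconds. Any function that returns a number can be passed in as the meter, so the same loop can serve several quick checks.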

Understand the differences in your tools

Many companies have a mix of performance tools. In a crisis, it is easy for two people looking at different tools to confuse the situation. Work to understand the differences in these tools during the quiet times. These tools may label things differently, use different low-level meters, sample at a different rate, or average over a different interval.
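To see how much the averaging interval alone can change the story, here is a small sketch in Python using made-up data (the numbers and the helper name `peak_of_averages` are invented for illustration). Two tools looking at the same 60 seconds report very different peaks simply because one averages over 5 seconds and the other over a full minute.

```python
def peak_of_averages(per_second, window):
    """Average consecutive non-overlapping windows and return the largest average."""
    averages = []
    for start in range(0, len(per_second), window):
        chunk = per_second[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return max(averages)


# Made-up data: 60 seconds of CPU busy %, quiet except for a 10-second spike.
busy = [10.0] * 25 + [100.0] * 10 + [10.0] * 25

print(peak_of_averages(busy, 5))   # 100.0 (the 5-second tool sees the spike)
print(peak_of_averages(busy, 60))  # 25.0  (the 1-minute tool smooths it away)
```

Neither tool is wrong; they are answering different questions. Knowing that ahead of time keeps two people from arguing over whose number is right in the middle of a crisis.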

Get to know the local experts before you need them

Take time in your day to find and build relationships with the other experts in your company. Notice I did not say “meet with.” Ignore the bureaucracy and talk to them as people about how you can help each other. I’ve visited many companies, and the ones that handle problems the best are the ones where the key experts all know each other well.

Master tech support

Sometimes an expert can only be reached through tech support. The first time you ever call them should not be in the middle of a crisis. Once in a while, call tech support with a question. A good first question is: “I’m preparing for a possible performance problem. What basic meters and info would you need from me?” Learn to collect those meters.

Every tech support department has ways to prioritize calls and protect their key wizards. If you know the system, you can get what you need quickly. Learn these ways by calling before the crisis.

Lastly, every tech support person I know has this advice for you: If you act like a jerk, you’ll get their least helpful service. If you are calm, clear, and prepared, you’ll get their most helpful service. If you say “Thank you” and, when appropriate, CC their boss, you’ll get better service the next time you call.

More information on this, and many other useful ideas, can be found in my book:
The Every Computer Performance Book

A short, occasionally funny, book on how to solve and avoid application and/or computer performance problems