On debugging, v2

Dear new developer,

I wrote about debugging a few weeks back, and I wanted to get more concrete. One time a friend called me about his client, who was getting doubled orders on their ecommerce site. That is, someone would order five widgets, the system would have some kind of hiccup, and there would be two orders, resulting in ten widgets being shipped. It didn’t happen every time. It didn’t have any discernible pattern. There were no obvious changes to the system that would cause this.

The customers weren’t happy about this. The client wasn’t happy about this. My friend wasn’t happy about this. He had looked around and couldn’t find the issue. He wanted me to take a look.

I had never worked with this ecommerce system before. There was no staging environment. I was debugging in the dark. I didn’t even have a way to submit fake orders that wouldn’t be charged. What I had was server access and SQL database access and a list of the customers that had been double charged.

So I started looking around at the log files (on the command line, with grep and vi and all the other great Unix tools) and noticed something weird. The Apache logs indicated that the server was being restarted very often, and the restart times lined up with the double charges.
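Lining up two sets of timestamps is the kind of check that is easy to script. Here is a minimal sketch in Python rather than grep. The timestamps, the function name, and the 60-second window are all made up for illustration; real logs would need parsing first.

```python
from datetime import datetime, timedelta


def orders_near_restarts(order_times, restart_times, window=timedelta(seconds=60)):
    """Return the order timestamps that fall within `window` of any restart."""
    return [
        order
        for order in order_times
        if any(abs(order - restart) <= window for restart in restart_times)
    ]


# Made-up timestamps, for illustration only.
restarts = [datetime(2020, 1, 1, 2, 0, 0), datetime(2020, 1, 1, 3, 0, 0)]
doubled_orders = [
    datetime(2020, 1, 1, 2, 0, 20),
    datetime(2020, 1, 1, 3, 0, 5),
    datetime(2020, 1, 1, 4, 30, 0),
]

suspicious = orders_near_restarts(doubled_orders, restarts)
print(f"{len(suspicious)} of {len(doubled_orders)} doubled orders were near a restart")
```

If most of the problem orders land near a restart, that is a hypothesis worth testing. It is not proof, since a correlation like this can be a coincidence.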

I looked at the server cron file and found that someone had added a line that restarted the Apache web server regularly. I asked if anyone knew why this was happening; no one did. They didn’t have their system changes under version control, so I was wary of making changes. But finally I decided to disable the restarts and see if the double orders continued.

They didn’t.

So this was definitely not a hugely complex system, but this is an example of debugging in a live system. Lessons for me:

  • Define the problem
  • Know the finish line
  • Start with what you know
  • Take small steps
  • Notice anything “weird”

Sincerely,

Dan

On debugging

Dear new developer,

Debugging systems is a key skill to have. Here are a few thoughts about it.

  • Try to get the problem to be as simple as possible. Start with the problem and keep isolating and removing pieces and see if the problem persists. Modern systems are complex and the less you have to think about, the better.
  • Keep notes about what you’ve tried. These can go into a chat system if the debugging is high priority (on a production system during an outage) or in a private text document if the debugging is not (a bug you’re trying to understand).
  • Think about recent changes to the system if the bug is new. This isn’t always the cause, but it’s often part of it. Rarely does a system just degrade spontaneously.
  • Write an automated test to illustrate the bug (if possible). This will
    • speed up your fix because it gives you a tight loop of run test, make change, run test, make change.
    • ensure that your change actually fixes the bug as opposed to something else.
    • provide a regression suite that will prevent this bug from popping up again.
  • Start at one end of the system or the other when you are trying to isolate the issue and see where it appears. For example, for a user facing web application, start with either the browser or the data store.
  • Minimize impact on users. If you are working on a production bug, ideally you can test on staging (right?) because this gives you the most latitude. However, if you do that, make sure that staging and production are exactly the same, otherwise you will chase your tail. Even if you have to test on production, you can print debugging statements only for certain users (you or admin users) or write them to comments in the page output.
  • Start with a hypothesis and work to continuously disprove or prove it, refining it as you learn more.
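To make the automated test idea concrete, here is a minimal sketch using Python's built-in unittest module. The function split_name and its bug are hypothetical, invented only for this example: imagine the original version crashed on single-word names. One test pins down the bug, and a second checks that the fix did not break the normal case.

```python
import unittest


def split_name(full_name):
    """Hypothetical function under test: split a full name into (first, last)."""
    parts = full_name.split()
    if not parts:
        return "", ""
    # Imagined bug: the old code always returned (parts[0], parts[1]) and
    # raised IndexError on single-word names. This branch is the imagined fix.
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], " ".join(parts[1:])


class SplitNameRegressionTest(unittest.TestCase):
    def test_single_word_name(self):
        # This test fails against the buggy version and passes after the fix.
        self.assertEqual(split_name("Cher"), ("Cher", ""))

    def test_two_word_name_still_works(self):
        # Guards against the fix breaking the common case.
        self.assertEqual(split_name("Ada Lovelace"), ("Ada", "Lovelace"))


# Run the tests programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitNameRegressionTest)
result = unittest.TextTestRunner().run(suite)
```

Write the failing test first, watch it fail, then change the code until it passes. Once it is in your test suite, it keeps checking for that bug on every run.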

Sincerely,

Dan

Learn to use a debugger

Dear new developer,

When you are fixing a bug in a program you are working on, a key thing to do is to get an understanding of the state of the system. This can include user input, stored values from a persistent data store, and non-recurring information like the current time. But the most important piece of state is that of the program in memory. What function or procedure is executing when the bug appears, and what did all the variables look like at that moment?

Reproducing a problem with a test or sequence of steps is crucial for being able to solve it. You should take every step you can to make sure that your debugging environment is the same as the environment that the problem is appearing in. I remember one program I was debugging that worked fine in development, but failed miserably in production. It used Google Web Toolkit, which compiled Java down to JavaScript. In development, even when I compiled it, the obfuscated variable names were different. That ended up being the issue: there was a variable name collision between the compiled JavaScript and another JavaScript file that wasn’t namespaced correctly. I tore my hair out and was reduced to putting in console.log statements on production.

And that’s how a lot of debugging happens: printing out log statements to a file. You can solve many problems that way. It’s extremely portable and customizable, and it gives you some insight into program state.
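As an illustration, here is a minimal sketch of log-statement debugging with Python's standard logging module. The function apply_discount, the logger name, and the log file location are all hypothetical, chosen only for this example.

```python
import logging
import os
import tempfile

# Write debug output to a file in the system temp directory (an arbitrary choice).
log_path = os.path.join(tempfile.gettempdir(), "debug_example.log")

logger = logging.getLogger("debug_example")
logger.setLevel(logging.DEBUG)
handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)


def apply_discount(price_cents, percent):
    """Hypothetical function being debugged."""
    # Record the inputs and the result so the program state can be read later.
    logger.debug("apply_discount called: price_cents=%r percent=%r", price_cents, percent)
    discounted = price_cents - (price_cents * percent) // 100
    logger.debug("apply_discount result: %r", discounted)
    return discounted


apply_discount(1000, 15)
print(f"Debug log written to {log_path}")
```

Logging the inputs and the outputs of the suspect function is usually a good place to start. It tells you whether the problem is inside that function or upstream of it.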

However, a far better solution is to use a real debugger. They’ve been around since the 80s, at least, and give you far more insight into a program’s state than log statements. You can see the state of any variable. You can run commands interactively. You can stop anywhere, and restart the program. If you pair an interactive debugger with an automated test, you can have an extremely tight feedback loop that will help you zero in on the issue at hand.
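Here is a minimal sketch of pairing a debugger with a test, using pdb, the debugger that ships with Python. The function average and its test are hypothetical stand-ins for whatever you are investigating. The interactive call is left commented out so the script also runs unattended.

```python
import pdb


def average(values):
    """Hypothetical function under investigation."""
    total = sum(values)
    count = len(values)
    return total / count


def test_average():
    assert average([2, 4, 6]) == 4


def debug_test():
    # Run the test under the debugger; execution pauses as the test starts.
    pdb.runcall(test_average)


# debug_test()  # uncomment to start an interactive debugging session
test_average()
```

At the (Pdb) prompt, s steps into average, n runs the next line, p total prints a variable, and c continues to the end. Another option is to put a call to the built-in breakpoint() on the line where you want execution to pause.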

Most of the major languages have such interactive debuggers (in fact, that’s one way to decide to avoid a language; a development language without a real debugger is likely to have other language level issues, like a poor dependency management story). Some languages even have standard protocols where you can connect to remote servers with a debugger. If you ever have to debug a production issue and can enable that, it’s going to be super helpful.

Debuggers are often integrated with an IDE, but some are runnable on the command line. Whatever your language, just google for “<language> debugger” and find out more about this valuable resource.

Sincerely,

Dan