Solving mystery bugs
September 10th, 2010 Jorge Balderas, Consultant
As I was reading ‘The Girl With The Dragon Tattoo’ on my bus ride the other day, an unresolved computer defect (a.k.a. bug) from work kept coming back to mind. I kept thinking about what I had already tried to diagnose this particular software defect and what I was going to try next. As I jumped back and forth between reading and thinking about the bug, it occurred to me that diagnosing and fixing a software defect is not a whole lot different from investigating a crime. OK, I may be exaggerating, but I really think the two situations have a lot in common. In ‘The Girl With The Dragon Tattoo’, the first installment of the Millennium Trilogy, computer hacker Lisbeth Salander joins journalist Mikael Blomkvist to try to solve an unresolved missing person case. As a developer, I am often tasked with diagnosing and fixing software defects that were created by someone else and, in some cases, have been in the application for ages. In this blog post I enumerate some similarities between diagnosing computer defects and investigating crimes, even though solving defects is (in most cases) not life threatening.
Reporting the crime
It all starts with someone reporting an application error (i.e. the crime). This can be reported through a support ticket, which may contain information about the error (e.g. a stack trace). In other cases it is reported directly by the user or, if the application has not been released yet, by a QA (Quality Assurance) tester or another developer.
Interrogating witnesses
If the reported error is not descriptive enough, my next stop can be the witness (e.g. the user reporting the defect). The user may be able to produce more detail about how the error occurred, e.g. which actions were taken before the error. The easiest scenario is when the error can be reproduced consistently.
Collecting and analyzing evidence
If the application error is not descriptive enough or the user did not provide detailed information, the next thing to do is collect additional evidence. In the case of software applications, my next stop is the application logs (when they are available). A good verbose application log can tell us what was going on before the error took place. It can sometimes show the values that were being passed through the application layers. Were required values missing? Were the values of an unexpected type? These are pieces of information that help close in on the root cause of the error and what needs to be fixed.
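As a rough illustration of what a verbose log buys you, here is a minimal Python sketch. The function names (`handle_request`, `compute_total`), the logger name and the order fields are hypothetical, invented only for this example; the logging calls themselves come from Python's standard `logging` module.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def compute_total(quantity, unit_price):
    # Log the values and their types as they cross into this layer.
    logger.debug(
        "compute_total quantity=%r (%s) unit_price=%r (%s)",
        quantity, type(quantity).__name__,
        unit_price, type(unit_price).__name__,
    )
    return quantity * unit_price


def handle_request(order):
    # Log the raw input so a missing or mistyped value is visible later.
    logger.debug("handle_request order=%r", order)
    return compute_total(order.get("quantity"), order.get("unit_price"))


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(levelname)s %(name)s %(message)s",
    )
    try:
        # unit_price is missing and quantity is a string, not a number.
        handle_request({"quantity": "3"})
    except TypeError:
        logger.exception("order failed")
```

With the debug lines in place, the log answers both questions from the paragraph above: the required `unit_price` is missing, and `quantity` arrived as a string.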
Lack of evidence
In many cases we do not have the luxury of a good verbose application log. This is often the case when the error occurred on a production system or when the application lacks good logging. This is where the case gets more interesting. If we have a stack trace available, we can look up the line that failed in the source code, and this can sometimes tell us what triggered the error (e.g. a null parameter was passed). Another option is to try to reproduce the error in a development or QA environment and attach a debugger, if that capability is available. A third option is to enable logging or add logging statements to help pinpoint the issue.
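To make the stack-trace route concrete, a minimal Python sketch follows. It uses the standard `traceback` module to pull the function name and source line of the innermost frame out of a caught exception. `load_profile`, `format_name` and `failing_frame` are hypothetical stand-ins for application code, not part of any real system.

```python
import traceback


def format_name(user):
    # Raises TypeError when user is None (the "null parameter" case).
    return user["first"] + " " + user["last"]


def load_profile(user):
    return {"display_name": format_name(user)}


def failing_frame(exc):
    """Return (function name, source line) of the frame where exc was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1]  # innermost frame: the line that actually failed
    return last.name, last.line


if __name__ == "__main__":
    try:
        load_profile(None)
    except TypeError as exc:
        name, line = failing_frame(exc)
        print(f"failed in {name}: {line}")
        print(f"reason: {exc}")
```

The innermost frame points at `format_name`, and the exception message hints that the value passed in was `None`, which is often enough to know where to look next.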
Discarding suspects
Time is of the essence when solving police investigations, as it is when solving computer defects. Most defects are found when the application is going through Quality Assurance, the last stage before going live. Defects have to be fixed at a fast pace or else the system will not be released on time. An experienced developer can quickly solve simple bugs by looking at the stack trace. Defects should not be assigned to junior developers who may not have even worked on the application and may need to spend hours debugging it. A good developer should be able to rapidly discard the parts of the application that are not related to the bug. If debugging is required to diagnose the problem, a developer familiar with the code will place breakpoints right before the point where the error occurs, as opposed to stepping through the entire application code, which can be very time consuming.
Dealing with misleading evidence
There are certain bugs that cannot be easily reproduced. For such bugs, having a recollection of the steps followed before the system error is extremely important. A user or tester can save a lot of time by providing relevant information (e.g. screenshots) to reproduce the error. On the other hand, the user can also send you down the wrong track by unintentionally giving you incorrect or incomplete information.
Application logs can also be misleading evidence. The application can be logging inaccurate information, e.g. logging the incorrect action or logging the incorrect variable value and effectively derailing the investigation.
Recreating the crime scene
In computer software we frequently have the capability of reproducing the defect in test environments. More often than not we can replicate the issue with the same set of data, and we can also debug and step through the code at runtime. This is a luxury not available in police investigations.
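One way to keep a reproduced defect from slipping away is to capture the offending data in an automated test. The sketch below uses Python's standard `unittest` module. The function `parse_quantity`, the assumed defect and the sample value `"1,200"` are all hypothetical, chosen only to illustrate replaying the data that triggered the error.

```python
import unittest


def parse_quantity(raw):
    """Hypothetical function under investigation.

    Assumed defect: values such as "1,200" from the failing record used to
    raise ValueError because the thousands separator was passed to int().
    """
    return int(raw.replace(",", "").strip())


class ReproduceReportedDefect(unittest.TestCase):
    def test_value_from_failing_record(self):
        # Same data that triggered the error in the test environment.
        self.assertEqual(parse_quantity("1,200"), 1200)

    def test_plain_value_still_works(self):
        self.assertEqual(parse_quantity("42"), 42)
```

Saved to a file, this can be run with `python -m unittest <file>`. Before the fix the first test fails with the reported error; after the fix it passes and stays behind as a record of the crime scene.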
You are guilty, unless proven otherwise
There have been hundreds of instances in which I became desperate trying to figure out why a piece of code was not working the way I expected it to. In 99% of those instances, the computer was right: it was doing exactly what I told it to do. It was my instructions (i.e. my code) that were incorrect. Over and over again I was proven guilty. Of course, there are always those rare instances in which I have found legitimate bugs in the underlying frameworks I was using, but in my experience those instances are the exception.
While I do not claim that software skills are transferable to police investigations, there are certain bugs that, once solved, made me feel like an experienced police detective.
Entry Filed under: Agile and Development
4 Comments
1. Chris Winters | September 10th, 2010 at 9:48 am
There’s a great book by Ellen Ullman called “The Bug” about a particularly insidious bug related (IIRC) to display systems and mouse inputs in the relatively early days of mice. It’s fiction, but she does a great job of describing how bugs get in your head and how simple the cause frequently turns out to be.
2. Jorge Balderas | September 10th, 2010 at 10:08 am
Thanks Chris, I’ll have to check it out!
3. Mitch Goldstein | September 10th, 2010 at 10:37 am
Great article! Agatha Christie would be proud!
4. Jorge Balderas | September 10th, 2010 at 3:34 pm
Thanks Mitch! I guess we all have a bit of Hercule Poirot within us =)