Wednesday, February 16, 2011

Troubleshooting an application

Normally I would not openly criticize someone else's programming choices. However, what I ran into today was not something I ever expected to have to deal with. You see, there is this application that ran faster on a multi-core server than on a 500 MHz workstation. When run on the workstation, CPU usage pegged as high as it could go. Folks were more willing to discuss the performance characteristics of the machines running this application than to dig into the code and fix it.

A quick code survey revealed that the developer-programmer had coded no logging statements. I mean none, zero, nada. There were maybe 12 or so source files feeding into main(), so the magnitude of this challenge was not that significant. I had surveyed this code in the past, and pretty much every file had just a few functions.

After approximately 3 hours of adding many printf statements, compiling, running, adding more printf statements, compiling, and running, the bug was found. It seems there was some code that allowed an overrun, sending the code into a loop (thus the pegging of the CPU). This loop needed to go all the way to 2,147,483,647, the largest value a signed 32-bit integer can hold, before exiting. No wonder the multi-core server made quick work of this. For the poor ol' 500 MHz workstation, however, that is a lot of clock cycles. No wonder it took so long. Poor thing.

A few lines of code went in to check the boundaries, specifically "if less than 1 or greater than 12", then "continue". This so-called fix keeps the application from running all the way to 2,147,483,647 before exiting. I have no clue what possessed the developer-programmer to take that approach when parsing that particular file.
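For anyone curious what that kind of boundary check looks like, here is a rough sketch in C. It is not the actual code from the application. The function name tally_fields, the counts table, and the idea that the parsed values index a 12-slot table are all assumptions made for illustration; only the "less than 1 or greater than 12, then continue" test comes from the real fix.

```c
#include <stdio.h>

#define MIN_FIELD 1   /* lower bound from the fix described above */
#define MAX_FIELD 12  /* upper bound from the fix described above */

/*
 * Hypothetical example: tally already-parsed values into a 12-slot table.
 * Values outside 1..12 are reported and skipped instead of being used as
 * an index. Returns the number of values that were skipped.
 */
int tally_fields(const int *values, int n, int counts[MAX_FIELD])
{
    int skipped = 0;

    for (int i = 0; i < n; i++) {
        int v = values[i];

        if (v < MIN_FIELD || v > MAX_FIELD) {
            fprintf(stderr, "skipping out-of-range value %d at index %d\n", v, i);
            skipped++;
            continue;  /* never touch counts[] with a bad index */
        }

        counts[v - MIN_FIELD]++;  /* safe: index is 0..11 here */
    }

    return skipped;
}
```

Reporting the skipped value on stderr is a choice made for this sketch; it at least leaves a trace that the input file contained something unexpected.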

This was a fairly aggressive approach to fixing this problem. But in these cases a person really needs to be aggressive, maintain focus, stay with it, code in as many printf's as needed, run the compile-and-test iterations, and keep moving until the offending computational operation is found.
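If you end up doing this sort of printf hunt often, a small tracing macro saves some typing. The following is only a sketch of one way to do it, assuming a C99 compiler. TRACE and sum_to are made-up names for illustration and are not from the application in question.

```c
#include <stdio.h>

/*
 * Hypothetical helper: print the file, line, and function name followed by
 * a printf-style message. Output goes to stderr and is flushed right away,
 * so the last message is still visible if the program hangs in a loop.
 */
#define TRACE(...) \
    do { \
        fprintf(stderr, "%s:%d %s(): ", __FILE__, __LINE__, __func__); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
        fflush(stderr); \
    } while (0)

/* Hypothetical stand-in for a suspect function: sums 1..limit. */
long sum_to(int limit)
{
    long total = 0;

    TRACE("enter, limit=%d", limit);
    for (int i = 1; i <= limit; i++) {
        total += i;
    }
    TRACE("exit, total=%ld", total);

    return total;
}
```

An "enter" message with no matching "exit" message points straight at the function where the program is stuck, which narrows down where the next round of printf statements should go.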
