Saturday, 21 April 2012

IT Problem Troubleshooting Methodology

In my many years of troubleshooting IT problems, and of working with a great many colleagues, I have seen the good and the bad of troubleshooting technique. This post does not go into any particular technology; it aims to set out an approach for troubleshooting a failed system where there is no known fix.

Pre-requisites

- Technical aptitude.
- An understanding of the concepts behind how the failed system works.
- Sufficient access to the logs, management consoles, and diagnostic tools for the system in question.
- Curiosity. Have a good look around (time permitting) through the logs, management consoles, and diagnostic tools; something glaringly obvious might become apparent!
- Time. Time might be very scarce if the failed system is of critical importance. Still, it is important to take the time to absorb the information available in front of you in order to come to a diagnosis.
- Have the grace to admit when you are beaten and escalate before too much time is lost if the system is mission critical!

Note: To be a good IT technical troubleshooter, you do not need to be a guru or super intelligent, and you do not even need to have seen the failed system before. You just need the prerequisite technical aptitude and the ability to apply logic to the problem at hand.

Rules

- Do not make the problem worse!
- Before making any major change to fix the problem, be certain to first check the available sources of potentially useful knowledge (e.g. colleagues, Google and the wider internet, troubleshooting manuals, support channels, …).

Troubleshooting the Failed System

Approach the problem in hand with a clear mind and in the following logical way:

- Every system needs a certain chain of events to be satisfied in order to work. Follow through the system's operational process, checking off each link in the chain, to find the broken link (of course, one can jump straight to the broken link when it becomes obvious what it is).
E.g. For this to work, first this, this, this, this, …, and this must work!
*Many thanks to David Castillo Dominici of FreeDigitalPhotos.net for the image.

- It often happens that all the links in the chain check out okay. In that case, there is still the option of letting the system go through its operational process again from the start of the chain (a reboot).
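
The chain-of-links idea above can be sketched in a few lines of Python. This is only an illustration and not tied to any particular technology: the function name `find_broken_link`, the link names, and the stand-in checks are all hypothetical, and in practice each check would be replaced by a real test of that part of the system (a ping, a log query, a service status call, and so on).

```python
from typing import Callable, List, Optional, Tuple

# Each link is a (name, check) pair; the check returns True when that link is healthy.
Link = Tuple[str, Callable[[], bool]]


def find_broken_link(chain: List[Link]) -> Optional[str]:
    """Walk the chain in order and return the name of the first failing link.

    Returns None when every link checks out.
    """
    for name, check in chain:
        try:
            healthy = check()
        except Exception:
            # A check that cannot even run is treated as a broken link.
            healthy = False
        if not healthy:
            return name
    return None


# Hypothetical chain: the lambdas stand in for real checks on a real system.
example_chain: List[Link] = [
    ("power", lambda: True),
    ("network", lambda: True),
    ("name resolution", lambda: False),
    ("application service", lambda: True),
]

broken = find_broken_link(example_chain)
if broken is None:
    print("All links check out; consider restarting from the start of the chain.")
else:
    print(f"First broken link: {broken}")
```

Ordering the checks to follow the system's own operational process matters here: the first failure reported is then the earliest broken link, which is usually the one worth fixing first.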

Tips and Things To Remember

- Do not jump to conclusions or make unverified assumptions!
- Do not make a rash judgement.
- “There is nothing in our lives that is permanent.” Just because something shouldn't have changed doesn't mean it hasn't changed.
- Do not expect to know everything; consult with sources of knowledge (colleagues, the web, manuals, …) and do not be afraid to ask for help. "To admit you don't know everything is the first step on the road to wisdom."
- Take the words of others with a pinch of salt – be sure to check things yourself.
- If you've done everything right, why not just try doing something a little different - who knows!
- Always be on the lookout for tools to make your life easier.
