If there’s one thing that I’ve learned in my years of working in IT, it is that things will not stay in working order forever. Murphy’s Law says that anything that can go wrong will go wrong, and in my experience it tends to go wrong when you most need things to work.
So what do you do?
Accept that things will go wrong … and plan how you will fix them when they do. This is the basis of disaster recovery. It is also the basis of a smoothly running IT function.
In the long run, an IT Manager is judged by how smoothly things run. There’s nothing special about supporting hardware or software when it is running perfectly. However, when things go wrong, having a plan already in place to fix the problem saves a lot of time and worry. Faster recoveries mean that things run more smoothly overall.
An important part of your job is to envision what could go wrong (both likely and unlikely events) and set up plans for how the IT function can recover from such disasters. Some “disasters” occur often, are easy to fix, or both. (For example, what would you do if a data file were lost?) Other “disasters” are very unlikely but take a lot more planning. (For example, what would you do to restore the IT function if the building burnt to the ground?)
I’ll talk more about disaster recovery over time.