There’s an old saying: “S**t happens.” That’s true in all areas of life, but it seems especially true in the IT world. Maybe it’s because we rely on our PCs for so much of our lives these days.
A good rule of thumb is to plan for likely breakdowns right from the start. Here’s a good example: planning for a server breakdown.
Looking at a server, what is most likely to break down? As with most computers, the parts most likely to fail are those with mechanical components, such as the power supply or the hard drives.
A good disaster recovery “rule of thumb” is to order an extra hard drive at the time you originally order a RAID-based server. This might not seem like common sense, but hear me out. Hard drives have consistently come down in price, but the models on sale change just as regularly, so a drive you buy later may well have different mechanical and physical specs from the ones in your array. Replacing a drive in a RAID with one that has different specs is just asking for trouble.
With a spare on the shelf, if a drive in your RAID fails, you can immediately unpack the replacement drive, install it, and initiate a rebuild of your RAID array. If you did not have such a drive on hand, you would have to a) frantically look around for a drive with the proper specs, b) order the drive with priority (i.e. expensive) shipping, c) wait for the delivery, and d) only then rebuild your array.
As an “experienced” (i.e. smart) IT manager, it makes sense to plan for a server breakdown and have a replacement part on the shelf. Less “down time” means the server is available again sooner and the IT function runs more smoothly.