Paul Robichaux’s recent Windows IT Pro column contains this passage:
If you don’t know when you’re about to run out of disk space on the log LUN, you’re totally doing it wrong. It leaves me slack-jawed with astonishment that in 2012 we still have administrators who suffer unplanned downtime due to log volumes filling up because of poor design or problems with backups. Stop the madness! Check your backups regularly and use monitoring, be it however primitive, to ensure that you don’t have this problem.
Paul is referring to this post on the MS Exchange Team blog, which mentions that:
…the number one reason why our Premier customers open Exchange 2010 critical situations is because Mailbox databases dismount due to running out of disk space on the transaction log LUN.
Either I missed that post when it was published a few months ago, or I skimmed it and it didn’t hold my attention.
I’ve been stung by transaction log disks running out of space, even as recently as last year. I work in a very large Exchange environment where backups and storage monitoring are performed by two different teams, both of them outside of my team.
That puts us in a situation where failed backups, combined with either a storage monitoring problem or human error in the escalation process, can cause (and have caused) a log volume to fill up. Another situation arises when new servers are provisioned and our help desks begin putting mailboxes on them before they’ve been added to the backups.
Which is why I wrote this script to check the Exchange database backups and alert my team when a database has not been backed up in the last 48 hours or more. In fact, that script is a better version than the one we run in production (I haven’t yet ported the improvements I later made at home back to the original).
Ever since we put it in place, that script has alerted us to numerous situations before they could turn into disasters.
Setting it up as a scheduled task to run daily is a job that will take you less than half an hour. I highly recommend it, not just because I wrote the script, but because it adds a valuable layer of protection to your environment.
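For anyone who wants to see the idea rather than the script itself, here is a minimal sketch of the staleness check, written in Python purely for illustration. It is not the script mentioned above. The function name, database names, and timestamps are all hypothetical; in a real environment the last-backup times would come from Exchange (for example, `Get-MailboxDatabase -Status` reports a `LastFullBackup` value), and the alert would be an email rather than a printed line.

```python
from datetime import datetime, timedelta, timezone


def find_stale_backups(last_backups, now, max_age=timedelta(hours=48)):
    """Return (name, age) pairs for databases with no recent backup.

    last_backups maps a database name to the datetime of its last
    backup, or None if it has never been backed up (hypothetical input;
    a real check would query Exchange for these values).
    age is None for databases that have never been backed up.
    """
    stale = []
    for name, last in sorted(last_backups.items()):
        if last is None:
            # Never backed up, e.g. a new server not yet added to the backups.
            stale.append((name, None))
        elif now - last >= max_age:
            # Last backup is 48 hours old or more.
            stale.append((name, now - last))
    return stale


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    now = datetime(2012, 6, 1, 9, 0, tzinfo=timezone.utc)
    sample = {
        "DB01": datetime(2012, 5, 31, 22, 0, tzinfo=timezone.utc),
        "DB02": datetime(2012, 5, 29, 9, 0, tzinfo=timezone.utc),
        "DB03": None,
    }
    for name, age in find_stale_backups(sample, now):
        if age is None:
            print(f"ALERT: {name} has never been backed up")
        else:
            hours = age.total_seconds() / 3600
            print(f"ALERT: {name} last backed up {hours:.0f} hours ago")
```

Treating "never backed up" as its own case matters here, because it is exactly what happens when mailboxes land on a newly provisioned server before it has been added to the backups.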