I’ve just uploaded v1.04 of the Test-ExchangeServerHealth.ps1 script.
A quick note on versions – if you have v1.2 or v1.3 now then this version is *newer*. For a variety of reasons I have had to adjust the version numbering for these scripts.
Updates in version 1.04:
- Added Exchange 2013 compatibility
- Added option to output a log file
- Converted many sections of code to use pre-defined strings
- Fixed -AlertsOnly parameter
- Improved summary sections of report to be more readable and include DAG summary
Here is an example of the improved health summary section in the report.
Please send me the script
Thank you
How do I schedule this report in Task Scheduler?
It would also be great if you included disk space information for the Exchange servers, and reported the status of the last successful backup for each database.
Paul, thanks a bunch for your response!! I delved a bit deeper into the workings of the code before your response, and figured out that this is what we needed to do for our environment. Excellent script, the whole engineering team at my firm loves it! Keep up the good work.
Glad you got it working. Thanks for the feedback.
Paul or others, I have one question for you:
We want to incorporate a measure to check for quarantined/isolated mailboxes in the health script. I’m just not enough of a guru to know where and how to place that so that it doesn’t completely break the monitors. Do you have any ideas as to how to do such add ons? Thank you very much in advance.
Is there a way to modify the values where the entries in the DAG Health Summary are set:
healthy copy/replay queue count is 2 (of 4).
We would like to raise these values, if possible, so that the report does not generate a warning at the current thresholds. Is this a setting within the script, or is it somewhere within Exchange? Any thoughts would be very much appreciated. Thanks.
Yes, look at [int]$replqueuewarning in the variables section of the script (near the top).
Hi Paul,
Can you please provide a script for Exchange 2007 that reports database mount status, queue length, service status, free space, etc.? I also have a combined HUB/CAS role.
Thanks in advance
Thanks for this wonderful script, Paul! It saves me so many headaches to get a daily verification that my servers are up and healthy. I wanted to make a feature request, though. I have noticed that if I put one of my DAG member servers into maintenance mode, the script continues to report that everything is shiny.
It appears that although the script reports an error if the cluster service is not running, it doesn’t check whether a cluster node is paused or a DAG member is activation blocked unexpectedly. Also, it appears that the DB Copy Suspended column reports “Passed” even when a database copy is activation suspended on the server. This is apparently because that’s what the Test-ReplicationHealth cmdlet returns, but I don’t understand why it does that.
While technically none of these indicates an unhealthy server, it would still be helpful to know that the DAG is not fully available. In my case, one of my three servers is always activation blocked, so anytime one of the other two is in maintenance mode, I risk an outage if the third one fails.
Finally, I recently scheduled the script to run using a service account rather than my admin credentials and had to do some detective work to get it to run with minimum permissions. Please let me know if I’m mistaken, but I wanted to share the info with anyone else who’s trying to do the same:
* Local Administrator on the server running the script
* Member of “Exchange View-Only Administrators” domain security group
* To run the uptime check, the account must be a member of the “Distributed COM Users” local security group and must also be granted “Remote Enable” permission in wmimgmt.msc for the root namespace and subnamespaces on all remote Exchange servers.
Andrew
I think that is a valid concern and probably something the script could flag. I’ll see what I can do.
Good work on the minimum privileges, you’ve worked out a lower privilege set than what I use right now. Handy to know.
Maybe try ; instead of ,
We started to use a DL for receiving these mails rather than personal or separate email addresses… Works like a charm :)
Gluck
A DL is a good way to do it. Alternatively you can put multiple addresses in, they just need to be enclosed in quotes and separated by commas.
Eg,
"address1@domain.com","address2@domain.com"
But really a DL is much easier to manage.
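For anyone unsure why the quoting matters, here is a minimal sketch of the difference. It assumes the recipient value ends up being passed to something like the -To parameter of Send-MailMessage, which accepts an array of strings; the variable names and addresses are made up for illustration and are not taken from the script.

```powershell
# Hypothetical variable names and addresses, for illustration only.

# One string that merely contains a comma: PowerShell sees a single value.
$single = "address1@domain.com, address2@domain.com"

# Each address quoted separately and joined by commas: an array of two strings.
$multiple = "address1@domain.com","address2@domain.com"

@($single).Count     # 1
@($multiple).Count   # 2

# If the addresses arrive as one delimited string, it can be split on
# commas or semicolons to get an array.
$split = $single -split '\s*[,;]\s*'
@($split).Count      # 2
```

A distribution list avoids the question entirely, since the script only ever needs one address.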
yes. email group is better
Hi Paul, I just noticed a weird thing with the script. When I add multiple recipients, the report is emailed only to the first one. My syntax is "xyz@domain.com, abc@domain.com, 123@domain.com". All users belong to the same domain but only xyz@domain.com receives it. If I switch the email addresses around then, again, only the first one gets the email. Any ideas?
Thanks
Tashfin
Please try using an email group instead; that is a better way to do it.
Hi Paul,
Very nice script, and it helps me check the status of the Exchange servers every morning.
I had scheduled the script on a Windows Server 2008 machine with the management tools installed, and it runs fine there.
Now I am planning to decommission that server, so I have scheduled the same script on one of the Exchange 2010 CAS servers.
I am getting accurate results for all the other Exchange servers, but for the server the task is scheduled on, the report says "Client Access Server Role required services not running".
I have tested the script by running it manually and no errors are reported. But if I run the scheduled task, it reports the services error for that specific Exchange server, the one the scheduled task runs on.
I don't have any clue, and I am not good enough at scripting.
Could you please help me…
Hi
I downloaded your script and it's great. But I have a massive issue. I was running out of disk space on one of the Exchange drives that had 2 DBs on it (don't ask why, I found it this way). I then moved users from DB3 to DB1. DB1 is configured in a DAG and DB3 is not. The total number of users moved was 479. After a day everything had moved fine, with no errors reported whatsoever. I then waited an additional 2 days to ensure all went well. I ran another script that shows the number of user mailboxes per DB, and it showed DB3 was empty, so I deleted DB3. As soon as I did that I started having severe issues with my mail queues. All mail bound for users in DB1 seems to get stuck in the "unreachable domain" queue with "Last error: The mailbox recipient does not have a mailbox database". I'm completely lost. All was perfect until I deleted the empty database, DB3. I really need help, please. Your health script shows all is well.
We are running Exchange 2010 SP1 on Windows Server 2008 R2 servers.
Any help would be great. I'm no Exchange guru, so please, I'm desperate. Help!
What were the steps you used to delete the empty database?
Did you know that when a server reaches an uptime above 1000 hours, the health report shows it as 0? That was a little scary when I came back from annual leave. Good to see that the monitoring was able to tell me that it was over 50 days :)
Yep, more than a few people have reported that to me. My servers report uptimes of 1000+ hours from time to time but without showing the bug people are reporting to me. It is still on my list to fix.
Hi Paul –
I’ve been testing out this excellent script in my 2010 environment. I’ve got 2 mailbox servers (out of 8) where Test-Mailflow consistently chooses a SystemMailbox from a recovery database and fails with “Unable to open message store.” There are plenty of other active databases on these servers. Any clue how Test-Mailflow decides which database to use for the default “source” of this test? Test-Mailflow works fine on the other 6 mailbox servers – which also house a recovery database.
Not sure how it picks the system mailbox. Yeah I don’t normally have recovery databases sitting there permanently. I’ll see what I can do to work around that.
Hi Paul –
Based on my Exchange 2010 environment, it seems to choose the “first” database as listed when you pull an (unsorted) list of “active” databases on a server – which you do with this statement:
[array]$activedbs = @(Get-MailboxDatabase -server $server -status | Where {$_.MountedOnServer -eq ($serverinfo.fqdn)})
My workaround for the Test-ExchangeServerHealth script was to add a condition to the IF statement at the top of the “START – Mail Flow Test” section (which determines when to skip the mail flow test) as follows:
From this: if ($version -eq "Exchange 2013")
to this: if ($version -eq "Exchange 2013" -or $activedbs[0].Recovery)
This effectively skips the mail flow test if the first database in the $activedbs array is a Recovery database.
This also covers the case where a Recovery database is the ONLY active database on the server (which I also have).
I’ll eventually get around to removing and recreating the recovery databases that are getting in the way to see if that helps but, I figured I’d let you know what I found. In my case, a “skipped” Mail Flow test is better than an “always fail” Mail Flow test.
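In case it helps anyone else, the skip condition above can be tried outside of Exchange with stand-in objects. This is only a sketch: Test-SkipMailFlow is a hypothetical helper name that is not part of the script, and the sample database objects are made up, carrying just the Recovery property that the condition reads. It also guards against an empty list of active databases.

```powershell
# Hypothetical helper, not part of Test-ExchangeServerHealth.ps1.
# Returns $true when the mail flow test should be skipped.
function Test-SkipMailFlow {
    param(
        [string]$Version,
        [object[]]$ActiveDbs = @()
    )
    # Only look at the first database if there is one.
    $firstIsRecovery = ($ActiveDbs.Count -gt 0) -and [bool]$ActiveDbs[0].Recovery
    return ($Version -eq "Exchange 2013") -or $firstIsRecovery
}

# Stand-in objects in place of Get-MailboxDatabase output (names are made up).
$rdb = [pscustomobject]@{ Name = "RDB1"; Recovery = $true }
$db  = [pscustomobject]@{ Name = "DB1";  Recovery = $false }

Test-SkipMailFlow -Version "Exchange 2010" -ActiveDbs @($rdb, $db)   # True
Test-SkipMailFlow -Version "Exchange 2010" -ActiveDbs @($db, $rdb)   # False
Test-SkipMailFlow -Version "Exchange 2013" -ActiveDbs @($db)         # True
```

Note that [pscustomobject] needs PowerShell 3.0 or later, so the stand-in objects would have to be built differently in an older shell.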
Hi Paul,
Any news on the Exchange 2013 mail flow skip issue?
Thanks
I have some code that I believe fixes it I just need a bit more time to test it. Maybe a week or two before I release it, hopefully.
Interesting.
When using the -Verbose switch, the script just seems to hang when testing the mailbox server role services on the first MBX which is still live and active.
However, it only “seems” to be hanging, because in reality, it isn’t. The script is actually progressing. It’s just taking an eternity to test each MBX node with the third DAG member offline.
Is there any way around this, Paul?
When I shut down one of my DAG members the script takes longer to run. It looks to me like the downed server causes some tests to fail, which means they have to wait for their timeout period to lapse before they report the failure. I would say that is then amplified by the size and complexity of your environment.
Thanks for the reply, Paul.
That’s not the case with this environment. We run HUB/CAS as dual role nodes and the MBX nodes are completely separate.
We’re running the script simultaneously from two separate CAS servers to avoid missing alerts if one of them is down for any reason, and the alerts are not being sent from either CAS while this node is offline.
That’s what’s got us scratching our heads.
Kerry
Run the script in a shell window so you can watch it. Use the -Verbose and -Log options to see more information as the script is progressing. The downed server is possibly causing one of the tests to fail or hang in a way that the script is not designed to handle.
G’day Paul,
We’ve had to temporarily shut down a DAG member MBX node and we’ve noticed something unusual with the health script’s behaviour.
The health script still runs, but no email reports or alerts are sent from the Scheduled Task, whatsoever.
I’ve added the offline node to the ignore list but it doesn’t seem to be making any difference.
Could you give me some insight on what we might need to do to restore the email reports for the duration of the node’s outage?
No settings have been changed in the Scheduled Task or in the script config .. only the DAG node is down.
Any ideas?
My guess would be that the server you shut down is also the Hub Transport that you’ve been using as the SMTP server when the script sends the email.
Works like a charm, thanks for the update, much cleaner report.