In this tutorial I will demonstrate a recovery scenario for a failed Mailbox server that is a member of an Exchange 2010 Database Availability Group. In this scenario the DAG has two members, EX1 and EX2. EX2 has suffered a serious hardware failure and needs to be recovered.
With server EX2 down, each mailbox database in the DAG has failed over to EX1 and shows the following status information.
[PS] C:\>Get-MailboxDatabaseCopyStatus -Identity "Mailbox Database 01"

Name                       Status        CopyQueueLength
----                       ------        ---------------
Mailbox Database 01\EX1    Mounted       0
Mailbox Database 01\EX2    ServiceDown   0
The Exchange recovery process begins by reinstalling Windows Server 2008 R2 on the new server.
Because this Exchange recovery is for a member of an Exchange 2010 DAG the server must be installed with the Enterprise edition of Windows Server 2008 R2.
After Windows Server 2008 R2 has finished installing, log on to the server and complete the following tasks:
- Configure the time zone settings
- Configure the Automatic Update settings
- Configure the server with the same TCP/IP configuration as the previous server
- Configure the server with the same name as the previous server (in this case EX2)
- Join the server to the Active Directory domain
The next step is to install the Exchange 2010 prerequisites for the Mailbox server role. From an elevated PowerShell prompt, run the following commands.
Import-Module ServerManager
Add-WindowsFeature NET-Framework,RSAT-ADDS,Web-Server,Web-Basic-Auth,Web-Windows-Auth,Web-Metabase,Web-Net-Ext,Web-Lgcy-Mgmt-Console,WAS-Process-Model,RSAT-Web-Server -Restart
After the server has restarted we also need to install the Exchange Server 2010 SP1 hotfixes for Windows Server 2008 R2. These updates require another restart of the server.
Before installing Exchange Server 2010 on the server being recovered we first need to remove it from the DAG. On another Exchange 2010 server open the Exchange Management Shell and run the following commands.
First, determine which mailbox databases the server was hosting a copy of, the activation preferences, and any replay lag that was configured. In this example server EX2 hosted copies of Mailbox Database 01 and Mailbox Database 02.
[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX2, EX1}
ActivationPreference : {[EX2, 1], [EX1, 2]}
ReplayLagTimes       : {[EX2, 00:00:00], [EX1, 00:00:00]}
TruncationLagTimes   : {[EX2, 00:00:00], [EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1, EX2}
ActivationPreference : {[EX1, 1], [EX2, 2]}
ReplayLagTimes       : {[EX1, 00:00:00], [EX2, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00], [EX2, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
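Because the next steps remove this configuration, it can help to keep a copy of the output for reference when the copies are re-added later. A minimal sketch, assuming a hypothetical output path (C:\Temp\dag-copies-before.txt) in a folder that already exists on the server where the shell is running:

```powershell
# Save the copy layout, activation preferences, and lag settings for later reference.
# C:\Temp\dag-copies-before.txt is a hypothetical path; use any existing folder.
Get-MailboxDatabase |
    Format-List Name, Servers, ActivationPreference, ReplayLagTimes, TruncationLagTimes |
    Out-File -FilePath "C:\Temp\dag-copies-before.txt"
```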
Next, remove the failed server from each of the mailbox databases that it held a copy of.
[PS] C:\>Remove-MailboxDatabaseCopy "Mailbox Database 01\EX2"
[PS] C:\>Remove-MailboxDatabaseCopy "Mailbox Database 02\EX2"
Warnings will appear because the failed Exchange server EX2 can’t be contacted. However, the change can be confirmed by repeating the earlier command.
[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
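On a server that hosted more than a couple of database copies, the individual removals above could be scripted instead. The following is only a sketch: the variable names are made up for illustration, and it assumes each entry in the Servers property converts to the server's short name when turned into a string (as it appears in the output above). It runs with -WhatIf so nothing is changed until that switch is removed.

```powershell
# Assumption: the failed server's name exactly as shown in the Servers list.
$failedServer = "EX2"

# Find every database that lists the failed server among its copies.
# "$_" converts each Servers entry to its display string (for example "EX2").
$affected = Get-MailboxDatabase | Where-Object {
    ($_.Servers | ForEach-Object { "$_" }) -contains $failedServer
}

# Preview the removals first; drop -WhatIf to actually remove the copies.
$affected | ForEach-Object {
    Remove-MailboxDatabaseCopy -Identity "$($_.Name)\$failedServer" -WhatIf
}
```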
Next, remove the failed server from the Database Availability Group. Run the following command in the Exchange Management Shell.
[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2
Note: In some DAG topologies this action will fail with the error “A quorum of cluster nodes was not present to form a cluster”. If that error occurs, use the solution in this article: Unable to Remove Failed Server from DAG Membership in Exchange Server 2010
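For a DAG member that is offline and cannot be brought back, Microsoft's documented approach is to remove the server from the DAG configuration only and then evict the node from the cluster manually. A hedged sketch of that approach follows; it is not necessarily the same procedure as the linked article, and the eviction step assumes it is run on a surviving DAG member (EX1 in this example) where the FailoverClusters module is available.

```powershell
# Remove the offline server from the DAG's configuration in Active Directory only.
Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2 -ConfigurationOnly

# Then evict the dead node from the underlying failover cluster.
# Assumption: this runs on a surviving DAG member such as EX1.
Import-Module FailoverClusters
Remove-ClusterNode -Name EX2 -Force
```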
When you are ready to proceed with the Exchange 2010 install, open a command prompt and run the following command from the directory that contains the Exchange setup files.
setup /m:recoverserver
When setup has completed and the server has been rebooted, add the recovered server back into the Database Availability Group.
[PS] C:\>Add-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2
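Before re-adding database copies, it may be worth confirming that the DAG now lists the recovered server as a member and as operational. A small sketch using the DAG name from this example:

```powershell
# -Status queries live cluster information, which populates OperationalServers.
Get-DatabaseAvailabilityGroup -Identity DAG -Status |
    Format-List Name, Servers, OperationalServers
```

EX2 should appear in both Servers and OperationalServers before continuing.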
Then, using the replay lag times, truncation lag times, and activation preferences that were identified earlier, re-add the mailbox database copies to the recovered server. This process can take a long time depending on the size of the mailbox databases that need to be reseeded.
[PS] C:\>Add-MailboxDatabaseCopy -Identity "Mailbox Database 01" -MailboxServer EX2
[PS] C:\>Add-MailboxDatabaseCopy -Identity "Mailbox Database 02" -MailboxServer EX2 -ActivationPreference 1
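In this example all lag times were zero, so no lag parameters are needed. If a copy had been configured with a lag, the same cmdlet accepts the lag values directly. The sketch below is purely illustrative: "Mailbox Database 03" and the one-day replay lag are hypothetical and are not part of this scenario.

```powershell
# Hypothetical example: re-create a lagged copy with a 1-day replay lag,
# no truncation lag, and second activation preference.
# "Mailbox Database 03" and the lag value are assumptions for illustration only.
Add-MailboxDatabaseCopy -Identity "Mailbox Database 03" -MailboxServer EX2 `
    -ReplayLagTime 1.00:00:00 -TruncationLagTime 00:00:00 -ActivationPreference 2
```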
You can now verify that the databases have the same settings that were identified earlier.
[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX2, EX1}
ActivationPreference : {[EX2, 1], [EX1, 2]}
ReplayLagTimes       : {[EX2, 00:00:00], [EX1, 00:00:00]}
TruncationLagTimes   : {[EX2, 00:00:00], [EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1, EX2}
ActivationPreference : {[EX1, 1], [EX2, 2]}
ReplayLagTimes       : {[EX1, 00:00:00], [EX2, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00], [EX2, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
The failed DAG member has now been recovered and the Exchange 2010 Database Availability Group is back to normal operation.
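As a final check, the replication health of the recovered member and the state of its database copies can be reviewed. A minimal sketch using the server name from this example:

```powershell
# Run the built-in replication and cluster health tests against the recovered member.
Test-ReplicationHealth -Identity EX2

# Review each copy hosted on EX2: status, queue lengths, and content index state.
Get-MailboxDatabaseCopyStatus -Server EX2 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

Copies that are still seeding will not show as Healthy until the reseed completes. If a database should be active on EX2 again, as the activation preference for Mailbox Database 02 suggests, Move-ActiveMailboxDatabase with the -ActivateOnServer parameter can be used once its copy is healthy.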
Hi Paul,
Great article:)
Question for you, please. You say to build the server afresh (bare metal), give it the same name, and join it to the domain, but if you do this you're going to blow away all the AD information for the server and hence can’t then use the recover switch.
I thought the process here was to build afresh, reset the computer account, join it to the domain, install the prerequisites, install Exchange using the recover switch, and then deal with the DBs.
That said, I have a four-node DAG with one server down, lost due to a blue screen of death. My plan was to remove the DB copies, remove the server from the DAG, then build a new server, get Exchange installed as per the other nodes, join it to the DAG, and then reseed the DBs.
Brilliant document. This was forwarded to me by someone else, as one of our servers had been evicted from the cluster manually. I did exactly as the article described and am now replicating the databases back to the new server. The longest part was the Windows updates, a necessary evil, but damn, those blasted updates!
Hi Paul,
First of all, I would like to thank you for your wonderful articles. Unfortunately I have been in a mess with Exchange Server 2010 for the last few days, as one of my mailbox servers is dead. The server was part of a DAG, and I did the cleanup of that mailbox server with the help of this article, but after that, instead of recovering it, I built a new mailbox server with a new name and IPs. The problem I am facing is that the new server is unable to join the DAG. Any idea why that is?
Hi
I always receive this error when I try to configure the DAG. I tried uninstalling Exchange and tried different servers, with no luck.
A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. An error occurred while attempting a cluster operation. Error: A security package specific error occurred”. [Server: Ex-01]
Hi Paul,
I need some further assistance from you. We have to procure new storage, which will take two or three weeks, and the storage of my current active DAG is filling up very fast. I have deleted some logs, but it's not enough and I have very little space left. I would be grateful if you could guide me to a fix for this situation. Below are the steps that I have already taken, but they are not serving the purpose.
>Deleted old database logs.
>Deleted unnecessary mailboxes to have some white space.
>Archived the mailboxes with bigger volumes to have white space.
Hi Paul, this is a really informative post. I am going through an Exchange 2010 disaster, but my scenario is a little different from the ones above. I have two DAG servers, and the database volume of one of them is gone: it was on SAN storage, and due to a disaster on the SAN we are unable to recover it. The server is there, but the volume containing the database is gone. Please suggest which of the two options below I should go for, or suggest a better option if you think there is one.
1) Go through the whole process above of rebuilding the failed DAG again.
2) Work on reprovisioning a new volume and start making the databases on it again. (If yes, then please let me know the steps.)
You only need to replace the failed storage volume, and then reseed the database copy to that DAG member again. No need to rebuild the DAG.
Thank you Paul for the advice and your quick answer much appreciated.
I have 2 servers in one DAG.
One failed, and I tried to recover it but I can’t. I saw the databases aren’t mounted; I tried to mount them but it’s still not possible.
One DB is in ServiceDown and the other DB is in a failed copy state.
So how do I remove the server, including the DBs, from the server?
…
Hi Paul,
I had to demonstrate disaster recovery of Exchange 2010 in my company at a cold disaster site. I performed a point-in-time recovery to recover all Exchange servers: one HUB/CAS server and two Mailbox servers in a DAG. All went fine and I was able to access my blank mailbox. I faced a couple of issues which are not mentioned in the steps above, or maybe they are not relevant if you are recovering one DAG member. Please correct me if I am wrong.
1. After recovering the server from scratch, your admin account/service account is missing permissions on the local server. You have to add your admin account/service account and other Exchange groups to the appropriate local groups before you go further in reconfiguring your Exchange server. For example, the Exchange Server and Exchange Server Services groups were missing from the local groups.
2. If you installed your Exchange server in a custom folder, such as on the D: drive, instead of the default installation folder, this recovery will not select the custom folder. In that case the server will no longer match any installation standard you follow in your company.
3. All the drives that were on your old mailbox server must be added to the new server before you run setup /m:recoverserver; otherwise setup will fail.
I faced all these issues in my recovery procedure. Please let me know if these are all practical issues one can face during recovery.
Also, let me know if this is the right approach for a cold disaster site.
Hi Paul,
Great article.
Question: when left with one surviving DAG member, how would you remove this server from the DAG safely and ensure all databases get mounted on it, so that it becomes a standalone mailbox server?
If you are down to just one DAG member and want to decom the DAG itself you basically just need to
1) Remove the server from the DAG
2) Remove the DAG itself
Michel writes about it in more detail here:
http://eightwone.com/2013/02/05/decommissioning-exchange-2010-dag-to-single-servers/
Hi Paul:
Maybe this is an obvious question, but there is something that is not clear enough to me. I’m recovering a failed DAG member and did exactly what the article says, with one small difference: I still have the old files in the DB and log volumes. I’m facing some problems adding the database copies on the recovered server, so my question is: do I need to delete the old files? (I just moved them to another volume on the same server.) … Sorry for my English, I’m still learning 🙂
Yes you need to re-add the database copies to the recovered server. As far as the DAG is concerned those copies were removed when that DAG member was removed. The existing files on your volumes will cause it to fail to add the new copy, and should be removed/moved out of the way first.
Hi Paul,
My server is with SP1. When do we have to install SP1 on the servers? I am recovering two DAG members with only the Mailbox role. I have a separate HUB/CAS server configured.
Thanks,
I don’t understand your question.
Hi all
This is an excellent article, thanks to Paul.
But some steps are unclear to me, because we have a different design:
– All servers are Exchange 2010
– 3 Mailbox servers (all members of one DAG): installed on VMware ESX
– 2 servers with the CAS and Hub Transport roles (both are members of one CAS array): installed on VMware ESX
So as you can see, all our Exchange servers are installed on VMware ESX, and this is why the recovery of a DAG member would be different.
I know that MS advises against DAGs on VMware, but it is our design now, and I am not able to change it for now.
The first steps for recovery are logical.
– …
– Remove the failed server from each DB.
– Remove the failed server from the DAG.
Is there a practicable way to recover from a VMware snapshot, or do I have to rebuild the whole server first?
Snapshot recovery is easy, but how do I then remove the DB copies from the recovered server completely?
Any answer is highly appreciated, since I could not find any reliable answers on this on the Internet.
Kind regards
Snapshots are not supported. You should not take a snapshot or recover from snapshot for Exchange servers.
Whether your servers are virtualized or not makes no difference to the recovery process for a completely failed DAG member except that you could deploy the new VM from template rather than manually reinstall the OS I guess.
thanks a lot … one of my DAG members suddenly failed … got to recover it and this article was a lifesaver.
BTW Keep up with the good work …this site rocks
Great article, really helped me out, had to reinstall our broken DAG member.
It’s really a good article, thanks a lot, because it helped me a lot when recovering the failed server in the DAG. Keep posting.
Thanks for this article as it saved me lots of work.