If you’re running an Exchange 2016 database availability group and one of the DAG members fails, you can recover the DAG member to restore the high availability of your Exchange mailbox databases. Provided that your DAG is healthy and correctly configured, the remaining DAG member(s) should be able to maintain service availability while you perform the recovery.
Recovering a failed DAG member makes use of the Exchange recovery installation method, which reinstalls Exchange onto a server of the same name and pulls configuration information from Active Directory. However, there are some additional steps required before and after you perform the recovery install.
For this demonstration scenario I have a two-member Exchange 2016 DAG named EX2016DAG01 with the following members:
- EX2016SRV1 has failed
- EX2016SRV2 remains healthy
Removing the Failed Member from the Database Availability Group
The failed member needs to be removed from the DAG configuration by running the following commands. First, the database copies on the failed DAG member are removed. They should have a status of ServiceDown, and you can remove them with the Remove-MailboxDatabaseCopy cmdlet.
```
[PS] C:\>Get-MailboxDatabaseCopyStatus -Server EX2016SRV1

Name            Status      CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
----            ------      --------------- ----------------- -------------------- -----------------
DB05\EX2016SRV1 ServiceDown 0               0                                      Unknown
DB06\EX2016SRV1 ServiceDown 0               0                                      Unknown
DB07\EX2016SRV1 ServiceDown 0               0                                      Unknown
DB08\EX2016SRV1 ServiceDown 0               0                                      Unknown

[PS] C:\>Get-MailboxDatabaseCopyStatus -Server EX2016SRV1 | Remove-MailboxDatabaseCopy -Confirm:$false
```
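If you want a guard against removing a healthy copy by mistake (for example, after mistyping the server name), the pipeline can be narrowed to copies that actually report ServiceDown. This is a minimal sketch, assuming it runs in the Exchange Management Shell; the function name Remove-ServiceDownDatabaseCopy is hypothetical.

```powershell
# Hypothetical helper: removes only the database copies on the named server
# whose status is ServiceDown. Assumes the Exchange Management Shell.
function Remove-ServiceDownDatabaseCopy {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)]
        [string]$Server
    )

    # Only copies that the surviving members report as ServiceDown
    $copies = Get-MailboxDatabaseCopyStatus -Server $Server |
        Where-Object { $_.Status -eq 'ServiceDown' }

    foreach ($copy in $copies) {
        # $copy.Name has the form DB05\EX2016SRV1
        if ($PSCmdlet.ShouldProcess($copy.Name, 'Remove-MailboxDatabaseCopy')) {
            Remove-MailboxDatabaseCopy -Identity $copy.Name -Confirm:$false
        }
    }
}
```

Running `Remove-ServiceDownDatabaseCopy -Server EX2016SRV1 -WhatIf` first lists what would be removed without changing anything.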
Next, remove the failed server from DAG membership using the Remove-DatabaseAvailabilityGroupServer cmdlet. The -ConfigurationOnly switch is used to make the change in Active Directory without needing to communicate with the failed server.
[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity EX2016DAG01 -MailboxServer EX2016SRV1 -ConfigurationOnly
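Before moving on, you may want to confirm that Active Directory no longer lists the failed server as a DAG member. A minimal sketch, assuming the Exchange Management Shell; Test-DagMember is a hypothetical name.

```powershell
# Hypothetical helper: returns $true while the named server is still listed
# as a member of the DAG in Active Directory. Assumes the Exchange Management Shell.
function Test-DagMember {
    param(
        [Parameter(Mandatory = $true)][string]$DagName,
        [Parameter(Mandatory = $true)][string]$Server
    )

    $dag = Get-DatabaseAvailabilityGroup -Identity $DagName

    # Servers holds one entry per DAG member; match on the server name
    $match = $dag.Servers | Where-Object { $_.Name -eq $Server }
    return [bool]$match
}
```

After the removal has replicated, `Test-DagMember -DagName EX2016DAG01 -Server EX2016SRV1` is expected to return False.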
The failed DAG member also needs to be manually evicted from the underlying Windows Failover Cluster.
```
[PS] C:\>Get-ClusterNode EX2016SRV1 | Remove-ClusterNode

Remove-ClusterNode
Are you sure you want to evict node EX2016SRV1?
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): y
```
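If you are scripting the clean-up, the confirmation prompt can be suppressed with Remove-ClusterNode's -Force switch, and the eviction can be skipped when the node is already gone. A minimal sketch, assuming it runs on a surviving DAG member (EX2016SRV2 here) so that Get-ClusterNode targets the local cluster; Remove-FailedClusterNode is a hypothetical name.

```powershell
# Hypothetical helper: evicts a node from the local failover cluster without
# prompting, but only if the node is actually present. Run on a surviving member.
function Remove-FailedClusterNode {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)][string]$Node
    )

    Import-Module FailoverClusters

    # List all nodes and filter, rather than requesting the node by name,
    # so a node that is already gone does not raise an error
    $clusterNode = Get-ClusterNode | Where-Object { $_.Name -eq $Node }

    if (-not $clusterNode) {
        Write-Warning "$Node is not a member of the local cluster; nothing to evict."
        return
    }

    if ($PSCmdlet.ShouldProcess($Node, 'Remove-ClusterNode')) {
        # -Force suppresses the "Are you sure you want to evict node" prompt
        $clusterNode | Remove-ClusterNode -Force
    }
}
```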
Removing EdgeSync Credentials
If the AD site has an Edge Transport server subscribed, you’ll need to remove the EdgeSync credentials from the Exchange server object in Active Directory using ADSIEdit. If you don’t complete this step, Exchange setup will fail with the following error:
The internal transport certificate for the local server was damaged or missing in Active Directory. The problem has been fixed. However, if you have existing Edge Subscriptions, you must subscribe all Edge Transport servers again by using the New-EdgeSubscription cmdlet in the Shell.
To remove the EdgeSync credentials, open ADSIEdit and connect to the well-known naming context “Configuration”. Browse to Services -> Microsoft Exchange -> Your Org Name -> Administrative Groups -> Admin Group Name -> Servers. Right-click the server object and select Properties. Find the msExchEdgeSyncCredentials attribute in the list and edit it to remove all entries.
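As an alternative to ADSIEdit, the same attribute can be cleared with Set-ADObject from the Active Directory module. This sketch assumes the Active Directory module for Windows PowerShell (RSAT) is installed where you run it, and that Get-ExchangeServer is available (Exchange Management Shell) to supply the server object's distinguished name. Clear-EdgeSyncCredential is a hypothetical name; test with -WhatIf first.

```powershell
# Hypothetical helper: clears msExchEdgeSyncCredentials on the Exchange server
# object in the Configuration partition, the same change made in ADSIEdit.
function Clear-EdgeSyncCredential {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)][string]$Server
    )

    Import-Module ActiveDirectory

    # DN of the server object under Services > Microsoft Exchange > <Org> >
    # Administrative Groups > <Admin Group> > Servers
    $dn = (Get-ExchangeServer -Identity $Server).DistinguishedName

    if ($PSCmdlet.ShouldProcess($dn, 'Clear msExchEdgeSyncCredentials')) {
        # -Clear removes every value of the attribute
        Set-ADObject -Identity $dn -Clear msExchEdgeSyncCredentials
    }
}
```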
Preparing the New Server for Exchange Recovery
Replace or rebuild the failed server with a new installation of Windows Server, using the same computer name as the failed server, and join it to the Active Directory domain. You should configure the server to match the failed server and your other DAG members in terms of networking and storage. Once you have the server ready, you can check which build of Exchange 2016 to install by running Get-ExchangeServer from a healthy Exchange server and noting the build number.
```
[PS] C:\>Get-ExchangeServer | Select Name,AdminDisplayVersion

Name       AdminDisplayVersion
----       -------------------
EX2013SRV1 Version 15.0 (Build 1210.3)
EX2010SRV1 Version 14.3 (Build 123.4)
EX2016SRV1 Version 15.1 (Build 396.30)
EX2016SRV2 Version 15.1 (Build 396.30)
EX2016EDGE Version 15.1 (Build 225.42)
```
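The failed server in this demo, EX2016SRV1, was running Exchange 2016 Cumulative Update 1 (build 15.1.396.30). You can map build numbers to updates using Microsoft’s published list of Exchange Server build numbers and release dates.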
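Because the recovery install must use the same build the server was running, it can help to compare the build recorded in Active Directory with the build of your setup media before you start. A minimal sketch: both function names are hypothetical, and it assumes the AdminDisplayVersion string has the "Version 15.1 (Build 396.30)" form shown above.

```powershell
# Hypothetical helpers for checking that setup media matches the build that
# Active Directory records for the failed server.

function ConvertTo-ExchangeBuild {
    # Converts "Version 15.1 (Build 396.30)" into the [version] 15.1.396.30
    param([Parameter(Mandatory = $true)][string]$AdminDisplayVersion)

    if ($AdminDisplayVersion -match 'Version (\d+)\.(\d+) \(Build (\d+)\.(\d+)\)') {
        return [version]"$($Matches[1]).$($Matches[2]).$($Matches[3]).$($Matches[4])"
    }
    throw "Unrecognized AdminDisplayVersion string: $AdminDisplayVersion"
}

function Test-SetupBuildMatch {
    # Returns $true when a file version such as "15.01.0396.030" is the same
    # build as the recorded AdminDisplayVersion
    param(
        [Parameter(Mandatory = $true)][string]$AdminDisplayVersion,
        [Parameter(Mandatory = $true)][string]$SetupFileVersion
    )

    # [version] parsing ignores leading zeros, so 15.01.0396.030 equals 15.1.396.30
    return (ConvertTo-ExchangeBuild $AdminDisplayVersion) -eq [version]$SetupFileVersion
}
```

For example, `Test-SetupBuildMatch -AdminDisplayVersion "$((Get-ExchangeServer EX2016SRV1).AdminDisplayVersion)" -SetupFileVersion (Get-Item G:\Setup.exe).VersionInfo.FileVersion`, assuming the Setup.exe file version on your media reflects the Exchange build (confirm this before relying on it).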
Performing a Recovery Install of Exchange 2016
Open a CMD prompt, navigate to the folder where you’ve mounted the Exchange 2016 ISO or extracted the setup files, and run setup with the following parameters.
G:\>setup /m:recoverserver /iacceptexchangeserverlicenseterms
Wait for setup to complete, then restart the server.
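If you script the rebuild, the same setup command can be launched from PowerShell so its exit code is captured. A minimal sketch: Invoke-ExchangeRecoveryInstall is a hypothetical name, the setup path is wherever your media lives (G: above), and treating a non-zero exit code as a failure is an assumption about how setup reports errors. Setup writes its logs to C:\ExchangeSetupLogs.

```powershell
# Hypothetical wrapper around the recovery install shown above.
function Invoke-ExchangeRecoveryInstall {
    param(
        [Parameter(Mandatory = $true)][string]$SetupPath
    )

    # Same switches as the CMD example
    $arguments = @('/m:recoverserver', '/iacceptexchangeserverlicenseterms')

    # Wait for setup to finish and keep the process object for its exit code
    $process = Start-Process -FilePath $SetupPath -ArgumentList $arguments -Wait -PassThru -NoNewWindow

    if ($process.ExitCode -ne 0) {
        Write-Warning "Setup exited with code $($process.ExitCode); review C:\ExchangeSetupLogs\ExchangeSetup.log."
    }
    return $process.ExitCode
}
```

The function does not restart the server; run `Invoke-ExchangeRecoveryInstall -SetupPath G:\Setup.exe`, review the result, then restart manually.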
Add the Recovered Server as a DAG Member
After restarting the server you can add it back to the database availability group.
[PS] C:\>Add-DatabaseAvailabilityGroupServer -Identity EX2016DAG01 -MailboxServer EX2016SRV1
After adding the DAG member, you can add the mailbox database copies as well.
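A minimal sketch for adding the copies back in one pass, assuming the Exchange Management Shell. The function name is hypothetical, and the database names are whatever copies the server held before it failed (DB05 to DB08 in this example).

```powershell
# Hypothetical helper: adds a copy of each listed database to the recovered
# server. Add-MailboxDatabaseCopy starts seeding unless -SeedingPostponed is used.
function Add-RecoveredServerDatabaseCopy {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)][string]$Server,
        [Parameter(Mandatory = $true)][string[]]$Database
    )

    foreach ($db in $Database) {
        if ($PSCmdlet.ShouldProcess("$db\$Server", 'Add-MailboxDatabaseCopy')) {
            Add-MailboxDatabaseCopy -Identity $db -MailboxServer $Server
        }
    }
}
```

For example, `Add-RecoveredServerDatabaseCopy -Server EX2016SRV1 -Database DB05,DB06,DB07,DB08`, then watch seeding progress with `Get-MailboxDatabaseCopyStatus -Server EX2016SRV1`. Add -ActivationPreference to the inner call if you want to set preferences explicitly.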
Since every environment is different, here are a few additional steps you might need to look at as well:
- Export/import the SSL certificate from another server to the recovered server
- Verify, and if necessary re-apply, the client access namespaces on the virtual directories
- Recreate the Edge Subscription
- Reinstall antivirus, backup, and monitoring agents
- Rebalance activation preferences
- Run Exchange Analyzer
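For the namespace check, one approach is to list the internal and external URLs of the recovered server’s virtual directories next to those of a healthy server. A minimal sketch, assuming the Exchange Management Shell; Compare-VirtualDirectoryUrl is a hypothetical name and the set of virtual directories is an illustrative subset.

```powershell
# Hypothetical helper: lists InternalUrl/ExternalUrl for several virtual
# directories on two servers so differences on the recovered server stand out.
function Compare-VirtualDirectoryUrl {
    param(
        [Parameter(Mandatory = $true)][string]$RecoveredServer,
        [Parameter(Mandatory = $true)][string]$ReferenceServer
    )

    $cmdlets = 'Get-OwaVirtualDirectory', 'Get-EcpVirtualDirectory',
               'Get-WebServicesVirtualDirectory', 'Get-ActiveSyncVirtualDirectory',
               'Get-OabVirtualDirectory', 'Get-MapiVirtualDirectory'

    foreach ($cmdlet in $cmdlets) {
        foreach ($server in $RecoveredServer, $ReferenceServer) {
            # Tag each row with the cmdlet that produced it
            & $cmdlet -Server $server |
                Select-Object @{ n = 'Cmdlet'; e = { $cmdlet } }, Server, InternalUrl, ExternalUrl
        }
    }
}
```

For example, `Compare-VirtualDirectoryUrl -RecoveredServer EX2016SRV1 -ReferenceServer EX2016SRV2 | Format-Table -AutoSize`. To rebalance active copies afterwards, the RedistributeActiveDatabases.ps1 script that ships with Exchange can be run as `& "$exscripts\RedistributeActiveDatabases.ps1" -DagName EX2016DAG01 -BalanceDbsByActivationPreference -Confirm:$false`.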
I have two servers in a DAG and I want to remove one server from the DAG. Currently, every time I shut down one of the servers, I get an HTTP 500 error after entering my credentials to log in to OWA or ECP. Am I going to kill both servers if I remove the one I want from the DAG?
I have two Exchange servers, 1 and 2, with one witness server. The Exchange database was active on Exchange 1, but I have now lost Exchange 1 and the witness server, and Exchange 2 is not working: it says the database is active on Exchange 1. How do I switch my database over to Exchange 2 while my witness and Exchange 2 are down?
You are the shit Paul
Great Article here. I have this situation:
1) Both DAG members were lost due to a storage failure: bad hard drives left both VMs corrupted and unusable. No backups.
2) DAG Members: EXCH2016MR1 and EXCH2016MR2 – only servers in the environment.
The domain controllers were all on a different set of storage and they are fine. We’re trying to do a recovery but are stuck at this point:
1) Server > Exch2016MR1 was hosting the active copy of this database E2016-DB1.
2) I am not able to use the command move-activemailboxdatabasecopy as the server exch2016mr2 was also lost.
3) Since I can’t move the active copy of the database from EXCH2016MR1 to any other server (all servers were lost), I cannot remove the server from the DAG, and so I cannot recover the server.
4) There is no production data in the databases on this server, so there are no worries about data loss; all mailboxes are in the cloud.
The error I keep getting is “The Replication Services is unavailable on server exch2016mr1”, which is to be expected since the server was lost.
I got it figured out. I had to remove all mailbox database copies, and then use Remove-MailboxDatabase to delete the last copy. I was then able to complete my DAG server recovery.
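A sketch of that clean-up, suitable only where, as in this case, the databases hold nothing you need to keep. It assumes the Exchange Management Shell; Remove-OrphanedMailboxDatabase is a hypothetical name, and E2016-DB1 is the database named in the comment. Remove-MailboxDatabase will refuse to delete a database that still contains mailboxes (arbitration mailboxes included), so those would have to be moved or removed first.

```powershell
# Hypothetical helper for a DAG whose members were all lost and whose database
# contents are expendable: remove passive copies, then the database object.
function Remove-OrphanedMailboxDatabase {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)][string]$Database
    )

    $db = Get-MailboxDatabase -Identity $Database
    $activeServer = $db.Server.Name

    # Every server holding a copy, except the one holding the active copy
    $passiveServers = $db.Servers | Where-Object { $_.Name -ne $activeServer }

    foreach ($server in $passiveServers) {
        $copyId = "$Database\$($server.Name)"
        if ($PSCmdlet.ShouldProcess($copyId, 'Remove-MailboxDatabaseCopy')) {
            Remove-MailboxDatabaseCopy -Identity $copyId -Confirm:$false
        }
    }

    # The last (active) copy cannot be removed as a copy; remove the database
    if ($PSCmdlet.ShouldProcess($Database, 'Remove-MailboxDatabase')) {
        Remove-MailboxDatabase -Identity $Database -Confirm:$false
    }
}
```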
Thanks for the excellent information. We have a real problem. We created a DAG with two servers (Exchange 2019 Build 464.5), with database copies. We have no idea what happened, but the DAG/quorum went missing and the whole thing crashed. We needed to do something, so we forcibly uninstalled the Failover Clustering feature and removed the database copies. At least one server works, so email is flowing, but now we need to recreate the DAG. When we try to start the Cluster service, we get errors 1090, 7024, and 7031. Any suggestions on how we can continue?
Appreciate any help,
Well, I didn’t get an answer, but we got the DAG up and running again. Three databases are copying; however, when one database copy crashes, one DAG member server stops being operational and cannot mount the database, with error 3154.
So how do I bring the member server back to operational status? The Cluster service is running; in fact, all services are running.
First, happy new year ;-) I have two DAG member servers and one witness server. I back up my DAG Exchange systems with Veeam.
My question is: if one Exchange server fails, is it correct to just restore the broken Exchange server from a healthy Veeam backup and have it overwrite the broken server in the VMware infrastructure?
When I try to do the recovery install, it tells me that I must use the exact version. I’m not sure where I can get the full install of CU11 for Exchange 2013. I downloaded SP1, as that was what I could find online, and it gave me this error:
Prerequisite Analysis FAILED
Exchange Server version Version 15.0 (Build 1156.6) or later must be used to perform a recovery of this server.
For more information, visit: http://technet.microsoft.com/library(EXCHG.150)/ms.exch.setupreadiness.DrMinVersionCheck.aspx
I downloaded a fresh ISO from VLSC and I get the same error. I tried downloading CU11 and placing it in the same directory, but it’s just an .msp file, so I don’t think that’ll help. Any suggestions?
Hey Paul, thanks for your great articles.
I have a more specific question regarding this restore article.
We have a two-node Exchange 2016 DAG (mbx01 and mbx02), with the witness on a separate file share.
Unfortunately the VMWare farm crashed due to a SAN problem.
After recovering the storage we’ve recognized that mbx01 was lost.
Mbx02 was then not able to mount the DBs, as it stated that mbx01 was the owner.
So we’ve removed mbx01 from db copies and dag and were able to mount the DBs.
Thus mbx02 currently runs as a single Exchange server but still has mbx01 in its configuration. Your step for removing the failed host from the cluster fails, because the Cluster service is disabled and also fails to start after we set it to Manual.
Could we however proceed with the steps that follow the cluster removal step?
Or do we need to manually remove the failed mbx01 from AD with ADSIEdit, etc.?
I’d prefer something as straightforward as the steps you describe.
Thanks a lillion in advance, Patrik
When you recover a DAG node from a bare-metal backup (or a backup of a VM), is the procedure the same as rebuilding a DAG node from scratch (as you seem to be describing here)?
Depends what you mean by “bare-metal” but it sounds like you’re referring to an unsupported recovery method for Exchange servers.
By bare-metal I mean an Exchange server installed directly on a physical server (non-virtualized). Or the equivalent would be to restore a virtualized DAG member using something like Veeam.
But you’re saying that you shouldn’t restore a DAG member that way. And that the proper way is to rebuild the DAG member from scratch using the method you’re specifying above. Is this correct?
The question is a little vague because I still don’t entirely understand what restore method you’re referring to, but I’ll say this. It’s not supported to recover an entire Exchange server from a backup that represents a previous point in time. For example, it’s not supported to restore an Exchange server using a VM snapshot or VM backup.
The supported restore method is to do a recovery install, which is part of the process outlined above. For DAG members there’s extra prep work involved because you need to remove the dead server from the DAG config first.
I was referring to recovering a VM from a backup, which is clearly not supported. I would need to do a recovery install instead.
Thanks for the advice,
I have followed the steps mentioned above. All commands ran successfully except for the last one “Get-ClusterNode mail2 | Remove-ClusterNode”. It gives this error “Get-ClusterNode : Failed to retrieve the node ‘mail2’ from the cluster ‘Exch16-DAG’ ”
Any ideas why I am getting this error?
Looks to me like mail2 is not currently a member of the cluster.
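To check a diagnosis like this one, you can compare Exchange’s view of DAG membership with the cluster’s own node list. A minimal sketch, assuming it runs on a surviving DAG member in the Exchange Management Shell; Compare-DagClusterMembership is a hypothetical name.

```powershell
# Hypothetical check: compares the DAG members recorded by Exchange with the
# nodes in the local failover cluster.
function Compare-DagClusterMembership {
    param([Parameter(Mandatory = $true)][string]$DagName)

    Import-Module FailoverClusters

    $dagMembers   = @((Get-DatabaseAvailabilityGroup -Identity $DagName).Servers | ForEach-Object { $_.Name })
    $clusterNodes = @(Get-ClusterNode | ForEach-Object { $_.Name })

    # SideIndicator "<=" means DAG member only, "=>" means cluster node only
    Compare-Object -ReferenceObject $dagMembers -DifferenceObject $clusterNodes
}
```

No output means both views agree. A server that appears only on the "<=" side is in the DAG configuration but not in the cluster, which would explain a Get-ClusterNode failure like the one above.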
We have an Exchange 2016 DAG with two Exchange servers. The DAG is set up the old-fashioned way (DAG IP, failover cluster configured externally in Failover Cluster Manager, file share witness configured in Failover Cluster Manager).
I hope that describes the setup clearly.
So, as you have clearly stated before, we need to change this implementation, because we have frequent disconnections and failovers that cause Exchange client disconnections.
Is there any way to change to the best-practice installation now?
I don’t understand your question. If you know what the best practices are, it should be fairly simple to plan a series of changes to get from your current configuration to one that aligns with those best practices. Is there something specific you’re not sure about?
My apologies, the question was too generic.
To my knowledge, an Exchange 2016 DAG configures the VIP and the cluster roles automatically (no manual Failover Cluster Manager setup in Windows needed). However, in our case a DAG IP was set manually and is assigned to one node or the other. The failover cluster was set up manually through Windows Failover Cluster Manager, and all clients look up this particular DAG IP via DNS.
So, we are planning to switch to automatic configuration, which means that Exchange needs to be installed from scratch, right?
If so, can we backup the database and afterwards restore it to the new DAG configured?
Is there a way to do it without downtime?
Thank you a lot.
* To clarify, our DAG IP is also used for the CAS role. Hope it helps.
The DAG IP is not a client access endpoint. Your CAS namespaces should not be resolving to the DAG IP. That is something you should fix.
If you want to go with an IP-less DAG instead then you’ll need to remove the DAG first, then recreate it. You don’t need to reinstall Exchange. There’s no downtime involved in that, but it will mean a period of time during which there’s no HA for your databases.
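Once the old DAG is gone (Exchange will not remove a DAG that still has member servers, so copies and members come out first), the new DAG is made IP-less by passing [System.Net.IPAddress]::None as its IP address, which creates it without a cluster administrative access point; this requires Windows Server 2012 R2 or later. A minimal sketch, assuming the Exchange Management Shell; New-IpLessDag is a hypothetical name, and the witness values you pass in are your own.

```powershell
# Hypothetical wrapper: creates a DAG without an administrative access point
# (no cluster IP address or cluster name object).
function New-IpLessDag {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][string]$WitnessServer,
        [Parameter(Mandatory = $true)][string]$WitnessDirectory
    )

    New-DatabaseAvailabilityGroup -Name $Name `
        -WitnessServer $WitnessServer `
        -WitnessDirectory $WitnessDirectory `
        -DatabaseAvailabilityGroupIPAddresses ([System.Net.IPAddress]::None)
}
```

After creating it, add each server with Add-DatabaseAvailabilityGroupServer and then add the database copies back with Add-MailboxDatabaseCopy.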
Great article as always. When installing Exchange with the recovery switch, do we need an ISO that already has the CU we need incorporated, or can we install the base Exchange package and then the required CU as a second step?
You must recover a server using the same build that it was running when it failed.
There is no “base Exchange package” btw.
Thanks a lot! I faced the same issue and it really did help! Very, very much appreciated!
I have done the recovery and all has come up but I try to log in with my admin account and it says it can’t access the mailbox. We did not recover the DB because it is just a test lab. Should I go into ADSIEdit and remove the DB?
If you’ve recovered a DAG member you can just reseed the database copies back to the DAG member. Not sure what you mean when you say you didn’t recover the DB.
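Where copies were added back but ended up Failed, a reseed means suspending each copy (if it is not already suspended) and running Update-MailboxDatabaseCopy. If no copy exists yet on the recovered server, Add-MailboxDatabaseCopy seeds a new one instead. A minimal sketch, assuming the Exchange Management Shell; Invoke-DatabaseCopyReseed is a hypothetical name.

```powershell
# Hypothetical helper: reseeds copies on a server whose status is Failed or
# FailedAndSuspended.
function Invoke-DatabaseCopyReseed {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param([Parameter(Mandatory = $true)][string]$Server)

    $copies = Get-MailboxDatabaseCopyStatus -Server $Server |
        Where-Object { $_.Status -eq 'Failed' -or $_.Status -eq 'FailedAndSuspended' }

    foreach ($copy in $copies) {
        if ($PSCmdlet.ShouldProcess($copy.Name, 'Reseed database copy')) {
            # A copy must be suspended before it can be reseeded
            if ($copy.Status -eq 'Failed') {
                Suspend-MailboxDatabaseCopy -Identity $copy.Name -Confirm:$false
            }
            # -DeleteExistingFiles discards the old database and log files first
            Update-MailboxDatabaseCopy -Identity $copy.Name -DeleteExistingFiles -Confirm:$false
        }
    }
}
```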
Truly amazing article. Thanks for the detailed steps! Hope we don’t ever need this, though.