Exchange Server 2007 has some very useful replication features such as Local Continuous Replication (in RTM and SP1) and Standby Continuous Replication (in SP1 only). These features can provide nice and simple disaster recovery options by replicating your storage group logs and databases to another location, be it another disk/LUN with LCR or another mailbox server with SCR.
However, if you choose to implement LCR or SCR you should be aware of the implications for your backups. A normal Exchange backup truncates the transaction logs for your storage group, but when LCR or SCR is deployed this truncation may not work as you first expect.
One example of this is when using SCR to replicate your storage group to another mailbox server. The source mailbox server ships each transaction log to the SCR target where it is replayed into a replica of the database. In this scenario there are two queues created – the copy queue and the replay queue.
[PS] C:\>Get-StorageGroupCopyStatus -StandbyMachine SERVER2

Name                 SummaryCopySt CopyQueueLeng ReplayQueueL LastInspecte
                     atus          th            ength        dLogTime
----                 ------------- ------------- ------------ ------------
First Storage Group  Healthy       2             3232         25/04/200...
The copy queue length is the number of transaction log files yet to be shipped to the standby server. The replay queue length is the number of transaction log files yet to be replayed into the replica database (in this case the default 24 hour replay delay has been used).
Because the source server knows it is shipping logs to a standby server, it will not remove any transaction logs that are yet to be shipped. This means that if replication is suspended or fails for any reason, and the copy queue begins to grow, the source server will not truncate those logs during a normal backup.
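One rough way to keep an eye on this is to filter the copy status objects for a growing copy queue or a status other than Healthy. The sketch below is only an illustration: the function name Select-StalledCopy and the default threshold of 10 logs are my own assumptions, and it relies on the CopyQueueLength and SummaryCopyStatus properties that appear in the Get-StorageGroupCopyStatus output. The replay queue is deliberately ignored, since a long replay queue is expected when a replay delay is configured.

```powershell
# Hypothetical helper (not an Exchange cmdlet): passes through any copy status
# object whose copy queue exceeds a threshold or whose summary status is not Healthy.
function Select-StalledCopy {
    param([int]$MaxCopyQueue = 10)   # assumed threshold, tune it for your environment
    process {
        # Convert the status to a string so the comparison works for enums and plain text
        $status = "$($_.SummaryCopyStatus)"
        if ($_.CopyQueueLength -gt $MaxCopyQueue -or $status -ne "Healthy") {
            $_
        }
    }
}
```

With the function loaded in the Exchange Management Shell, it could be used like this:

[PS] C:\>Get-StorageGroupCopyStatus -StandbyMachine SERVER2 | Select-StalledCopy -MaxCopyQueue 20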
However, even when SCR appears healthy and has a zero length copy queue there may be problems with the RPC calls between the servers to notify of log truncation events. If the source and target cannot verify with each other that the logs have shipped and replayed then the source server will not truncate the logs during a normal backup.
To monitor for these types of scenarios it is wise to check for MSExchangeRepl warnings and errors on the target server. For example:
Event Type: Warning
Event Source: MSExchangeRepl
Event Category: Service
Event ID: 2137
Date: 19/02/2008
Time: 6:31:25 AM
User: N/A
Computer: SERVER2
Description:
Log truncation request to the Information Store using RPC has failed for storage group ‘SERVER1\First Storage Group’. Error code: 4294966264. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
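A quick way to pull these events out of the Application log is to filter on the event source and entry type. This is a minimal sketch rather than a finished monitoring script: Select-ReplEvent and ConvertTo-SignedErrorCode are hypothetical helper names of my own, and the filter assumes the standard Source and EntryType properties of the entries that Get-EventLog returns. The second helper reinterprets the unsigned error code from the event description as a signed 32-bit value, which is usually the form you need when looking the code up.

```powershell
# Hypothetical helper: keeps only MSExchangeRepl warnings and errors from
# event log entries piped in (for example from Get-EventLog).
function Select-ReplEvent {
    process {
        $type = "$($_.EntryType)"
        if ($_.Source -eq "MSExchangeRepl" -and ($type -eq "Warning" -or $type -eq "Error")) {
            $_
        }
    }
}

# Hypothetical helper: event descriptions print the error code as an unsigned
# 32-bit number; values above the signed maximum wrap around to a negative code.
function ConvertTo-SignedErrorCode {
    param([long]$Code)
    if ($Code -gt 2147483647) { return $Code - [long]4294967296 }
    return $Code
}
```

For example, on the SCR target:

[PS] C:\>Get-EventLog -LogName Application -Newest 500 | Select-ReplEvent | Format-List TimeGenerated,EventID,Message
[PS] C:\>ConvertTo-SignedErrorCode 4294966264
-1032

I believe -1032 is the ESE code for a file access denied condition (JET_errFileAccessDenied), but treat that mapping as an assumption and confirm it against the ESE error list.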
When such SCR issues are interfering with backup operations one of the simplest ways to resolve the issue is to reseed the SCR target.
[PS] C:\>Suspend-StorageGroupCopy "SERVER1\First Storage Group" -StandbyMachine SERVER2
[PS] C:\>Update-StorageGroupCopy "SERVER1\First Storage Group" -StandbyMachine SERVER2 -DeleteExistingFiles
[PS] C:\>Resume-StorageGroupCopy "SERVER1\First Storage Group" -StandbyMachine SERVER2
In the near future I’d like to nail down some more specific scenarios, causes and remedies, but for now I’ll just summarise by saying if you are having backup problems in an environment where LCR or SCR are deployed, look to your event logs for MSExchangeRepl errors and consider reseeding your replicas to get your backups behaving normally again.
Does this hold true for Exchange 2010? I’m having the same error on my replica server. The copies show healthy but I’m getting event ID 2137 every few minutes for each database I’m replicating. We haven’t done a backup of the active databases yet. We only have our backup software on the main server, and nothing on the replica server that is holding the passive copies.