Home » Exchange Server » Exchange Server 2013 Lagged Database Copies In Action

Exchange Server 2013 Lagged Database Copies In Action

This article is an excerpt from the Deploying and Managing Exchange Server 2013 High Availability ebook.

A lagged database copy is a passive database copy in a database availability group that has a delayed log replay time configured.

Normally a passive database copy will replay the transaction log data into the database immediately, so that the passive database copy is as up to date as possible.

With a lagged database copy the administrator sets a delay on the log replay, so that the database copy “lags” behind the others in terms of the latest database changes. This lag interval specifies the amount of time between when a transaction log file is generated and when it is replayed into the passive database copy. The default lag interval is 0 and the maximum lag interval is 14 days.

The purpose of a lagged copy is to provide the ability to recover the database from an earlier point in time if some kind of database fault occurs, such as logical corruption.

Configuring Lagged Database Copies

The replay lag interval for a lagged database copy is configured using Set-MailboxDatabaseCopy, and the value is in the format “days.hours:minutes:seconds”. For example, a value of “7.0:0:0” means 7 days.

It is common to choose the least preferred database copy to be the lagged copy, so in an environment with four database copies the copy with activation preference of 4 would be set as the lagged copy.

To select all database copies that have an activation preference of 4 we can use Get-MailboxDatabaseCopyStatus.

Piping that output to Set-MailboxDatabaseCopy will configure the replay lag interval.

Real World: When you set the replay lag time you will see a warning about the SafetyNetHoldTime as well. It is always recommended to set Safety Net hold time to the same value or greater value than the replay lag time.

In addition to setting the replay lag time you can also prevent a lagged database copy from being automatically activated during a database failover scenario. To block automatic activation for a database copy use the Suspend-MailboxDatabaseCopy cmdlet.

When automatic activation has been blocked in this manner the database copy can only become active after manual intervention by an administrator.

Real World: Blocking automatic activation for a database copy reduces the number of database copies that are available for Active Manager to attempt to mount during the Best Copy and Server Selection process. It is a decision for your organization whether the preservation of the lagged database copy by blocking automatic activation is more important than having the maximum number of available database copies available to mount in a failover scenario.

Using Lagged Database Copies in Recovery Scenarios

Lagged copies can be used for recovery in a variety of ways.

  • Activate a lagged database copy by replaying all uncommitted log files. In this scenario a decision is made to replay all of the log files in the replay queue into the database and bring it online.
  • Activate a lagged database copy to a specific point in time. In this scenario some of the log files are removed from the server to prevent them from being replayed into the database copy as it is brought online.
  • Activate a lagged database copy and use Safety Net for recovering lost data. In this scenario the database is mounted without replaying transaction log from the replay queue, and messages are resubmitted to the database from Safety Net.

In each of these scenarios there is an optional (but recommended) step to make a copy of the lagged database files before activating it, so that you are still able to use the lagged copy at a later stage if you have a different recovery scenario arise.

Activating a Lagged Database Copy by Replaying All Log Files

A lagged database copy can be activated by replaying all of the uncommitted transaction log files (those that are in the replay queue) into the database and then mounting it. This effectively “catches up” the database copy to the others.

The amount of time it takes to commit all of the outstanding transaction log files to the database will depend on the length of the lag interval and the amount of transaction log data that has been generated in that space of time.

First, identify the lagged copy of the database. In this example of database DB01 the lagged copy is DB01MELEX2.

Suspend the lagged database copy using Suspend-MailboxDatabaseCopy.

Make a backup of the database and log files for this database copy. You can use a backup application or simply copy them to another location.

A suspended database copy can’t be made active, so we need to resume the lagged database copy again. It may take a few minutes afterwards for the content index to return to a healthy state as well.

Finally, activate the lagged database copy. Note that this uses the same Move-ActiveMailboxDatabase cmdlet as a normal database switchover, with the additional –SkipLagChecks switch.

When all of the log files have been replayed the database copy will mount and become the active database copy.

The database copy still has the same replay lag interval configure, it is just not in effect as long as the database copy is active. If the active database copy is moved to another DAG member then the replay lag interval will come into effect again and the replay queue length will begin increasing for the lagged database copy.

Activating a Lagged Database Copy to a Point in Time

Activating a lagged database copy to a specific point in time follows a different process, and is often used as a way to make a copy of the database to mount as a recovery database, rather than activate the lagged copy itself.

First, identify the lagged copy of the database. In this example of database DB02 the lagged copy is DB02SYDEX1, so this procedure should be performed on server SYDEX1.

Suspend the lagged database copy using Suspend-MailboxDatabaseCopy.

Make a backup of the database and log files for this database copy. You can use a backup application or simply copy them to another location.

Look at the log file timestamps to determine which log files were created after the point in time you want to recover to.

transaction-logs-dates

Move the log files after the time that you’re recovering to into another folder. Don’t delete them, just in case you need them, but they do need to be moved to another folder.

Delete the .chk file from the transaction log folder.

transaction-logs-chk

Using ESEUtil in a CMD prompt we can see that the database file in a dirty shutdown state.

So the next step is to perform a soft recovery using the log files that we’ve left in the log folder. Navigate to the log folder for the database and run ESEUtil with the /r switch and the log prefix (E00 in this example).

Check the database file again to verify it is in a clean shutdown state.

The database file can now be copied to another location and used as a recovery database for a restore operation.

The lagged database copy itself can be resumed as well. If you wanted to return it to the lagged state before the recovery process above was run you would need to restore the files you backed up at the start of this process before resuming the database copy.

Activating a Lagged Database Copy Using Safety Net Recovery

Activating a lagged database copy using Safety Net recovery involves removing all unnecessary transaction log files from the log folder, so that only the bare minimum required for mounting the database still exist.

First, identify the lagged copy of the database. In this example of database DB03 the lagged copy is DB02SYDEX2, so this procedure should be performed on the server SYDEX2.

Suspend the lagged database copy using Suspend-MailboxDatabaseCopy.

Make a backup of the database and log files for this database copy. You can use a backup application or simply copy them to another location.

Use ESEUtil to determine which log files are required by the database.

There are two hexadecimal values displayed in the ESEUtil output. In this example they are “0x165a” and “0x1660”. It is the second value that is of interest, because this is the “High Generation” number.

Move all log files that have a sequence number that is higher than the “High Generation” number. For example, 0x1660 means log files E0100001661 and above should be removed from the server hosting the lagged copy, which is SYDEX2. Don’t delete the files, just move them to a temporary folder in case you need them again.

transaction-logs-safety-net-01

Note: Transaction log file sequence numbers use a hexadecimal numbering sequence. This means that, for example, log file E0100001699 is not followed by E0100001700, but rather E01000016A0 instead. The last log file in the E01000016xx range is E01000016FF, and then E0100001700. This is important to be aware of because sorting log files by name in Windows Explorer does not correctly sort them by log file sequence number. If you are not aware of this, and do not remove all of the correct log files, then the switchover to the lagged database copy will fail.

Next, identify the server hosting the active copy of the database. In this example it is the server SYDEX1.

On the server hosting the active copy of the database stop the Microsoft Exchange Replication service.

msexchangereplicationservice

Perform a database switchover to activate the lagged database copy, in this example the server is SYDEX2.

The database will mount and make a request to Safety Net for resubmission of the mail that is missing from the database.

After Lagged Copy Activation

Activating a lagged database copy is not the end of the exercise. There are still several things that you should do to ensure that your environment is back to a healthy state. After all, some event occurred to cause you to activate a lagged copy, so you most likely still have work ahead of you to restore your environment to its original healthy condition.

Here are some steps that you should consider:

  • Use Get-MailboxDatabaseCopyStatus to review the health of your database copies
  • Re-seed any database copies that are failed or otherwise considered to be unusable (eg, if the reason you activated a lagged copy was due to corruption in your database)
  • Move the active database copy back to a more preferred server
  • Re-apply or verify that your lagged copy configuration is in effect

This article is an excerpt from the Deploying and Managing Exchange Server 2013 High Availability ebook.

Paul is a Microsoft MVP for Office Servers and Services. He works as a consultant, writer, and trainer specializing in Office 365 and Exchange Server. Paul is a co-author of Office 365 for IT Pros and several other books, and is also a Pluralsight author.
Category: Exchange Server

20 comments

  1. Chris says:

    I do not see any advantage in a lagged copy of a database.
    If any kind of corruption happens on a database we still have daily backup of the active DB which we can “activate” very shortly.

    With lagged copy of a database, you still need a passive copy for high availability, you will end up with two copies of each database + the backup the active one . For me it is a waste of storage for what it brings.

  2. Richard says:

    Lagged DB copies are THE solution if a design has to contain asynchronous technologies? Or are there also other alternatives?

  3. Hardik Desai says:

    Lagged copies is a feature available with DAG just in case you want to go backupless. You can go back in time to recover the data upto a point.

  4. Ireno Thomas says:

    The only reason to have Lagged copy when you have Back-up less.
    And the lagged copy should be after one day or….Business requirements.
    Best practice with Back-up less is to have minmaal four Copy and Circular Logging (CRCL) on all mailboxes.

  5. Daniel Thompson says:

    All depends on your enterprise. Lag copies make no sense to me if you’re getting daily fulls and can restore single items.

    If you’re backup less having single lag copy would be enough for me, I wouldn’t have more than one copy of it though unless store space was never an issue.

  6. Jeff Cothern says:

    As someone else stated LAG copies can be of use for a backupless environment.

    Also suppose you need to recover 40 emails for one user that got deleted a week ago. It is conceivable that using the LAG copy would be quicker depending on the backup technology used.

  7. Jack says:

    Hi Paul,
    I don’t know how to set ReplayLagStatus:Enabled of a mailbox database copy from false to true ? (set “Replay lag time” to a number lagger than 0 ?)
    One of my exchange server has problem and shutdown unexpectedly , After it back to online , a passive database copy cannot replicate from the active one.
    ——————————————————
    ERROR
    Mailbox Database 01
    An Active Manager operation failed. Error: The database action failed. Error: An error occurred while trying to validate the specified database
    copy for possible activation. Error: Database copy ‘Mailbox Database 01’ on server ‘idcexc001’ has 153 logs to inspect and replay,
    which is higher than maximum allowed o of 10. If you need to activate this database copy, you can use the Move-ActiveMailboxDatabase
    cmdlet with the -SkipLagChecks and -MountDialOverride parameters to forcibly activate the database copy. If the database does not automatically mount after
    running Move-ActiveMailboxDatabase successfully, use the Mount-Database cmdlet to mount the database. [Database: Mailbox Database 01, Server: EX01
    —————————–
    PS H:> Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

    Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
    Length Length State
    —- —— ——— ———– ——————– ————
    Mailbox Database 01EX01 Mounted 0 0 Healthy
    Mailbox Database 01EX02 Suspended 3988 111 19/03/2015 3:22:06 PM Suspended

    PS H:> Get-MailboxDatabaseCopyStatus “Mailbox Database 01” | fl “*replay*”

    ReplayQueueLength : 0
    ReplaySuspended : False
    LastReplayedLogTime :
    LastLogReplayed : 0
    LogsReplayedSinceInstanceStart : 0
    LogReplayQueueIncreasing : False
    ReplayLagStatus : Enabled:False; PlayDownReason:None; Percentage:0; Configured:00:00:00; Actual:00:00:00

    ReplayQueueLength : 111
    ReplaySuspended : False
    LastReplayedLogTime : 19/03/2015 3:07:31 PM
    LastLogReplayed : 4656967
    LogsReplayedSinceInstanceStart : 0
    LogReplayQueueIncreasing : False
    ReplayLagStatus : Enabled:False; PlayDownReason:None; Percentage:0; Configured:00:00:00; Actual:03:13:45

    I want to follow your article to force replication between 2 servers.

      • Jack says:

        Yes, sorry about that, I’m freaking out, after EX02 (shutdown unexpectedly) back to online “Mailbox Database 01” cannot be mounted, it is mounted on EX01 then failed, then mount on EX02 then failed and it keep switching until I suspended it’s copy on EX02.
        After thinking, in my case I don’t need “using/activating a lagged database”, I need to repair “Mailbox Database 01” passive copy on EX02. I checked with command :
        eseutil /mh “path to Mailbox Database 01 edb file on EX02” and result :
        State: Dirty Shutdown
        Log Required: 4656967-4657077 (0x470f47-0x470fb5)
        Log Committed: 0-4657078 (0x0-0x470fb6)
        So I should repair it with eseutil /r to get it into clean shutdown state , right ? But I didn’t.
        First, I tried my luck: I log in the ECP / Servers / Databases and begin updating mailbox database 01 copy from EX01 to EX02 again. But my mailbox database 01 is about 600GB so I’m waiting for about 590GB data to be updated (I’m wondering this is a bad idea ) and hopefully everything will be fine.

        • Yes, configuring lag settings is now how you would recover from a failed database copy scenario.

          If the copy is mounted and healthy on EX01, and you’re happy that EX01’s copy has all the data you need (which if the failover was automatic is likely the case), then you can reseed the copy on EX02 if it looks like it is unhealthy.

  8. Jack says:

    Hi Paul, after finished seeding, everything seems fine, I’m keeping my eyes on them to see if any problem

    [PS] C:Windowssystem32>Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

    Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
    Length Length State
    —- —— ——— ———– ——————– ————
    Mailbox Database 01IDCEXC001 Mounted 0 0 Healthy
    Mailbox Database 01IDCEXC002 Healthy 0 1188 3/20/2015 9:49:36 AM Healthy

    [PS] C:Windowssystem32>Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

    Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
    Length Length State
    —- —— ——— ———– ——————– ————
    Mailbox Database 01EX01 Mounted 0 0 Healthy
    Mailbox Database 01EX02 Disconnected… 0 1188 3/20/2015 9:49:36 AM Healthy

  9. Jack says:

    Hi Paul, after finished seeding, everything seems fine, I’m keeping my eyes on them to see if any problem. Can you give me advice about this output ?

    [PS] C:Windowssystem32>Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

    Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
    Length Length State
    —- —— ——— ———– ——————– ————
    Mailbox Database 01EX01 Mounted 0 0 Healthy
    Mailbox Database 01EXC02 Healthy 0 1188 3/20/2015 9:49:36 AM Healthy

    [PS] C:Windowssystem32>Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

    Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
    Length Length State
    —- —— ——— ———– ——————– ————
    Mailbox Database 01EX01 Mounted 0 0 Healthy
    Mailbox Database 01EX02 Disconnected… 0 1188 3/20/2015 9:49:36 AM Healthy

    [PS] C:Windowssystem32>Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

    Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
    Length Length State
    —- —— ——— ———– ——————– ————
    Mailbox Database 01EX01 Mounted 0 0 Healthy
    Mailbox Database 01EX02 Healthy 0 1136 3/20/2015 9:52:16 AM Healthy

    • Jack says:

      Last update : Everything is fine now
      [PS] C:Windowssystem32>Get-MailboxDatabaseCopyStatus “Mailbox Database 01”

      Name Status CopyQueue ReplayQueue LastInspectedLogTime ContentIndex
      Length Length State
      —- —— ——— ———– ——————– ————
      Mailbox Database 01EX01 Mounted 0 0 Healthy
      Mailbox Database 01EX02 Healthy 0 0 3/20/2015 10:43:36 AM Healthy

  10. JHK says:

    I’m facing a scenario where I am unable to activate lagged database copies at all. We use three servers in every DAG with the tertiary copy being lagged by 7 days. Testing the failure scenario of 2 servers being down simultaneously I ran into the following: ” Database copy ‘XXXXXXXXXXXXX’ on server ‘XXXXXXXXXXX’ has a total (copy plus replay) queue length of
    843 logs, which is higher than the maximum allowed queue length of 400. If you need to activate this database copy,
    you can use the Move-ActiveMailboxDatabase cmdlet with the -SkipLagChecks and -MountDialOverride parameters to
    forcibly activate the database copy. If the database does not automatically mount after running
    Move-ActiveMailboxDatabase successfully, use the Mount-Database cmdlet to mount the database.

    Any thoughts on how I can increase the maximum allowed queue length?

  11. Brady says:

    Hey Paul,

    We are implementing a 4 server approach to our new exchange 2013 environment. 3 mailbox servers which will have active and passive copies of each database, located in 2 datacenters. We are also putting in a Lagged server for corruption protection. What about backing up the Lagged copy?

    Obviously this is an extreme scenario where corruption occurs on the 3 mailbox servers and the lagged copy goes down, but do you see any benefit of backup up the lagged copy? Could we use another lagged server?

    Thank you

  12. filip says:

    If auto failover is not blocked for the lagged copy and it fails over to lagged copy, will logs be replayed/inserted to db before activating?

Leave a Reply

Your email address will not be published. Required fields are marked *