There are two things I tend to see a lot of at the moment. Firstly, virtualisation is pretty hot right now. Everyone seems to be virtualising as much of their infrastructure as possible, particularly servers such as Domain Controllers. Secondly, some companies are too cheap (or just haven’t gotten around to it yet) to set up a proper backup and recovery solution for their servers. This often means they are relying on ad-hoc Ghost or Acronis images for server recovery, including recovery of their Domain Controllers.

I specifically mention Domain Controllers twice there because both of these very common scenarios introduce the serious risk of a “USN rollback” condition occurring (USN stands for “update sequence number”). If you want to get deeply technical with the concept you can read this article from Microsoft:

How to detect and recover from a USN rollback in Windows Server 2003

If you just want the summary version, basically a USN rollback condition can occur when the Active Directory database is restored to an earlier version in an improper fashion. Microsoft makes available methods for restoring Active Directory databases such that the Domain Controller can properly resynchronise with its replication partners afterwards. Restoring in an improper fashion, such as restoring a DC using an earlier Ghost or Acronis image, or rolling back to an earlier snapshot of a virtualised DC, will cause a USN rollback condition to occur.

A Simple USN Rollback Scenario

You can create a USN rollback condition by deliberately performing one of the restoration methods mentioned above. Here I have created two virtualised Domain Controllers named TESTDC1 and TESTDC2, both in the testing.local domain. Looking at the servers I can see that they appear to be in a healthy state of replication. Active Directory Users and Computers shows the same user objects that I created have replicated between the servers.

usnrollback001.jpg

Replmon.exe indicates successful replication is occurring.

usnrollback001b.jpg

Running DCDiag.exe /q (the /q switch suppresses all output except for errors, so if there is no output there are no errors) indicates all is well.

Next I shut down TESTDC2 and make a copy of the virtual hard disk. I then boot TESTDC2 again, and confirm once more that replication is healthy. I can then make a few changes to Active Directory to demonstrate the problems with USN rollback. Using the Active Directory Users and Computers console I create the user object User3 while connected to TESTDC1, and the user object User4 while connected to TESTDC2. As you can see here the user objects appear in the Active Directory of the Domain Controllers, but are yet to replicate between the two servers.

usnrollback002.jpg

In a real world environment some event might occur such as a hardware failure on TESTDC2, or simply a human decision to roll the server back to the last image or snapshot. I shut down TESTDC2, remove the current virtual hard disk, and copy back the virtual hard disk file from before. As soon as I boot TESTDC2 again everything starts to go bad.

Administrators might first become aware of the problem when they notice that changes they make in the course of their day are not replicating to all the domain controllers. For example, I notice that User3 is appearing on TESTDC1 but not on TESTDC2, even after several hours of waiting. If I attempt to force replication between the two servers in Active Directory Sites and Services I receive an error. A similar error is also now appearing in Replmon, and the Directory Services Event log is showing some critical errors.

usnrollback003.jpg usnrollback005.jpg usnrollback004.jpg

The event ID to look out for in this scenario is 2095. The full details of this event are as follows.

Event Type: Error
Event Source: NTDS Replication
Event Category: Replication
Event ID: 2095
Date: 1/06/2007
Time: 4:40:20 PM
User: NT AUTHORITY\ANONYMOUS LOGON
Computer: TESTDC2
Description:
During an Active Directory replication request, the local domain controller (DC) identified a remote DC which has received replication data from the local DC using already-acknowledged USN tracking numbers.

Because the remote DC believes it is has a more up-to-date Active Directory database than the local DC, the remote DC will not apply future changes to its copy of the Active Directory database or replicate them to its direct and transitive replication partners that originate from this local DC.

If not resolved immediately, this scenario will result in inconsistencies in the Active Directory databases of this source DC and one or more direct and transitive replication partners. Specifically the consistency of users, computers and trust relationships, their passwords, security groups, security group memberships and other Active Directory configuration data may vary, affecting the ability to log on, find objects of interest and perform other critical operations.

To determine if this misconfiguration exists, query this event ID using http://support.microsoft.com or contact your Microsoft product support.

The most probable cause of this situation is the improper restore of Active Directory on the local domain controller.

User Actions:
If this situation occurred because of an improper or unintended restore, forcibly demote the DC.

Remote DC:
d63ef566-f3a9-4700-ae27-a5c5ac7c9fe0
Partition:
DC=testing,DC=local
USN reported by Remote DC:
16435
USN reported by Local DC:
16387

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

There are also instances of event ID 2103.

Event Type: Error
Event Source: NTDS General
Event Category: Service Control
Event ID: 2103
Date: 1/06/2007
Time: 4:40:20 PM
User: NT AUTHORITY\ANONYMOUS LOGON
Computer: TESTDC2
Description:
The Active Directory database has been restored using an unsupported restoration procedure.

Active Directory will be unable to log on users while this condition persists. As a result, the Net Logon service has paused.

User Action
See previous event logs for details.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

You definitely want to know about these errors when they occur. If you are running any kind of monitoring system that scrapes event logs and alerts you for certain events then these are two you want to be alerted for. If you are not paying attention this problem can surface and go unnoticed for quite some time. Your admins might just be scratching their heads a little as to why some odd behaviour is occurring in Active Directory. Meanwhile your server event logs are overwriting older events and may remove this crucial evidence, as happened to a customer of ours.
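As a rough sketch of that kind of alerting, the Python below scans a plain-text export of the Directory Services log for the two event IDs. It assumes the export uses the same "Event ID: nnnn" layout as the events quoted above. The function name and the export format are illustrative assumptions, not part of any particular monitoring product.

```python
import re

# Event IDs discussed above: 2095 (NTDS Replication) and 2103 (NTDS General).
USN_ROLLBACK_EVENT_IDS = {2095, 2103}

# Matches lines such as "Event ID: 2095" in a plain-text event export
# (assumed layout, copied from the events quoted in this post).
EVENT_ID_LINE = re.compile(r"^[ \t]*Event ID:[ \t]*(\d+)\b", re.MULTILINE)


def find_usn_rollback_events(log_text):
    """Return a sorted list of the USN rollback event IDs present in log_text."""
    found = {int(m.group(1)) for m in EVENT_ID_LINE.finditer(log_text)}
    return sorted(found & USN_ROLLBACK_EVENT_IDS)


if __name__ == "__main__":
    sample = (
        "Event Type: Error\n"
        "Event Source: NTDS Replication\n"
        "Event ID: 2095\n"
        "\n"
        "Event Type: Warning\n"
        "Event Source: NTDS General\n"
        "Event ID: 1115\n"
    )
    hits = find_usn_rollback_events(sample)
    if hits:
        print("Possible USN rollback, event IDs seen:", hits)
```

Running a check like this on a schedule, before the log wraps, is one way to avoid losing the evidence.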

If you do not have the benefit of seeing those events in the Directory Services Event Log there are some other clues you can look out for. Firstly the repadmin.exe command can help identify the state of replication on the Domain Controller.

C:\>repadmin /options
repadmin running command /options against server localhost
Current DC Options: (none)

If the output is as above, then replication is not explicitly disabled on the Domain Controller. Note that a Global Catalog server will show an “IS_GC” option as being active instead of “(none)”. However if the output is as follows then replication has been disabled on the Domain Controller.

C:\>repadmin /options
repadmin running command /options against server localhost
Current DC Options: DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL
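If you want to check this across several servers, the captured output can be tested with a few lines of Python. This is a minimal sketch that assumes you have already saved the text printed by repadmin /options; the function name is hypothetical, and it relies only on the "Current ... Options:" line shape shown above.

```python
# Flags that repadmin /options reports when replication has been switched off.
DISABLE_FLAGS = ("DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL")


def disabled_replication_flags(options_output):
    """Return the DISABLE_*_REPL flags listed in captured repadmin /options output."""
    found = []
    for line in options_output.splitlines():
        text = line.strip()
        # The header wording can vary between versions, so only a line that
        # starts with "Current" and contains a colon is assumed here.
        if text.lower().startswith("current") and ":" in text:
            values = text.split(":", 1)[1].split()
            found.extend(flag for flag in DISABLE_FLAGS if flag in values)
    return found


if __name__ == "__main__":
    healthy = (
        "repadmin running command /options against server localhost\n"
        "Current DC Options: (none)\n"
    )
    broken = (
        "repadmin running command /options against server localhost\n"
        "Current DC Options: DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL\n"
    )
    print(disabled_replication_flags(healthy))
    print(disabled_replication_flags(broken))
```

An empty list means replication has not been explicitly disabled; it does not by itself prove the Domain Controller is healthy.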

Further evidence is a NetLogon service in a “paused” state on the Domain Controller.

C:\>sc query netlogon
SERVICE_NAME: netlogon
TYPE : 20 WIN32_SHARE_PROCESS
STATE : 7 PAUSED
(STOPPABLE, PAUSABLE, IGNORES_SHUTDOWN))
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
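The same check can be scripted. The sketch below pulls the state code and name out of captured sc query output; it assumes the "STATE : n NAME" line format shown above, and the helper names are made up for illustration.

```python
import re

# Matches the state line of sc query output, e.g. "STATE : 7 PAUSED".
STATE_LINE = re.compile(
    r"^[ \t]*STATE[ \t]*:[ \t]*(\d+)[ \t]+([A-Z_]+)", re.MULTILINE
)


def service_state(sc_output):
    """Return (code, name) from captured sc query output, or None if absent."""
    match = STATE_LINE.search(sc_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def netlogon_is_paused(sc_output):
    """True when the captured output reports the PAUSED state."""
    state = service_state(sc_output)
    return state is not None and state[1] == "PAUSED"


if __name__ == "__main__":
    sample = (
        "SERVICE_NAME: netlogon\n"
        "TYPE : 20 WIN32_SHARE_PROCESS\n"
        "STATE : 7 PAUSED\n"
    )
    print(service_state(sample), netlogon_is_paused(sample))
```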

If you attempt to restart NetLogon and re-enable replication with Repadmin while the USN rollback condition is still in effect, event IDs 2095 and 2103 will appear again in the Directory Services Event Log, which gives you further evidence of the issue. The final clue comes from checking the USN that each Domain Controller believes is correct for itself and its replication partners.

On TESTDC1:
C:\>repadmin /showutdvec testdc1 dc=testing,dc=local
Caching GUIDs.
..
Default-First-Site-Name\TESTDC2 @ USN 16435 @ Time 2007-06-01 16:37:49
Default-First-Site-Name\TESTDC1 @ USN 14272 @ Time 2007-06-01 16:52:08

On TESTDC2:
C:\>repadmin /showutdvec testdc2 dc=testing,dc=local
Caching GUIDs.
..
Default-First-Site-Name\TESTDC2 @ USN 16409 @ Time 2007-06-01 16:52:49
Default-First-Site-Name\TESTDC1 @ USN 14146 @ Time 2007-06-01 16:04:22

The condition you are looking for is a direct replication partner holding a higher USN for the Domain Controller than the Domain Controller holds for itself. In the above output you can see that TESTDC2 has a USN for itself of 16409, whereas TESTDC1 has a USN for TESTDC2 of 16435.
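To make the comparison concrete, here is a minimal Python sketch that parses two captured repadmin /showutdvec outputs and applies that test. It assumes each entry is printed as Site\Server followed by "@ USN n", and that site and server names contain no spaces. The function names are hypothetical.

```python
import re

# One vector entry, e.g.
# "Default-First-Site-Name\TESTDC2 @ USN 16435 @ Time 2007-06-01 16:37:49"
ENTRY = re.compile(r"^[ \t]*(\S+)[ \t]+@[ \t]+USN[ \t]+(\d+)", re.MULTILINE)


def parse_utd_vector(output):
    """Map server name (site prefix removed, upper-cased) to its recorded USN."""
    vector = {}
    for match in ENTRY.finditer(output):
        server = match.group(1).split("\\")[-1].upper()
        vector[server] = int(match.group(2))
    return vector


def partner_is_ahead(dc_name, own_output, partner_output):
    """True if the partner holds a higher USN for dc_name than dc_name holds for itself."""
    dc = dc_name.upper()
    own = parse_utd_vector(own_output)
    partner = parse_utd_vector(partner_output)
    return partner[dc] > own[dc]


if __name__ == "__main__":
    testdc1_output = r"""Caching GUIDs.
..
Default-First-Site-Name\TESTDC2 @ USN 16435 @ Time 2007-06-01 16:37:49
Default-First-Site-Name\TESTDC1 @ USN 14272 @ Time 2007-06-01 16:52:08
"""
    testdc2_output = r"""Caching GUIDs.
..
Default-First-Site-Name\TESTDC2 @ USN 16409 @ Time 2007-06-01 16:52:49
Default-First-Site-Name\TESTDC1 @ USN 14146 @ Time 2007-06-01 16:04:22
"""
    # TESTDC1 remembers USN 16435 for TESTDC2, but TESTDC2 itself is at 16409.
    print(partner_is_ahead("TESTDC2", testdc2_output, testdc1_output))
```

With the figures from this lab the check flags TESTDC2, while the reverse comparison for TESTDC1 does not, because TESTDC2 holds a lower USN for TESTDC1 (14146) than TESTDC1 holds for itself (14272).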

More Complex USN Rollback Scenarios

In the simple scenario above it is relatively easy to conclude that TESTDC2 is the cause of the USN rollback condition. However, in an environment with more than two Domain Controllers the evidence can present in different ways. Using the example of our customer, the virtualised Domain Controller that had been rolled back to an earlier snapshot was not showing event ID 2095, its NetLogon service was running, and it did not have inbound and outbound replication disabled. However, both of its replication partners were showing those symptoms.

By analysing the USN numbers in the output of the “repadmin /showutdvec” commands on each Domain Controller it was ultimately shown that the virtualised Domain Controller was still the one causing the USN rollback condition.
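With more than two Domain Controllers the same comparison has to be made for every pair. The sketch below assumes the vectors have already been collected into a dictionary, one entry per Domain Controller; the server names and USN values in the example are invented for illustration and are not the customer's data.

```python
def find_rollback_suspects(vectors):
    """Return {dc: [(partner, partner_usn_for_dc), ...]} for every DC whose
    partners hold a higher USN for it than it holds for itself.

    vectors maps each DC name to the up-to-dateness vector seen on that DC,
    itself a mapping of server name to USN.
    """
    suspects = {}
    for dc, own_vector in vectors.items():
        own_usn = own_vector.get(dc)
        if own_usn is None:
            continue  # no self entry collected for this DC
        for partner, partner_vector in vectors.items():
            if partner == dc:
                continue
            seen = partner_vector.get(dc)
            if seen is not None and seen > own_usn:
                suspects.setdefault(dc, []).append((partner, seen))
    return suspects


if __name__ == "__main__":
    # Hypothetical three-DC environment in which DC3 was rolled back.
    vectors = {
        "DC1": {"DC1": 5000, "DC2": 7000, "DC3": 9100},
        "DC2": {"DC1": 5000, "DC2": 7000, "DC3": 9100},
        "DC3": {"DC1": 4900, "DC2": 6950, "DC3": 9040},
    }
    print(find_rollback_suspects(vectors))
```

Here both partners remember USN 9100 for DC3 while DC3 reports 9040 for itself, so DC3 is the only suspect, regardless of which servers happen to be logging the events.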

Because of this type of variance in the real world it is important to investigate the situation carefully and assess all of the available information before making a decision as to how to proceed with resolving the issue.

Recovering from a USN Rollback

The Microsoft article mentioned at the start of this post contains instructions as to how to recover from USN rollbacks.

1. Remove Active Directory from the server causing the USN rollback condition. If you try to run DCPromo.exe on the server you will receive an error.

usnrollback006.jpg usnrollback007.jpg

In order to demote the server you need to run DCPromo.exe with the /forceremoval switch. This is a last resort option for removing a Domain Controller when it cannot be removed by the conventional method.

usnrollback008.jpg usnrollback009.jpg

2. Shut down the demoted server.

3. On a healthy Domain Controller, clean up the metadata of the demoted Domain Controller. This is explained in detail in this Microsoft article.

How to remove data in Active Directory after an unsuccessful domain controller demotion

The enhanced NTDSUtil application in Windows Server 2003 SP1 and above allows you to remove the metadata.
C:\>ntdsutil
ntdsutil: metadata cleanup
metadata cleanup: connections
server connections: connect to server testdc1
Binding to testdc1 ...
Connected to testdc1 using credentials of locally logged on user.
server connections: quit
metadata cleanup: select operation target
select operation target: list domains
Found 1 domain(s)
0 - DC=testing,DC=local
select operation target: select domain 0
No current site
Domain - DC=testing,DC=local
No current server
No current Naming Context
select operation target: list sites
Found 1 site(s)
0 - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
select operation target: select site 0
Site - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
Domain - DC=testing,DC=local
No current server
No current Naming Context
select operation target: list servers in site
Found 2 server(s)
0 - CN=TESTDC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
1 - CN=TESTDC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
select operation target: select server 1
Site - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
Domain - DC=testing,DC=local
Server - CN=TESTDC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
DSA object - CN=NTDS Settings,CN=TESTDC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local
DNS host name - TESTDC2.testing.local
Computer object - CN=TESTDC2,OU=Domain Controllers,DC=testing,DC=local
No current Naming Context
select operation target: quit
metadata cleanup: remove selected server
Transferring / Seizing FSMO roles off the selected server.
Removing FRS metadata for the selected server.
Searching for FRS members under "CN=TESTDC2,OU=Domain Controllers,DC=testing,DC=local".
Deleting subtree under "CN=TESTDC2,OU=Domain Controllers,DC=testing,DC=local".
The attempt to remove the FRS settings on CN=TESTDC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local failed because "Element not found.";
metadata cleanup is continuing.
"CN=TESTDC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=testing,DC=local" removed from server "testdc1"
metadata cleanup: quit
ntdsutil: quit
Disconnecting from testdc1...

You must manually remove the records in the Forest and Domain DNS zones for the demoted Domain Controller, and remove it from the list of Name Servers for each of the zones.

usnrollback012.jpg usnrollback013.jpg usnrollback014.jpg

Finally you must remove the demoted server from Active Directory Sites and Services.

usnrollback015.jpg

4. If the demoted server held FSMO roles you can seize them with NTDSUtil.

Using Ntdsutil.exe to transfer or seize FSMO roles to a domain controller

5. Turn on the demoted server.

6. Promote the server to a Domain Controller again using DCPromo (if you wish it to have this role again).

7. If the server was a Global Catalog, re-add this role.

8. Restore FSMO roles to the server (if applicable).

9. Restore the System State (optional). This is only applicable if there is a valid System State backup of the server, taken before the USN rollback condition occurred, that contains Active Directory changes you need to restore.

Once the server is fully restored you can check that replication is occurring again with Replmon. You can also observe whether changes to Active Directory are replicating properly by performing tests such as creating a new user object and waiting for it to replicate between servers.

usnrollback017.jpg usnrollback016.jpg

Finally the Event Log, Repadmin output and the state of the NetLogon service of all of your Domain Controllers can be checked as well.

C:\>repadmin /options
repadmin running command /options against server localhost
Current DC Options: (none)

C:\>sc query netlogon
SERVICE_NAME: netlogon
TYPE : 20 WIN32_SHARE_PROCESS
STATE : 4 RUNNING
(STOPPABLE, PAUSABLE, IGNORES_SHUTDOWN))
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0

Once the recovery is complete you may not be out of the woods yet. As you may have noticed in the screenshots above, the User4 object no longer exists in Active Directory. This is because the object was never able to replicate outbound from TESTDC2 and was therefore lost when the server was forcibly demoted. You should be aware that, aside from objects completely disappearing, some other strange issues may pop up after the recovery, such as users reporting that their new password no longer works but their old password does. In the case of our customer, one of the Domain Controllers immediately threw some SAM errors into the Event Log due to a replication conflict for a particular computer account. As a result the computer account was deleted automatically by the Domain Controller, a change which then replicated across the network and required the computer in question to be rejoined to the domain.

As you can see, the USN rollback condition is a very serious situation that threatens the integrity of your Active Directory environment. Adherence to proper backup and restore processes for your Active Directory, and caution when dealing with projects that involve virtualising Domain Controllers, can help you avoid this condition in your network. However, if you do unfortunately experience this problem in your production environment, careful analysis of the evidence and a clear recovery process as demonstrated here can get your environment back into a healthy condition again.

About the Author

Paul Cunningham

Paul is a former Microsoft MVP for Office Apps and Services. He works as a consultant, writer, and trainer specializing in Office 365 and Exchange Server. Paul no longer writes for Practical365.com.

Comments

  1. Matthew

    Great post – saved my sanity and job – thanks very much!

  2. Nathan

    This post just keeps on giving.. Just done and all-nighter and this has saved my ar5e!

    Thanks!

  3. Naved

    and also I dont have a recent .system state back up

  4. Naved

    If i have 2 domains in a single forest. In parent domain I have 2 DCs and in child domain I have 3 DCs . All are windows server 2008 Standard.
    All of my DCs are in USN Rollback State . AD Replication has gone for a toss.
    How would I overcome this problem?

  5. Voland

    Hi Paul,
    I would appreciated your advise.

    2 domain controllers (2008 R2) were not properly recovered from images as images were taken/recovered in 20 min difference.
    repadmin /showutdvec * dc=domain,dc=com does not show any problem.
    USN shown on each DC for its partner is the same or higher then partners one for itself.
    No errors on both DCs in Directory Service log.

    My worry is “Undetected USN Rollback” which results in undetected divergence where USNs f.e. 2000 through 2100 are not the same between two domain controllers.

    Is any way to determine that?
    Thank you.

  7. Mark

    Here we have a situation, the only dc on the forest root domain was cloning and restored with a VMWare image, there is a unique dc on the forest root domain, but it has another sub-domain with another dc …What can we do? we try Rob solution but the dc still having replication issues with the other subdomain ( its appear as a replication partner)and also we tried unsuccesfully to promote another dc in the forest root …So, we are scared!! help!!

    1. Paul Cunningham

      You should probably contact Microsoft Support for that scenario.

  8. M. Ganji

    Hi All
    Two Questions :
    1- Is That NTDSUTIL Process Necessary While All DC’s Are Server 2008 R2 Or Just Delete The DC From ADUC Will Automatically Do The Metadata Cleanup ?
    2- Why This Happens ? I Mean While SNAPSHOT Is Just A Full And Exact Copy Of DC VM, Why This USN Issue Should Occur ? In This Way I Guess VDR And Snapshots And Images Are Completely Useless For DC VMs

  9. Tommy

    I experienced a USN rollback yesterday because I used terabyte unlimited image for DOS and BootIt Bare Metal to increase the size of my C: (OS) drive without re-installing. We have two servers running server 2003 with RAID 5. All went well with that until I had to restore an image of the C: drive. That’s when USN rollback occurred. I noticed it from the logs. I followed Paul’s article and it’s back up and running smoothly. I’m glad I didn’t have all the other problems, but I’m printing this article with all the comments because it’s worth a million bucks. Thanks.

  10. Michele

    Hi,
    thanks for this post, I read it very carefully ’cause I think to be in this situation.
    But I have a question on point 3 you say “on a healty dc” cleanup metadata. We have 9 dc in our forest the first of this was virtualized two times so it caused the problem.
    The second dc in the same site present also a lot of warning about kcc but the others dc not.
    Do you think that also the dc2 has now some problem ?

    thanks a lot for yr reply

    1. Paul Cunningham

      It depends what “a lot of warning about KCC” means. You’ll need to look into those specific KCC errors you’re getting to determine what they mean.

      This post deals with a pretty specific scenario that can be diagnosed using the information here and in the Microsoft article linked to at the start.

  11. Edwin

    Great thanks!

  12. Mark

    Java is not only slow, it lies to your face as it’s being slow – which turned into a USN rollback condition for one of my DC’s. Luckily I knew to look for it, and using your instructions, took care of it.

    Thanks!! Awesome instructs and a MUST BOOKMARK for any AD Admin.

  13. Issa

    I’m so happy that I find this post. This is my issue; we used to have an AD running windows 2000. Then I add another one running Windows Server 2008 64-bit (Both server are virtualized). Now, last month I did an update to our secondary domain controller (from Windows server 2000 to 2003) on the VMware Server. Then, I find out there is no enough space and then I revert to Windows Server 2000 Image. I didn’t realize what I did until I get a complaint from our employees that the changes are not taking effect. Then, when I checked the event logs I saw this:
    Event Type: Warning
    Event Source: NTDS General
    Event Category: Replication
    Event ID: 1115
    Date: 6/7/2010
    Time: 6:30:25 AM
    User: Everyone
    Computer: CALCIUM
    Description:
    Outbound replication has been disabled by the user.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Followed with this one:
    Event Type: Warning
    Event Source: NTDS General
    Event Category: Replication
    Event ID: 1113
    Date: 6/7/2010
    Time: 6:30:25 AM
    User: Everyone
    Computer: CALICIUM
    Description:
    Inbound replication has been disabled by the user.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Followed with this one:
    Event Type: Error
    Event Source: NTDS General
    Event Category: Service Control
    Event ID: 2103
    Date: 6/7/2010
    Time: 6:30:25 AM
    User: Everyone
    Computer: CALICIUM
    Description:
    The description for Event ID ( 2103 ) in Source ( NTDS General ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: .

    Finally with this one:
    Event Type: Error
    Event Source: NTDS Replication
    Event Category: Replication
    Event ID: 2095
    Date: 6/7/2010
    Time: 6:30:25 AM
    User: Everyone
    Computer: CALICIUM
    Description:
    The description for Event ID ( 2095 ) in Source ( NTDS Replication ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: 490ffb18-af6c-4e46-8c60-9401bd86d822, DC=companydomain,DC=com, 3876031, 3871087, Ignore USN Rollback, Dsa Not Writable.

    So, after I did a quick research I find this article from MS:
    http://support.microsoft.com/kb/885875/
    And I followed this article up to step 3:
    Method 1:
    *) Start the Net Logon service.
    *) Enable inbound and outbound replication by using the following command:
    repadmin /options DC_Name -disable_inbound_repl -disable_outbound_repl
    *) If the incorrectly restored domain controller hosts operations master roles, transfer these roles to a healthy domain controller. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
    255504 (http://support.microsoft.com/kb/255504/ ) Using Ntdsutil.exe to transfer or seize FSMO roles to a domain controller.

    Up to this point I didn’t demote any server. Everything looks working fine. The replication is working fine. And just in case, I force the replication on both servers too.

    Then I did an upgrade to Windows Server 2003 (After I fix the space issue). After that I’m always getting this error when I reboot the Server and the only workaround that I have right now to enable Net Logon Manually.
    Event Type: Error
    Event Source: NTDS General
    Event Category: Service Control
    Event ID: 2103
    Date: 8/2/2010
    Time: 9:41:38 AM
    User: NT AUTHORITY\ANONYMOUS LOGON
    Computer: CALICIUM
    Description:
    The Active Directory database has been restored using an unsupported restoration procedure.

    Active Directory will be unable to log on users while this condition persists. As a result, the Net Logon service has paused.

    User Action
    See previous event logs for details.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    I don’t think I have a USN problem, please see the result on the healthy DC POTASSIUM (has windows 2008):

    C:\>repadmin /showutdvec POTASSIUM dc=companydomain,dc=com
    Caching GUIDs.
    ..
    Default-First-Site-Name\POTASSIUM @ USN 846381 @ Time 2010-08-04 14:14:46
    Default-First-Site-Name\CALICIUM @ USN 4302335 @ Time 2010-08-04 14:09:15

    And this is the result on CALICIUM that has Windows Server 2003 (Which I restore the image):

    C:\>repadmin /showvector DC=companydomain,dc=com CALICIUM
    Default-First-Site-Name\POTASSIUM @ USN 846381
    Default-First-Site-Name\CALICIUM @ USN 4302348
    Also, this is the result after running repadmin on POTASSIUM:
    C:\>repadmin /options
    Repadmin: running command /options against full DC localhost
    Current DSA Options: IS_GC

    The result on CALCIUM
    C:\>repadmin /options
    Current options: IS_GC

    Now, in my case is it OK to delete the registry field HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters “Dsa Not Writable”=dword:00000004 ?? Because I’m not comfortable to demote any server. I don’t have strong experience in AD. Please help.

    1. Paul Cunningham

      You could try it, but I would take a full backup of both DC’s first. If the problems persist you may need to go ahead and demote the server that created the USN rollback condition.

      1. Issa

        Thanks Paul for the reply.

        But my question is:
        According to the previous result, do I have USN rollback condition??

      2. Paul Cunningham

        Yes, it looks like you do (or did). The fact that your DC’s Netlogon service pauses every time you restart is a strong indicator that the impact of the USN rollback is still being seen in your environment.

      3. Issa

        Can I use the backup utility in windows to backup the servers?? or do you prefer a different way??

      4. Paul Cunningham

        Any Active Directory-aware backup app will do the trick, including the built-in backup utility.

        As you’ve obviously learned recently, don’t just take VM snapshots of the DC’s and expect to be able to roll back to them.

      5. Issa

        Unfortunately, I learned this in the hard way. I want to do a full backup for the servers this weekend and I want to delete the registry entry afterward. I will let you know the result.

        Thanks a lot Paul for your help. YOU THE MAN.

  14. Tum

    Thanks for quick reply.
    I already delete Dsa Not Writable registry key without demote any server.
    run dcdiag it pass all test,
    repadmin /showrepl all is successful and no error

    but in my case, somehow
    repadmin /showutdvec * dc=bangkok,dc=company
    first dc got higher usn number than replicate partner.
    maybe because I left it too long, and alot of AD change was made to first DC
    so it make USN number on first DC grow higher than replicate partner.

    in this kb http://support.microsoft.com/kb/875495
    on number 7 of topic : The effects of a USN rollback
    is very well explain my case, “up-to-dateness vector threshold has been exceeded”.

    previously, I was run repadmin /removelingeringobjects source destination guid /advisory_mode to remove the “lingering object” between both DC.
    i guess that remove some inconsistence of AD database between both DC.

    why after delete that registry ? replicate is back to normal.

      1. Tum

        In long term, I still have to monitor inconsistency in AD between both DC.

        everything else is back to normal now.
        this registry fix is very useful.
        thank you Paul for sharing this KB.

      2. Tum

        Almost 3 year now, and both of my DC + exchange never have any active directory problem at all.
        just want to report long term result back to Paul.

        Thank you Paul.

        1. James

          Thanks so much for providing the long term result. I am faced with a similar situation, my replication and everything else appears to be working fine, but I have different USN numbers reported in repadmin /showutdvec.

          Your reply gives me hope that I can have similar success in the long term.

  15. Paul Cunningham

    Tum, you’ve encountered one of the reasons why it is inadvisable to put Exchange on Domain Controllers. In your situation I would probably look at moving all of the mailboxes off the bad DC and removing Exchange before demoting the DC.

    I haven’t tested what would happen in a multi-DC environment by just deleting the registry key. I doubt that will fix it, when the DC’s realise they are still in a USN rollback state the registry flag would probably just be set again.

  16. Tum

    Hi, i have this problem with exchange server+dc
    I have 2 DC in office.
    first DC is windows 2003 sp2 with exchange 2003
    second DC is windows 2008 64bit with exchange 2007

    the first DC have USN rollback problem.
    my case, both server are exchange so I don’t feel comfortable to demote any of them
    1. Can i just delete that registry “Dsa Not Writable” on first DC without demote any server ?

    How this registry key work with USN rollback and replication ?
    2. Does registry key have something to do with “replicate partner have higher usn number than DC itself” ?

    3. since AD database on replicate partner is up-to-date, can DC re-populate its local copy by copy everything from replicate partner ?

  17. Joe

    Too bad this is happening to me – system recovery et al. What I don’t understand is why AD can’t come up and say “I think I’m out of sync, and Server xxx says it’s more current than I am. I can agree and have it replicate over anything on my end (and log it to a txt file), I can fight back and say I’m more current (thus pop this box on the other side), or I can do nothing.”

    How freaking hard is that!!! In a multi-site enterprise, the idea of having a “GC” that is really a data repository for all things in the extended domain would be great – if all else fails – ask the server at the top!

  18. Steven

    How people can have this happen and windows is still considered ready for production use I will never understand.

    Thanks for the info, hopefully I can fix this then we can migrate to something less brittle.

  19. Hary

    You saved my Life!!! I was searching for many day to solve this Problem. Thanks for that wonderfull Hint!!!!! Sorry for my english 🙂 Thanks so much from Vienna/Austria! My situation was as following: 2 virtual DC´s (one w2k3 R2 one w2k8), i demotet the 2k3 and used the “Patch” on the w2k8, after reboot everething was working angain.

  20. idgara

    I’m in the same scenario as Erik as we had a hardware failure and the DR restore resulted in Event ID 2095.

    We’re not keen on demoting the Exchange server (restored) either so will try demoting the DC2 and clearing the registry key…I think I’ll try it in a virtual environment first.

    Erik – did it work for you?

  21. Erik

    Just to clarify the steps everyone is taking here…

    DC1 with Exchange = USN 100
    DC2 without Exchange = USN 10

    You are demoting DC2 and cleaning up its metadata, since you cannot demote DC1 because of Exchange, and then editing the registry on DC1 with the information supplied by Rob… right?

    This then fixes the netlogon pausing problem when starting up DC1

    Thanks to everyone for contributing!

  22. John Rutkowski

    We restored an SBS 2003 server for a client, and the System State restore caused the USN symptoms with NETLOGON paused. Tried removing HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters “Dsa Not Writable”=dword:00000004

    And it worked! Saved me many hours of frustration! I was getting ready to camp on the MS line. Never gotten anything solved in less than 4 hours with them.

    What we all really need is a supported way from Microsoft to restore a Windows server to radically different hardware. Disasters happen and you can’t always restore to a like system.

    Moving Novell Netware to new hardware is a piece of cake. And NDS is more robust.

    SBS complicates things because EVERYTHING is on one server: DC, PDC, DNS, etc.

  23. Ken

    “Anyway, to stop the server from “thinking” it’s messed up, you need to remove the following from the registry:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters “Dsa Not Writable”=dword:00000004”

    My situation was similar: my sole DC thought it was suffering a USN rollback after an episode of hardware failures trying to promote another server.

    I’d noticed this page back in January and then again today and gave it a try.

    This totally fixed the problem – many thanks for sharing!

  24. Jarrod

    Been struggling with this for a while, thanks for the info 🙂

    Deleting the “Dsa Not Writable” registry value is as effective as it is simple, many thanks…

  25. Rob P

    I know how alone a person can feel when faced with a situation not yet documented. I can’t count the number of times I have been helped by a forum like this one.

    It’s good to have helped someone else for a change…

  26. Paul

    Thanks again for sharing the solution Rob.

  27. Rob P

    Glad to hear some of you have been successful with this – it was a costly and time-consuming issue for us, but that server has been running fine now for almost a month.

    Hopefully, others can benefit from the sweat we have shed on this one.

  28. Denis

    Works like a charm. I was already getting desperate grinding through AD with ldp.exe trying to figure out why Windows still complains when there is only one DC left in the domain… Glad I finally made it here. At the moment there is no mention of this registry value in any other USN rollback related articles, and I’ve been looking for at least half of the day.

  29. Paul

    Well it looks like that fix works just fine. I’ve written a post about it and it would be great to hear from anyone else who has success with this solution.

    Please, take a full backup before trying it 🙂

  30. Paul

    Thanks for sharing that Rob, I will try that out on my lab server.

  31. Rob P

    I spent about 30 hours on the phone with Microsoft. The problem has been resolved. However, I can’t provide step-by-step information on how to resolve your problems, but I can provide some insight. Be aware that some of this stuff isn’t supported by Microsoft, and can get you in much deeper trouble. I am putting it here with the understanding that you know enough not to get yourself in trouble – if I were reading this, I would definitely be very reluctant to try any of it on my own…

    EXCEPT: Robert, if you have a System State backup, made with BackupExec just before the shutdown, and the Active Directory in that backup would be acceptable (i.e. no worries about losing the changes made to AD since the shutdown), then I would definitely perform an authoritative restore of the AD on the main system. Look up the procedure (booting into Directory Services Restore Mode, how to reset the DS Restore Mode password if required, then entering the command to mark the restore as authoritative afterwards…) and you should be good to go. However, it’s possible that doing the restore will remove the problems, but it may not remove the “signal” to the server that all is not well – read on…

    HOWEVER: if, like me, you don’t have a good system state backup of the main domain controller, and you need to figure out why the server is starting with the netlogon service in a Paused state, here’s the scoop on my situation…

    It took them a while to realize that the server was in a USN Rollback situation. Look for Event ID 2103 errors (one after each server restart) in the DS Event log (I think it’s the DS one anyway). It basically warns you that someone screwed up and restored the AD using a non-supported method (like from an image backup, for example). Don’t walk away from this saying “I never restored the AD, in ANY way!”. I hadn’t either. The server THOUGHT it had been and you can’t debate with a server.

    What you need to do is make sure the server’s AD communications and replication are working as they should, using tools like dssite.msc, repadmin, dcdiag, etc. If you are confident that AD secure communication is working error-free (using tools above – you’ll have to research their use – I am no AD expert), then you can do the following.

    HERE’S THE DANGEROUS PART – IF YOU DO THIS WHEN THE SERVER ISN’T READY (IE PROBLEMS RECTIFIED) YOU **WILL** MESS THINGS UP WORSE THAN THEY ARE! (By the way, if you are manually starting your netlogon service, you had better also be manually starting your Windows Time service – it’s needed for the following.) A personal recommendation at this time would be to do a System State backup of the server before proceeding. Then, use dcpromo to remove the AD from any other DCs on your network. If you can do this error-free, and you want to be even more sure about the health of your server, you can re-promote one to be sure it will do all that properly too, then demote it again. REMEMBER, all this is to prove that the main server is working properly, so we can remove the error condition that is stopping it from knowing it’s OK and from starting netlogon (and Windows Time) upon reboot. If you CAN’T properly demote the other DCs, you will need to force the removal of AD (dcpromo /forceremoval), and then use ntdsutil to remove the entries for the removed server from the AD, otherwise it will never be able to re-install the AD. If you can’t properly promote a server to be a 2nd DC, you still have issues to straighten out before continuing.

    Anyway, to stop the server from “thinking” it’s messed up, you need to remove the following from the registry:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters “Dsa Not Writable”=dword:00000004

    Remove the entire value “Dsa Not Writable”, not just the “4”

    I hope this has provided some useful information – I can’t stress enough how dangerous all this is, but desperate times call for desperate measures. Just be careful, because I’m not qualified to advise you on this topic, and I may have left out some important stuff.

    The following link is to another thread I posted to on the topic. Sorry if you need to subscribe to this site to read it (not sure if you do) – I subscribe and it’s one of the best investments I’ve ever made.

    http://www.smallbizserver.net/Forums/tabid/53/forumid/11/tpage/1/view/topic/postid/83405/Default.aspx#83441
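For anyone who wants to script the registry step Rob describes, here is a minimal sketch in Python. It only builds and prints the `reg.exe` commands by default; the function names are made up for illustration, and the key path assumes the standard backslash-separated form of the path quoted in the comment. Treat it as a starting point under those assumptions, not a supported procedure, and take the System State backup first.

```python
import subprocess

# Key and value name as quoted in the comment above (HKLM = HKEY_LOCAL_MACHINE).
NTDS_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Dsa Not Writable"


def build_query_command(key=NTDS_PARAMETERS_KEY, value=VALUE_NAME):
    """Return the reg.exe arguments that display the value, if it exists."""
    return ["reg", "query", key, "/v", value]


def build_delete_command(key=NTDS_PARAMETERS_KEY, value=VALUE_NAME):
    """Return the reg.exe arguments that delete the whole value, not just its data."""
    return ["reg", "delete", key, "/v", value, "/f"]


def remove_dsa_not_writable(dry_run=True):
    """Print each command; execute it only when dry_run is False.

    With check=True, a failing query (value not present) stops the run
    before the delete is attempted.
    """
    commands = [build_query_command(), build_delete_command()]
    for command in commands:
        print(subprocess.list2cmdline(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical usage: review the printed commands before running anything.
    remove_dsa_not_writable(dry_run=True)
```

Leaving `dry_run=True` as the default is deliberate: the commands can be reviewed (or pasted into an elevated prompt by hand) before anything touches the registry.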

  32. Paul

    “First we need to demote the 3rd DC as we added the machine after the last system state backup, right?”

    That would probably make things easier yes.

  33. Robert

    Paul,
    thanks for the info. We’ll give it a try. First we need to demote the 3rd DC as we added the machine after the last system state backup, right?

    We are still wondering how the USN problem could happen, as we only shut down the SBS server for 20 hrs. and started it again. Weird.

  34. Paul

    Hi Robert, if you were to perform an authoritative restore of your known good SBS backup that would likely resolve the USN rollback condition, however you would lose any changes made to the Active Directory since that backup was made (eg new users created, passwords changed, etc etc).

  35. Robert

    One note:
    We do have a system state backup of the SBS Server from before we shut down the server. Can we resolve the issue if we restore the system state from before the mess?

    We use BackupExec 10d.

  36. Robert

    Well,

    we’re in exactly the same situation.
    We currently have an SBS server and 2 other DCs.
    We had a broken RAID-1 volume and the server was down for 20 hours. This seemed to be enough to cause the USN problem.
    Note: There was no old image restored to the drive; we simply added another disk to resync the RAID and restarted exactly the same system.
    As it is on an SBS server, we do not know what to do.
    We restarted replication by clearing the flags with repadmin /options (-DISABLE_INBOUND_REPL -DISABLE_OUTBOUND_REPL)
    and we restart netlogon every time the server reboots (as a workaround for now).

    So the good question is:
    – How do I move the Exchange, FSMO etc. from the SBS away to demote the server? (There isn’t a clear document at the MS knowledgebase regarding USN Rollback on SBS)

  37. Paul

    Well Rob if you end up getting a good solution from them be sure to share it around. At the moment I have stopped trying to work out a solution on my own.

  38. Rob P

    Funny, I am in an identical situation.

    Microsoft has spent about 30 hours on it so far. Tomorrow morning, a Directory Services specialist is going to take another crack at resolving the USN status, but they are probably going to demote the SBS server and dcpromo it again.

    Much of the time spent on this up to now was used up after a new MS tech got on board, doing the same things and looking in the same places as the 5 before him. Eventually they get frustrated, escalate it, and we start again.

    Don’t get me wrong – they all seem knowledgeable enough. It’s just a long process getting to where we need to be. On the last call, I wasted 8 hours before they actually started to implement a plan suggested by the tech the previous night.

    Hopefully, tomorrow, they will start where we left off at the end of last call.

    On the other hand, they did manage to get a second DC to promote in this domain, and also fixed a serious VSS issue that was preventing the System State from being backed up.

    Boy, this stuff works great, when it works. Otherwise, even the experts seem lost. If Microsoft can’t fix this, who the heck can??

  39. Paul

    Hi Matt, what you’ve got there is a good example of why not to put Exchange on a DC 😉

    At any rate, I’ve not yet determined a method for convincing a server that it is no longer in a USN rollback state, even if that server is the only DC for the domain.

    Running an authoritative restore doesn’t fix it unless the backup you are using is from before the USN rollback condition occurred.

    So you’re right, having to swing your Exchange users across to a temp server while you sort out the USN rollback situation is a pain, but in the short term is likely your only option. In an Exchange 2003 organisation it is not that complex just a little time consuming.

  40. Matt

    I should have added that running REPADMIN /SHOWUTDVEC as per the Capslock article does indeed reveal that the USN for the Exchange server is higher on the F&P server than on the Exchange server itself.

    Not sure why this is since, as I previously mentioned, I have DCPROMO’d the F&P server down and promoted it again.

    Is there any way to force a sync of the USNs on both DCs for the Exchange server?

    I don’t really fancy MS’s suggestion to introduce another Exchange server, migrate the users and then rebuild the existing Exchange server. I don’t really class this as a working solution, more like a nightmare!!!
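The comparison Matt describes (a partner holding a higher USN for a DC than that DC reports for itself) can be sketched in Python as below. This is only an illustration of the check, not a parser for real `repadmin /showutdvec` output: the `DcView` class, the server names EXCH and FP, and all USN numbers are invented for the example, and you would have to fill in the values by hand from your own servers.

```python
from dataclasses import dataclass, field


@dataclass
class DcView:
    """One DC's view: its own highest USN and the highest USN it has seen from each partner."""
    name: str
    own_highest_usn: int
    partner_usns: dict = field(default_factory=dict)  # partner name -> USN seen


def find_rollback_suspects(views):
    """Return (dc, observer, own_usn, seen_usn) tuples where a partner has
    already seen a higher USN from a DC than that DC currently reports."""
    by_name = {view.name: view for view in views}
    suspects = []
    for observer in views:
        for subject_name, seen_usn in observer.partner_usns.items():
            subject = by_name.get(subject_name)
            if subject is not None and seen_usn > subject.own_highest_usn:
                suspects.append(
                    (subject.name, observer.name, subject.own_highest_usn, seen_usn)
                )
    return suspects


# Made-up numbers: FP has seen USN 5100 from EXCH, but EXCH itself is only at 4200.
views = [
    DcView("EXCH", own_highest_usn=4200, partner_usns={"FP": 880}),
    DcView("FP", own_highest_usn=900, partner_usns={"EXCH": 5100}),
]

for dc, observer, own_usn, seen_usn in find_rollback_suspects(views):
    print(f"{dc} reports USN {own_usn}, but {observer} has already seen {seen_usn}")
```

A DC that appears in the result is the one whose database looks older than what its partners have already received from it, which is the pattern the article associates with a USN rollback.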

  41. Matt

    I have two DC’s one server is F&P with DC and a second server with Exchange and DC.

    Each time the Exchange box is started the Netlogon service is Paused on startup with the error 2103 in the DS Event Log: The Active Directory database has been restored using an unsupported restoration procedure.

    Active Directory will be unable to log on users while this condition persists. As a result, the Net Logon service has paused.

    If I manually start the Netlogon service on the Exchange server, everything works fine, including AD replication with the F&P server. Repadmin shows no errors.

    I have tried an authoritative restore on the Exchange server but still get exactly the same problem.

    Have even DCPROMO’d the F&P server so that the Exchange box is the sole DC in the domain but still get the same error.

    Any help would be greatly appreciated.

  42. Paul

    Update: I joined a Windows Server 2003 R2 server to an SBS domain, made it a DC, used VM snapshots to cause it to fall into USN Rollback condition, and now I’ve got it isolated in a VM network on its own to test this situation. So far I’ve seized all the FSMO roles to it and cleaned the SBS server out of the directory but the Win2K3 DC still wants to disable replication and pause Netlogon.

    I realise in your cases the SBS server is the one that keeps playing up but I think this would show it isn’t exclusive to SBS; any Windows Server 2003 DC might remain in USN Rollback condition even when it is the only DC left.

    So…. testing continues! 🙂

  43. Joe

    I’m in exactly the same boat as Cobra. If you manage to figure out how to convince the SBS server that it’s no longer in a rollback state after getting rid of the second DC and metadata, well, that would be most excellent 🙂 I’m sure there are others lurking here too with the same problem.

  44. Paul

    It’s a tough one and I don’t have an answer for you right now. You’ve got me interested though and I’d like to try it in my lab, which will probably take a week or so. If I can replicate the issue and find a solution I’ll make a blog post about it, so you could subscribe to my RSS feed and watch for it over the next week or two if you like.

  45. Cobra

    Yes, I went through and did a metadata cleanup and there was no trace of the second DC. That is what makes me think there has to be some trigger within AD, maybe in the database or a registry entry, that doesn’t get reset. Once tripped, the system will not let go of the idea that it wasn’t, quote unquote, recovered correctly, even though there is nothing left to cause it difficulty…

  46. Paul

    That’s a tough one. A lone DC should be able to be convinced it no longer has to worry about USN rollback conditions.

    When you remove your second DC are you also ensuring all of the metadata has been removed from the directory?

  47. Cobra

    Hi Paul, thanks for your response. Yes, you have it right: the event still shows up and Netlogon is paused even when the SBS server is the only DC. When I have added back the second one, the USNs match and replication will work fine as long as I go in and turn off DISABLE_OUTBOUND_REPL and DISABLE_INBOUND_REPL and resume Netlogon. Everything appears to work fine and replication works. However, if I restart it the event shows up again and replication is disabled whether or not there is another DC. Strange. I have also tried an authoritative restore on the SBS server, which should just update the records, but this doesn’t change anything.
    Unfortunately I do not have a good System State backup with AD of this server; I inherited this problem.
    Thanks for any help.

  48. Paul

    That sounds like a real mess. Just so I’m clear, when you demote your other DC and the SBS server is the only DC the SBS server still shows evidence of a USN rollback condition?

    Do you have the option of doing an authoritative restore of the SBS server from prior to the USN rollback condition appearing?

  49. Cobra

    I have this exact problem with a DC on an Exchange system. I am continually amazed how things are listed as “Not Recommended” when at one time they were recommended. It is also sold this way, as in my case. I have a server running “Small Business Server”, so it has Exchange and has the USN Rollback problem. I have another W2K3 server running as a Domain Controller as well. I have no option to move Exchange, because of the SBS server, and I have to keep it a DC, again because of SBS; if not, SBS will continually shut down because it is not the PDC emulator and a DC. (Which is not cool, but that is the way it was built.) I demoted and repromoted the other server thinking that at that point they would match, but every time the server is restarted, even when it was the only DC, it comes up with the error in the Event Log and pauses the Netlogon service. I can unpause it and re-enable replication and everything appears to be working till the next restart.

    Since it was the sole DC for a short while, and since I repromoted the other server, I know that it does replicate, because the repromoted DC populated completely from the malfunctioning one. I am thinking some check is occurring that causes this error to still pop up even though there is no problem any more.

    Are there any known triggers besides the USN values for this? Any way to reset this without a flat demotion and promotion? I believe that is a workaround, not a fix, and I am not too happy that it is the recommended solution for this problem. Especially since it is not recommended to demote an Exchange server. LOL.

    Thanks for any help.

  50. Capslock Assassin

    I would personally only do it as a last resort. Option #1 would be a better result overall, but if that is impossible then Option #2 is about all I can suggest.

    There are definitely some serious implications – any non-replicated Active Directory data will be lost when the DC is demoted for one thing. In addition, if there were more than two DCs in the environment then Option #2 would simply not be practical anyway.

    But I would certainly expect that once the other DC was demoted, the Exchange/DC server would no longer detect the USN Rollback condition and would permit itself to function normally again.

  51. Gumshoe

    If there is a USN rollback condition, do you think it’s wise to suggest demoting the other DC (the non-Exchange one) instead? Isn’t that the DC with the higher USN number? What you’re suggesting is to go in the opposite direction; instead of demoting the DC with the out-of-sync USN, demote the DC that considers itself correct.

    I think there may be some serious implications of removing/demoting the only “working” domain controller, even if net login resumes from pause on the machine with the problem.

    Does the primary DC automatically recover if the second DC – the one currently considered clean – is removed?

  52. Capslock Assassin

    If NetLogon resumes okay and the “repadmin /options” command shows that neither inbound nor outbound replication is disabled, and everything works fine from there, then I am not 100% sure you have a USN Rollback.

    Have you analysed the “repadmin /showutdvec” output for each DC as I mention in the article?

    Can you provide any more information on how this problem came about in the first place? Did you do an improper restore of one of your domain controllers?

    Regardless, if you wish to try to fix it then I see two options:

    1) Migrate Exchange to another server temporarily so that you can uninstall it from the DC while it is demoted and promoted again. You might be able to use another server or even VMWare for this.

    2) Demote the other DC (the non-Exchange one) instead.
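The `repadmin /options` check mentioned in this comment could be scripted roughly as follows. This Python sketch assumes the command prints a line of the form `Current DSA Options: IS_GC DISABLE_INBOUND_REPL ...`; that line format is an assumption to verify against the output on your own servers, and the function name is hypothetical.

```python
# Flags that indicate the DC has stopped replicating in one or both directions.
REPL_DISABLED_FLAGS = {"DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL"}


def disabled_replication_flags(repadmin_options_output):
    """Return the replication-disabling flags found in `repadmin /options` output.

    Assumes (unverified) that the options appear on a line starting with
    'Current DSA Options:' followed by space-separated flag names.
    """
    found = set()
    for line in repadmin_options_output.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("current dsa options:"):
            tokens = stripped.split(":", 1)[1].split()
            found.update(token for token in tokens if token in REPL_DISABLED_FLAGS)
    return found


# Sample text in the assumed format, not captured from a real server.
sample = "Current DSA Options: IS_GC DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL"
flags = disabled_replication_flags(sample)
if flags:
    print("Replication is disabled:", ", ".join(sorted(flags)))
else:
    print("Neither inbound nor outbound replication is disabled")
```

An empty result only tells you the flags are not set right now; as several commenters note, they can reappear after the next restart if the DC still believes it is in a rollback state.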

  53. Richard Foor

    Thanks for the reply. That is what we assumed. We do have two domain controllers: one GC that does not have Exchange, and a second one that does hold Exchange. The one with the paused Netlogon is the one with Exchange installed. Just to clarify, every time we restart that machine the Netlogon service comes up paused with the error. However, by simply resuming the service it starts and replication starts to work normally. Any ideas what can be done about this without demoting?

  54. Capslock Assassin

    If the NetLogon service starts, you are able to restart replication, and then replication is occurring without the problem coming back, then you may not actually have a USN Rollback condition. If your Exchange server is your *only* Domain Controller then you certainly do not have a USN Rollback condition, as it requires at least two DCs for it to occur.

    With regards to your question about Exchange, while Exchange on Domain Controllers is a supported (but not recommended) configuration, changing the state of the server after Exchange has been installed is not a supported operation.

    Refer to the information posted here:
    http://blogs.brnets.com/michael/archive/2005/01/24/319.aspx

    If you do in fact have a USN Rollback condition and you wish to try to resolve it by demoting and re-promoting the Domain Controller, then you will need to migrate your Exchange to another server first and then remove Exchange from the DC.

    Hope that helps.

  55. Richard Foor

    We have encountered exactly this problem; however, we were able to work around it by manually starting the Net Logon service. The server then replicates. The reason we have been apprehensive about following these instructions is that this Active Directory server is also an Exchange server. How does that affect these instructions?
