Datacenter Disaster & DACP Issue

Scenario:
A DAG has two mailbox servers in a single site, hosting 20 databases in total, with 10 databases mounted on each server. The FSW is a VM in a different site. To maintain consistency, DAC mode was enabled on all DAGs in the Exchange organization, even though a few of them are not stretched across sites. Overnight, a water leak in the AC duct caused a power outage that took the complete datacenter down. Once the power issue was fixed, the investigation showed that one of the Exchange servers could not be brought online because its HDD had crashed and was beyond repair.

Although the second server was brought online, its databases could not be mounted because of DAC mode. Because the server had been restarted, its DACP bit was reset to 0. To set its DACP bit to 1 and mount the databases, the server has to meet one of the conditions below, and it could meet neither:
1) It cannot connect to all mailbox servers in the started mailbox servers list of the DAG
2) It cannot talk to a mailbox server in the DAG that has its DACP bit set to 1
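
The two conditions can be read as a simple either/or check. The sketch below only models that rule in plain PowerShell to make the logic concrete. `Test-CanSetDacpBit` and its parameters are hypothetical names made up for illustration, not an Exchange cmdlet, and the real check is performed internally by Active Manager.

```powershell
# Hypothetical model of the DACP rule described above; not an Exchange cmdlet.
function Test-CanSetDacpBit {
    param(
        [string[]]$StartedServers,     # servers on the DAG's started mailbox servers list
        [string[]]$ReachableServers,   # servers this node can contact right now (including itself)
        [hashtable]$DacpBitByServer    # server name -> 0 or 1, for the reachable servers
    )

    # Condition 1: every server on the started list is reachable.
    $allReachable = $true
    foreach ($server in $StartedServers) {
        if ($ReachableServers -notcontains $server) { $allReachable = $false }
    }
    if ($allReachable) { return $true }

    # Condition 2: at least one reachable DAG member already has its DACP bit set to 1.
    foreach ($server in $ReachableServers) {
        if ($DacpBitByServer[$server] -eq 1) { return $true }
    }

    return $false
}

# The situation in this post: MBX2 is dead, MBX1 has just restarted with its bit at 0.
Test-CanSetDacpBit -StartedServers 'MBX1','MBX2' -ReachableServers 'MBX1' -DacpBitByServer @{ MBX1 = 0 }
```

With the failed server still on the started list and no other member holding a DACP bit of 1, neither condition can ever become true on its own, which is why manual intervention was needed.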

To fix the issue, we thought about a couple of solutions:
* Disable DAC mode on the DAG, which would avoid this DACP bit problem
* Mark the server that is currently offline as stopped in the DAG. However, the DAG still had the required number of votes (1 mailbox server + FSW), so there was no problem in maintaining quorum. Also, we were not sure how long it was going to take to bring the failed server back online.

Solution:
The easy solution, without making either of these changes, is to manually set the DACP bit to 1 on the surviving mailbox server with the command below.
Start-DatabaseAvailabilityGroup DAGNAME -MailboxServer SurvivingMailboxServerName

As soon as the command above was issued, the DACP bit changed to 1 and all 10 databases that had been mounted on the server before the outage mounted successfully. That got us halfway out of trouble. But for the other 10 databases, which were passive before the outage, the copy queue length was showing 9223372036854775766. This is a built-in safety mechanism in Exchange and is by design. It is well explained in the TechNet blog post below.

http://blogs.technet.com/b/timmcmic/archive/2012/05/30/exchange-2010-the-mystery-of-the-9223372036854775766-copy-queue.aspx
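
Before forcing anything, it can help to look at the state Exchange itself reports. The following is a minimal sketch, assuming it is run from the Exchange Management Shell and reusing the DAGNAME and SurvivingMailboxServerName placeholders from this post. It only reads status and changes nothing.

```powershell
# DAGNAME and SurvivingMailboxServerName are placeholders, as elsewhere in this post.
# -Status requests live values such as the started and stopped server lists.
Get-DatabaseAvailabilityGroup -Identity DAGNAME -Status |
    Format-List Name, DatacenterActivationMode, StartedMailboxServers, StoppedMailboxServers

# Per-copy state on the surviving server, including the copy queue length.
Get-MailboxDatabaseCopyStatus -Server SurvivingMailboxServerName |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize
```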

As both servers went down at the same time when the power outage happened, we were sure that there could not have been any log shipping between the servers, and so logically there would not be any data loss. So we issued the command below, which mounted the remaining 10 databases on the surviving mailbox server, and we were out of trouble 100%.

Move-ActiveMailboxDatabase DBName -ActivateOnServer SurvivingMBXServerName -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipLagChecks -MountDialOverride:BESTEFFORT
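
The command above takes one database at a time. For several databases, a loop along the following lines could save typing. This is only a sketch under assumptions: it treats every copy on the surviving server that is not reporting Mounted as one to activate there, which may not hold in every environment, and it includes -WhatIf so that nothing changes until that switch is removed.

```powershell
# Sketch only: SurvivingMBXServerName is the placeholder used in the command above.
$server = 'SurvivingMBXServerName'

# Assumption: every copy on this server that is not Mounted should be activated here.
Get-MailboxDatabaseCopyStatus -Server $server |
    Where-Object { $_.Status -ne 'Mounted' } |
    ForEach-Object {
        # -WhatIf makes this a dry run; remove it only after reviewing the output.
        Move-ActiveMailboxDatabase $_.DatabaseName -ActivateOnServer $server `
            -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks `
            -SkipLagChecks -MountDialOverride:BestEffort -WhatIf
    }
```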
