SQL Server Cluster Instance failing over from Primary Node to Secondary with no specific error

published: 2014-11-20 06:38

We encountered an unusual problem with a clustered SQL Server instance running SQL Server 2008 Service Pack 3: the instance was frequently failing over from the primary node to the passive node. Our initial response was to fail back to the primary node outside business hours, but the incident recurred twice in a week and started affecting the business at our customer site.

Our investigation included a review of the Windows Event logs, which showed only very generic errors (event IDs 1146 and 1230). These errors did not indicate anything specific, so we discussed the issue with our peers and undertook a full system audit, but this did not identify a cause either. As the issue was affecting our customer, we escalated it to Microsoft as a high-priority incident and then coordinated with the Microsoft specialist support team to replicate the issue.
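As an illustration of the log review step, here is a minimal Python sketch that tallies how often the two event IDs appear in an exported event log. It assumes the log has been exported to CSV with an "Event ID" column; that column name, the function name and the sample rows are assumptions made for illustration and are not taken from the actual incident.

```python
import csv
import io
from collections import Counter

# Event IDs named in the post.
CLUSTER_EVENT_IDS = {"1146", "1230"}


def count_cluster_events(csv_text, id_column="Event ID"):
    """Count rows whose event ID is one of the IDs of interest.

    The column name is an assumption about the export format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        event_id = (row.get(id_column) or "").strip()
        if event_id in CLUSTER_EVENT_IDS:
            counts[event_id] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = """Level,Date and Time,Source,Event ID
Error,2014-11-01 02:10:00,Microsoft-Windows-FailoverClustering,1230
Critical,2014-11-01 02:10:05,Microsoft-Windows-FailoverClustering,1146
Information,2014-11-01 02:11:00,Service Control Manager,7036
Error,2014-11-05 23:40:12,Microsoft-Windows-FailoverClustering,1230
"""

print(dict(count_cluster_events(sample)))
```

A tally like this only shows how often the generic errors occur and when; as in our case, it does not by itself point to a root cause.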

Working with Microsoft, we found a bug at the Windows cluster level that was causing the failovers with the generic errors. Within a few days Microsoft provided patches, with advice to apply them as soon as possible. After raising an emergency change with the customer, we applied the patches on all four cluster nodes.

Since the cluster patches were applied, the environment has been stable and no unexplained cluster failovers have occurred. Our customer was, of course, very appreciative of our efforts.

Environment Details

SQL Server Version: SQL Server 2008
SQL Server Edition: Enterprise Edition
Service Pack: Service Pack 3
Operating System: Windows Server 2008
OS Edition: Enterprise Edition
OS Service Pack: Service Pack 2
Cluster Configuration: 4-node cluster (3 active + 1 passive)

Patch Details

The patches that were applied are listed at:

http://blogs.technet.com/b/yongrhee/archive/2011/06/12/list-of-failover-cluster-related-hotfixes-post-service-pack-2-for-windows-server-2008-sp2.aspx

Sunil Singh Thakur
Sunil is a senior DBA in the RockSolid SQL DBAaaS team providing support, advice and project delivery for all areas of SQL Server. Sunil has been part of the RockSolid SQL team since 2008.
