SCOM 2012 SP1
We recently had multiple issues on the Server 2008 R2 Hyper-V hosts in our cluster that brought down the cluster service and, in one instance, led to a BSOD on one of the hosts.
It appears that when the cluster MP queried the cluster service, the query deadlocked or timed out the cluster service on the host, causing the VMs to fail over. It happened twice! The event errors are shown at the end of this post. Subsequently, we removed the SCOM agent from our Hyper-V (HPV) servers and they have been working fine.
We want to reinstall the SCOM agent to enable monitoring, but we want to stop the cluster component from interrogating these servers. We want to keep the MP because it is also monitoring our SQL cluster successfully. We would need to somehow disable cluster monitoring while the agent is being reinstalled, so that the management pack information is not transferred to the HPV servers, and then re-enable it so that it continues to monitor our other cluster resources.
I have version 6.0.7063.0 of Windows Cluster Management Library and Monitoring installed.
Recommendations?
The second question is also related to SCOM interrogation of our HPV servers. We want to change how often the NICs are queried. I think the default is around every 5 minutes. Where can I find this setting and change it so that they are queried, say, every half hour?
--------------------------------------------------------------------------------------------------------------------------------------
Log Name: Operations Manager
Source: Health Service Modules
Date: 5/8/2014 4:36:34 PM
Event ID: 10409
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: Server1 of cluster
Description:
Object enumeration failed
Query: 'SELECT Name, State FROM MSCLUSTER_Resource'
HRESULT: 0x80071716
Details: The call to the cluster resource DLL timed out.
One or more workflows were affected by this.
Workflow name: many
Instance name: many
Instance ID: many
Management group: SCOM GROUP NAME
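For reference, the HRESULT in the event above can be split into its standard fields. This is a minimal Python sketch; the function name `decode_hresult` is my own and is not part of SCOM or any Windows tool. It shows that 0x80071716 is a failure in the Win32 facility wrapping error code 5910, which is consistent with the timeout text in the Details line.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code."""
    hresult &= 0xFFFFFFFF  # normalise in case a negative (signed) value is passed
    return {
        "failure": bool(hresult >> 31),       # bit 31: severity (1 = failure)
        "facility": (hresult >> 16) & 0x7FF,  # bits 16-26: facility (7 = Win32)
        "code": hresult & 0xFFFF,             # bits 0-15: underlying error code
    }


fields = decode_hresult(0x80071716)
print(fields)  # {'failure': True, 'facility': 7, 'code': 5910}
```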
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 5/8/2014 4:36:34 PM
Event ID: 1146
Task Category: Resource Control Manager
Level: Critical
Keywords:
User: SYSTEM
Computer: SERVER 1 of Cluster
Description:
The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.
At the same time, all the VMs failed. Here's the first one:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 5/8/2014 4:36:34 PM
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: SERVER 1 of Cluster
Description:
Cluster resource 'SCVMM SERVER-SQL-NAME' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor