Hi,
A customer of mine has a problem with cluster monitoring in SCOM.
They are running SQL Server on a Windows Server cluster that hosts about 20 database instances and 5 DTC instances. The cluster contains about 155 cluster disks.
Configuration:
Server CPU: Intel Xeon E5-2667 v3, sockets: 2, cores: 16, logical processors: 32
Server memory: 768 GB
Server OS: Windows Server 2012 R2 Standard
SQL Server: Microsoft SQL Server Enterprise (64-bit), version 11.0.6020.0
Operations Manager 2012 R2 7.1.10226.1239 with Update Rollup 11
Management packs involved:
Windows Server 2012 Cluster Management Library; version: 6.0.7291.0
Windows Cluster Management Monitoring; version: 6.0.7291.0
Windows Server 2012 R2 Cluster Management Library; version: 6.0.7291.0
Windows Server 2012 R2 Cluster Management Monitoring; version: 6.0.7291.0
Windows Cluster Management Library; version: 6.0.7291.0
Windows Server Cluster Disks Monitoring; version: 6.0.7316.0
Windows Cluster Library; version: 7.0.8433.0
Example alert details:
Output data was found, but it was discarded because the event policy for the process started at 10:19:10 detected errors.
The 'ExitCode' policy expression:
[^0]+
matched the following output:
3
Command executed: "C:\Windows\system32\cscript.exe" /nologo "Microsoft.Windows.Server.MonitorClusterDisks.vbs" false "Cluster Disk Monitoring" "server.infra.local" "CLU02"
Working directory: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 13836\55674\
This affects one or more workflows.
Workflow Name: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.FreeSpaceMB
Instance name: Cluster Disk 92_\\?\Volume{6d883905-fff8-452f-8eea-9ecb4606c784}
Instance ID: {-B92-31F2-DEFA-5F89441DC5C5}
Management Group: SCOM
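In the alert above, the event policy is a regular expression applied to the script's exit code, and the output is thrown away when it matches. The sketch below only illustrates that check; it assumes the policy is an unanchored regex search over the exit code rendered as a decimal string, and the helper name `policy_flags_error` is hypothetical, not part of SCOM.

```python
import re

# Exit-code policy expression as shown in the alert.
# Assumption: SCOM applies it as an unanchored regex search.
EXIT_CODE_POLICY = re.compile(r"[^0]+")


def policy_flags_error(exit_code: int) -> bool:
    """Hypothetical helper: True if the decimal exit code matches the policy,
    i.e. the script output would be discarded as an error."""
    return EXIT_CODE_POLICY.search(str(exit_code)) is not None


if __name__ == "__main__":
    # 0 is the normal exit code; 3 is the value reported in the alert.
    for code in (0, 3):
        verdict = "output discarded" if policy_flags_error(code) else "output kept"
        print(f"exit code {code}: {verdict}")
```

Under this assumption, any exit code other than 0, including the 3 reported here, causes the script output to be discarded.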
We applied overrides as described in https://blogs.technet.microsoft.com/kevinholman/2013/02/21/healthservice-restarts-still-a-challenge-in-opsmgr-2012. They had some effect, but the problem has not gone away.
This MP has never worked properly in this environment.
Multiple clusters are monitored, and the problem occurs on all of them.
The problem is not related to backups, maintenance, or anything similar.
I can’t figure out what the problem could be.
Thanks!