I have just returned to checking my SCOM monitoring after a few days away on vacation (i.e. I haven't made any changes to the management servers or SQL servers over that time).
Everything had been working fine until now, but I noticed that both management servers are in a grey state and there is a lack of alerts in the console.
I tried restarting the health service on both servers but after a little while I got the grey state again.
I tried rebooting both the SQL server and the management servers. A few alerts started appearing, but after a short while (maybe 20 minutes) I got the grey state again. At one point I am sure one of the management servers went green again, but then it went grey once more. Some alerts are coming through, but they seem delayed to me.
Without quite knowing what's going on, I first wondered whether I had some sort of config churn, but that doesn't seem to be the case.
I am seeing a number of 2115 events (not loads, just one every few minutes) showing response times of 3660 or 4140 seconds, and similar.
The only workflows involved seem to be:
Workflow Id : Microsoft.SystemCenter.CollectEventData
Workflow Id : Microsoft.SystemCenter.CollectSignatureData
Workflow Id : Microsoft.SystemCenter.CollectPublishedEntityState
Workflow Id : Microsoft.SystemCenter.CollectAlerts
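To make it easier to see which of these workflows is worst affected, the 2115 events could be tallied per workflow. Below is a rough Python sketch of that idea, not a definitive tool. It assumes the event descriptions have already been exported as plain text strings and that each one contains a "not received a response in N seconds" phrase and a "Workflow Id : ..." field; the function name `summarize_2115` and the sample strings are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of a 2115 event description; adjust the patterns if yours differs.
SECONDS_RE = re.compile(r"not received a response in (\d+) seconds")
WORKFLOW_RE = re.compile(r"Workflow Id\s*:\s*(\S+)")


def summarize_2115(descriptions):
    """Return {workflow_id: (event_count, worst_seconds)} for the given event texts."""
    summary = defaultdict(lambda: [0, 0])
    for text in descriptions:
        workflow = WORKFLOW_RE.search(text)
        seconds = SECONDS_RE.search(text)
        if not workflow or not seconds:
            continue  # skip anything that does not match the assumed layout
        entry = summary[workflow.group(1)]
        entry[0] += 1
        entry[1] = max(entry[1], int(seconds.group(1)))
    return {wf: tuple(values) for wf, values in summary.items()}


if __name__ == "__main__":
    # Illustrative sample text only, not copied from a real event log.
    sample = [
        "has not received a response in 3660 seconds. "
        "Workflow Id : Microsoft.SystemCenter.CollectEventData",
        "has not received a response in 4140 seconds. "
        "Workflow Id : Microsoft.SystemCenter.CollectEventData",
        "has not received a response in 3660 seconds. "
        "Workflow Id : Microsoft.SystemCenter.CollectAlerts",
    ]
    for wf, (count, worst) in sorted(summarize_2115(sample).items()):
        print(f"{wf}: {count} event(s), worst delay {worst} s")
```

A workflow whose worst delay keeps climbing between runs would be the one to look at first.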
There are also a few 21402 events saying a process had to be terminated because it ran past its timeout.
So I seem to have some sort of performance issue, but I don't have a clue where to start looking.
The network seems OK and the databases don't seem to be full and, like I say, I haven't actually changed anything.