Hi There,
Here's some background on a problem I'm trying to mitigate with SCOM 2010.
I've got a problem with a web server: occasionally the web app causes the CPU to max out and stay there. We can't predict when it will happen, and it often happens out of hours when everyone is asleep, so we don't know the CPU is maxed out until the morning. At this stage the only way to bring CPU utilisation back down is to run iisreset on the affected server.
We're still trying to figure out what is causing the problem, but until then it would be really good to have SCOM detect the high CPU on the server and restart IIS for me. That way site performance would recover more quickly, because we wouldn't need someone to notice the alert and run iisreset.
I've been tooling around in SCOM trying to figure out how to have it restart IIS on the affected server when the CPU spikes, but I'm just not sure how to do it.
I want this IIS restart recovery action to affect only the server that has been playing up, rather than all servers that run IIS.
Ideally SCOM would detect that the IIS worker process (w3wp) is what's maxing out the CPU and restart IIS on the affected server. But I'd also accept SCOM simply detecting high CPU (> 95%) on the server and restarting IIS, because nothing else on the server has been seen to max out the CPU the way IIS does.
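To make the behaviour I'm after concrete, here is a rough Python sketch of the logic. It is only an illustration of the decision, not SCOM configuration: the function names, the idea of keeping a list of recent CPU readings, and the "three consecutive readings" rule are all my own assumptions. The 95% threshold and the iisreset command are the only parts taken from the description above.

```python
import subprocess
from typing import Callable, Sequence

CPU_THRESHOLD = 95.0  # percent; the "> 95%" figure from the description
MIN_SAMPLES = 3       # assumption: consecutive high readings that count as "stuck"


def cpu_is_stuck_high(samples: Sequence[float],
                      threshold: float = CPU_THRESHOLD,
                      min_samples: int = MIN_SAMPLES) -> bool:
    """Return True if the most recent `min_samples` readings all exceed `threshold`."""
    if min_samples < 1:
        raise ValueError("min_samples must be at least 1")
    if len(samples) < min_samples:
        return False  # not enough history yet to call it sustained
    return all(s > threshold for s in samples[-min_samples:])


def run_iisreset() -> int:
    """Run iisreset on the local server and return its exit code."""
    return subprocess.run(["iisreset"], check=False).returncode


def recover_if_needed(samples: Sequence[float],
                      reset: Callable[[], int] = run_iisreset) -> bool:
    """Restart IIS only when the CPU has stayed above the threshold."""
    if cpu_is_stuck_high(samples):
        reset()
        return True
    return False
```

The point of requiring several consecutive readings is to avoid restarting IIS on a short, harmless spike; the problem I'm seeing is the CPU maxing out and staying there. In SCOM terms I'd expect the readings to come from a CPU monitor and the restart to be a recovery on that monitor, enabled only for the one affected server.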
Suggestions very much appreciated.