I'm reaching out to the community for suggestions on an error I am encountering with a Management Server.
On Friday afternoon I encountered event ID 1230 on one of my 4 management servers:
'New configuration cannot be loaded, the error is Catastrophic failure(0x8000FFFF). Management group "MGNAME"'
This caused the server to go grey. As this was late Friday, I stopped the management server and allowed the others to run over the weekend.
This morning I discovered another of my management servers in the same state.
I had recently added a cluster disk management pack and enabled some SNMP probe monitoring. As I presumed this was some type of overload, I removed the additional management packs and tried to clear the cache of the affected management servers.
This cleared one of the servers, but the problem remained on the original management server.
I then tried deleting the management server and re-installing it. When I removed the management server, the second management server then went grey.
Once I re-installed the first management server and reconnected it, the same problem returned.
As I had uninstalled the management server, it currently has no agents assigned and is not participating in any of the resource pools. I tried clearing the cache again, but this causes the same error to return.
The error seems to come on its own, before the log is filled with event 5500 state change messages and 31410 group calculation messages,
followed by a number of 4503 events ("A module reported an error 0x80FF0036 from a callback which was running as part of rule 'random rule'"), each followed by its accompanying 1103,
and lastly a heartbeat failure for a deleted resource pool.
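To check that ordering rather than eyeball it, a minimal sketch like the one below could tally the events from an exported log. It assumes the Operations Manager event log has been exported to CSV with `TimeCreated` and `Id` columns, sorted oldest first. Those column names, the function names and the descriptions in `WATCHED` are illustrative assumptions, not anything confirmed from my environment.

```python
import csv
import io

# Event IDs mentioned in this post; the descriptions are informal shorthand.
WATCHED = {
    "1230": "new configuration cannot be loaded",
    "5500": "state change",
    "31410": "group calculation",
    "4503": "module reported an error",
    "1103": "accompanies 4503",
}


def summarize(rows):
    """Return {event_id: (first_seen, count)} in order of first appearance.

    `rows` is an iterable of dicts with (assumed) 'TimeCreated' and 'Id' keys,
    already sorted oldest first.
    """
    summary = {}
    for row in rows:
        event_id = row["Id"].strip()
        if event_id not in WATCHED:
            continue  # ignore events not discussed in the post
        first_seen, count = summary.get(event_id, (row["TimeCreated"], 0))
        summary[event_id] = (first_seen, count + 1)
    return summary


def summarize_csv(text):
    """Parse CSV text with a header row and summarize the watched events."""
    return summarize(csv.DictReader(io.StringIO(text)))
```

Calling `summarize_csv` on the exported text and printing the result shows which watched event appears first and how often each one occurs, which would confirm whether 1230 really precedes the 5500 and 31410 flood.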
I can't see how the management server could be overworked now, but nothing is giving me an idea of what may be causing the issue. If I leave this management server in a grey state, all the others run OK and I don't seem to be having issues collecting perf data
or alerting on other elements.
With another management server turning grey when this one goes offline, I assume whatever is causing the issue moves between servers with some process.
Tomorrow I intend to try deleting the management server again, but this time purge the database before the re-install.
However, any suggestions would be greatly appreciated.
Richard Scott