Channel: Operations Manager - General forum

Error events 31551 and 31552 only during the night


For some time now, error events 31551 and 31552 have been showing up every night between 00:00 and 08:00: "SqlException: Timeout expired... The timeout period elapsed prior to completion..."

In addition, 16 warning events 2115 showed up in between the 31551 and 31552 errors.

At all times we also see many 29202 warnings.

If it is not closely monitored, the Health Service on our RMS stops working and goes into a "stale" state.

No backup software runs directly against the SQL databases. The databases are backed up by SQL Server to another LUN, which is in turn backed up by Tivoli TSM.

No antivirus runs on the database server.

The SCOM databases were moved 3 weeks ago from SQL Server 2005 to SQL Server 2008 R2 SP1.

Our SQL admin found that, every night for some time now, SQL Server has been generating a large amount of I/O: over 1 TB of data (not all of it on disk, some in memory). During the day, the DW database size is 74,062.56 MB.
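A quick back-of-envelope check of those numbers. This is only a sketch: it assumes "1 TB" means 1024 × 1024 MB and that the I/O is spread evenly over the 00:00 to 08:00 error window, neither of which we have confirmed.

```python
# Back-of-envelope check of the nightly I/O volume against the DW size.
# Assumption: "over 1 TB" is taken as exactly 1024 * 1024 MB (a lower bound).
# Assumption: the I/O is spread evenly over the 8-hour error window.
DW_SIZE_MB = 74062.56          # daytime DW database size reported above
NIGHTLY_IO_MB = 1024 * 1024    # assumed lower bound for the nightly I/O
WINDOW_SECONDS = 8 * 60 * 60   # 00:00 to 08:00

ratio = NIGHTLY_IO_MB / DW_SIZE_MB          # how many times the DB is "read/written"
rate_mb_s = NIGHTLY_IO_MB / WINDOW_SECONDS  # average sustained throughput

print(f"Nightly I/O is about {ratio:.1f}x the DW size")
print(f"Average throughput over the window: {rate_mb_s:.1f} MB/s")
```

This prints roughly 14.2x and 36.4 MB/s. Since the 1 TB figure is a lower bound and part of it was served from memory, treat these only as orders of magnitude.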

We moved the SQL data and log folders to different LUNs, with the DW on its own LUN. Moving the Data Warehouse to another LUN made no difference.

The Data Warehouse LUN shows high latency during the night, up to 15 seconds.

No new management packs have been installed in the past 8 months; we have only added a few overrides and disabled a few workflows recently.

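Our SCOM setup is 2007 R2 CU7: the RMS on Windows 2003 SP2, one management server (ACS) on Windows 2003 SP2, another management server on Windows 2008 SP2, and 1 gateway in a DMZ on Windows 2008 SP1. The servers date from 2008 and 2011, all are virtual on ESX, and there are approximately 150 agents.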

From what we have read, this seems to be performance related; however, we are unable to find the root cause.

Can you please help us determine why these error events show up only at night?

A summary of the error events follows:

Event Source: Health Service Modules
Event Category: Data Warehouse
Event ID: 31551
Date:  04/06/2014
Time:  08:00:31
User:  N/A
Computer: MMTRLPALPINF033
Description:
Failed to store data in the Data Warehouse. The operation will be retried. Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
One or more workflows were affected by this. 
Workflow name: Microsoft.SystemCenter.DataWarehouse.Synchronization.Relationship
Instance name: MMTRLPALPINF033.Prod.MJQ.Local
Instance ID: {0F390614-0505-F7C2-49A3-128862BF520E}

The workflow name can also be "Microsoft.SystemCenter.DataWarehouse.Synchronization.ManagedEntity".

One 31551 event had the following description:

Failed to store data in the Data Warehouse.
The operation will be retried. Exception 'SqlException':
A transport-level error has occurred when receiving results from the server.
 (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)

Event 31552

Event Type: Error
Event Source: Health Service Modules
Event Category: Data Warehouse
Event ID: 31552
Date:  04/06/2014
Time:  07:08:45
User:  N/A
Computer: MMTRLPALPINF033
Description:
Failed to store data in the Data Warehouse. Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
One or more workflows were affected by this. 
Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
Instance name: Performance data set
Instance ID: {3EC423BE-5714-81B5-AA64-29A3E158A7B6}
Management group: OPSMGR_P

Event Type: Error
Event Source: Health Service Modules
Event Category: Data Warehouse
Management group: OPSMG

Most of the 31552 events had Instance name: Performance data set.
Some had Instance name: Event data set or State data set, and some had the RMS server as the instance.

