Ignition Redundancy and Stuck Alarm Issues

Hey all,

Can someone take a look at these two log files that I’ve included here:
[attachment=1]wrapper(primary).log[/attachment]
[attachment=0]wrapper(backup).log[/attachment]

For the last few days I’ve come in to work to find the primary server stuck trying to reconnect to the backup server, and a LOT of alarms stuck. I cannot clear any of the alarms, even by disabling the tags, and some alarms will only clear if I restart the backup server.

I thought at first that it might be a memory issue with the backup server, since it was riding pretty close to its allocated memory (these are virtual servers), so I had our MIS department add another gigabyte of RAM to the backup server yesterday, but the same thing happened last night.

Both servers are virtual machines in a cloud environment, everything is connected via gigabit Ethernet, and I’ve been running pings between the servers without any communication issues, so I don’t think it’s a network problem. This problem just started this week, so I suppose it’s possible something changed on the cloud provider’s end, but I have no idea what. I was hoping someone could pull some more information out of the log files.

The most annoying thing is all of the stuck alarms that I can’t clear. At a minimum, I need a way to clear out the alarms so the alarm notifications start working correctly again.

Running Ignition version: 7.6.3 (b2013090513)

Any information would be greatly appreciated. Thanks!

I’ve shut down the backup server because it seems to be the issue. If only the primary server is running, everything works fine. Every time I start the backup server up, this error floods its console. Is something corrupted?

[code]
Time Logger Message
ERROR 11:29:19 AM StoreAndForward.IgnitionDB.HSQLDataStore Error deserializing data from data store.

java.io.InvalidClassException: com.inductiveautomation.ignition.common.alarming.config.CommonAlarmProperties; local class incompatible: stream classdesc serialVersionUID = -6984932928203482863, local class serialVersionUID = 4708432537456417261
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:562)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1582)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at java.util.HashMap.readObject(HashMap.java:1029)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at com.inductiveautomation.ignition.gateway.cluster.ClusterUtil.deserializeObject(ClusterUtil.java:60)
at com.inductiveautomation.ignition.gateway.cluster.ClusterUtil.deserializeObject(ClusterUtil.java:52)
at com.inductiveautomation.ignition.gateway.history.stores.AbstractDatasourceStore.deserializeObject(AbstractDatasourceStore.java:476)
at com.inductiveautomation.ignition.gateway.history.stores.AbstractDatasourceStore.loadTransactions(AbstractDatasourceStore.java:535)
at com.inductiveautomation.ignition.gateway.history.stores.AbstractDatasourceStore.syncdTakeNext(AbstractDatasourceStore.java:491)
at com.inductiveautomation.ignition.gateway.history.stores.AbstractStore.takeNext(AbstractStore.java:195)
at com.inductiveautomation.ignition.gateway.history.stores.MultiStageStore.syncdTakeNext(MultiStageStore.java:170)
at com.inductiveautomation.ignition.gateway.history.stores.AbstractStore.takeNext(AbstractStore.java:195)
at com.inductiveautomation.ignition.gateway.history.forwarders.RedundancyAwareForwarder.synchedTakeNext(RedundancyAwareForwarder.java:103)
at com.inductiveautomation.ignition.gateway.history.forwarders.ForwarderThread.run(ForwarderThread.java:128)[/code]

Hi,

I’m not sure about the alarms at this point, but the errors on the backup are flooding the logs and keeping you from getting useful information. To get rid of those, you’ll need to clear the store-and-forward caches while the gateway is shut down. They’re under “{INSTALLDIR}\data\datacache”; you can simply delete or move the folders in there.

Basically, I think this was caused by data that was stored before an upgrade and can no longer be deserialized by the newer class versions. Occasionally these kinds of data incompatibilities pop up, even though we try to protect against them. With important data we can fix it manually, but if you’re not missing anything, just move the caches aside.
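
For a rough picture of what that InvalidClassException in your log means, here’s a minimal plain-Java sketch (the AlarmRecord class below is just a made-up stand-in, not the actual Ignition class): ObjectInputStream compares the serialVersionUID recorded in the stored bytes against the UID of the class it loads now, and refuses to deserialize when they differ, which is what happens when cached records written before an upgrade are read back by newer class versions.

[code]
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialVersionDemo {

    // Made-up stand-in for a cached record; NOT the Ignition class from the log.
    // If the class on the classpath changes its serialVersionUID (declared or
    // auto-computed) between the time a record is written and the time it is
    // read back -- e.g. across a gateway upgrade -- readObject() throws
    // java.io.InvalidClassException, which is what floods the backup's console.
    static class AlarmRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        String source;
        AlarmRecord(String source) { this.source = source; }
    }

    public static void main(String[] args) throws Exception {
        // "Store": serialize a record to bytes, like a row in the data cache.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(new AlarmRecord("Tank1/HighLevel"));
        }

        // "Forward": deserialize it again. This only succeeds while the stored
        // serialVersionUID still matches the one on the local class.
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            AlarmRecord back = (AlarmRecord) in.readObject();
            System.out.println("Deserialized: " + back.source);
        }
    }
}
[/code]

Since the old records simply can’t be read by the new classes, clearing the cache is the straightforward way out when the data isn’t critical.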

After that we can take a look at the alarms. While you have the backup shut down, also delete the “.alarms_*” file from the data directory.
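
If it helps, here’s a rough sketch of both offline steps in plain Java rather than anything Ignition-specific (the install path is just a placeholder for your own {INSTALLDIR}, and doing the same thing by hand in Explorer works just as well). Only run it while the backup gateway service is stopped:

[code]
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Offline cleanup sketch -- run ONLY while the backup gateway service is stopped.
public class ClearGatewayCaches {

    public static void main(String[] args) throws IOException {
        // Placeholder path; point this at your own {INSTALLDIR}\data directory.
        Path dataDir = Paths.get("C:\\Program Files\\Inductive Automation\\Ignition\\data");
        Path cacheDir = dataDir.resolve("datacache");

        // 1) Remove everything under data\datacache -- the store-and-forward
        //    caches holding the records that can no longer be deserialized.
        if (Files.isDirectory(cacheDir)) {
            try (Stream<Path> walk = Files.walk(cacheDir)) {
                walk.sorted(Comparator.reverseOrder())   // delete children before parents
                    .filter(p -> !p.equals(cacheDir))    // keep the datacache folder itself
                    .forEach(p -> p.toFile().delete());
            }
        }

        // 2) Remove the ".alarms_*" cache file(s) from the data directory.
        try (DirectoryStream<Path> alarms = Files.newDirectoryStream(dataDir, ".alarms_*")) {
            for (Path p : alarms) {
                Files.delete(p);
            }
        }
    }
}
[/code]

Moving the folders somewhere safe instead of deleting them works too, if you want to keep the old records around just in case.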

Regards,

Hey Colby,

Yeah, I think I’ve got it taken care of now. I upgraded the servers on Sunday, so I spent some time with them and deleted the cache on the backup, and that seems to have fixed it.

I’m still getting some odd redundancy connection errors once in a while, even though there don’t appear to be any network issues. That has happened to me in the past because of memory pressure (long Java garbage collection pauses), but both servers should have plenty of memory now.

I’ll double-check tomorrow that everything is still looking good and post the backup logs again. Maybe you can see something in there about the redundancy issue. Thanks!

Hey Colby,

I checked everything this morning and the alarm issue does seem to be resolved. Between the cache delete and the Ignition upgrade, the alarming is back to normal.

I’m still getting phantom redundancy connection errors, though; interestingly, I’m seeing many more of them on the backup than on the primary. I’ll post the logs so you can take a look. It doesn’t seem to be causing any issues right now. My only concern is that the OEE production model appears to become active on the backup server when the connection issues happen, and that has caused problems in the past with inaccurate downtime reporting and such.

[attachment=0]wrapper(primary).log[/attachment][attachment=1]wrapper(backup).log[/attachment]

Doug,

I see it has been a while. Is this still relevant?

Hey Anna,

Nope, these issues seem to have been fixed by the last couple of updates. I haven’t had a stuck alarm in a little while now. Thanks!