Redundancy

I’m trying to set up redundancy. I called, but everyone was busy, so I’ll probably get a faster response in the forum.

I’ve had FactorySQL running on a computer for a few months now. I just enabled redundancy and everything went just as the help file said it would, no problems. Then I went to enable redundancy on the new computer and that’s when all hell broke loose. It didn’t act the same way on the new computer, and it disconnected the first machine from the service. Now I can’t get both computers running the project at the same time. If it’s running on one, it disconnects the other.

Sounds like you might be trying to connect both frontends to the same service. This is understandable, since when you connect the frontend to a backup system, it asks if you want to actually connect to the master, and then redirects you. However, a service can only have one frontend connected at a time, so it will disconnect the other one.

Regards,

Yup, that was it. Thanks.

Is it common for the front-end to ask if you’d like to connect to the master more than one time per instance? I haven’t closed the front-end on either the master or backup servers. This morning when I signed into each, FSQL was asking if I’d like to connect to the master. The part that confuses me is why the master would be asking if I’d like to connect to the master.

Hmm, no, it shouldn’t, unless perhaps the master changed and then changed back in the middle of the night.

There’s the possibility that it’s popping up when it shouldn’t, but we haven’t had any reports so far of this. So keep an eye out, and if it seems to be popping up even when the master hasn’t changed, let us know.

Regards,

Does FSQL create a log when it changes servers?

Yes, you should see informational messages logged in the log viewer (Help->Log Viewer) each time the redundancy system changes “responsibility”. You should also see a log of these events under Connection->System Status->Redundancy, though these won’t persist between restarts (the messages in the log viewer will, however).

Regards,

On the master server I’m seeing weird messages in the log viewer under “RedundancySystem” like:
2 nodes removed due to inactivity
Responsibility has changed to: Backup
Project state has been changed to: Ready
Responsibility has changed to: Master_Negotiated
Project state has been changed to: Active
3 nodes removed due to inactivity
1 node removed due to inactivity

And then one with really long details I won’t type here, but here’s the message:
Error in responsibility manager: Error executing update query: ERROR [HYT00] [Microsoft][ODBC SQL Server Driver]Timeout expired

I have a feeling these messages are why when I get here in the morning and check on FSQL, sometimes the front-end is asking me if I want to connect to the master service even though I haven’t closed the front-end.

Indeed, it looks like you might be running into some DB connection issues that are causing other problems.

First, the root of everything is almost certainly that “DB timeout” error. You’re using ODBC to connect to SQL Server? While this is generally acceptable, the native driver (as in FactorySQL’s native driver, on the DB connection screen) has better performance, and I’d recommend it for things such as redundancy and SQLTags. So, I would try to get a native driver connection set up pointing to the same DB, and then switch redundancy over to use that.

If that’s not possible, verify that connection pooling is enabled on the ODBC driver. Open Windows’ ODBC configuration, go to the Connection Pooling tab, and double-click the driver that the DSN is using. Verify or enable connection pooling there.

The “removed nodes” messages just mean that there were entries in the redundancy table that hadn’t been updated for some amount of time. When this happens, they are removed. The nodes re-insert their information, and a calculation is made as to who should be the master.
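To make the sequence above concrete, here is a minimal sketch of that kind of prune-and-recalculate step. It is not FactorySQL’s actual code: the `Node` fields, the 30-second timeout, and the “lowest rank wins” rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Node:
    name: str
    rank: int              # assumption: lower rank is preferred as master
    last_update: datetime  # last time this node refreshed its table entry


def prune_and_elect(nodes, now, timeout=timedelta(seconds=30)):
    """Drop entries older than `timeout`, then pick a master from the rest."""
    alive = [n for n in nodes if now - n.last_update <= timeout]
    removed = len(nodes) - len(alive)
    # Assumed rule: lowest rank wins, with the name as a tie-breaker.
    master = min(alive, key=lambda n: (n.rank, n.name)) if alive else None
    return alive, removed, master


# Made-up example: node A could not update its row for 90 seconds
# (for instance because its update query timed out), node B is current.
now = datetime(2024, 1, 1, 3, 0, 0)
nodes = [
    Node("A", 0, now - timedelta(seconds=90)),
    Node("B", 1, now - timedelta(seconds=5)),
]
alive, removed, master = prune_and_elect(nodes, now)
print(f"{removed} node(s) removed due to inactivity; master is {master.name}")
```

Under these assumptions, a node that cannot write to the database for long enough looks inactive, which would explain a master being demoted after a query timeout.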

There are other potential causes of DB timeouts as well, such as table locking (if the server performs automatic backups/archiving) or just heavy load overall, but I would start by trying the native connection.

Regards,

I’m not sure if you meant I should use a native connection for all my connections or just the redundancy system, so I just changed it for the redundant stuff. We’ll see if the problem occurs tonight.

That should be fine.

We’re still having problems with redundancy. It’s ironic that none of the problems we’re currently experiencing existed before setting up redundancy.

Right now I’m ready to disable redundancy because it’s creating way too many problems.

From the log viewer:
Responsibility has changed to: Backup
Project state has been changed to: Ready
1 nodes removed due to inactivity

And then if error messages like that appear, nothing works right. And again, things worked perfectly fine before we enabled redundancy.

I set the data connection type for the redundancy system to native a while back, but that never actually helped. Almost every time I remotely connect to the FactorySQL computers they have error messages about not being connected to the master. I’ve just gotten so used to clicking “OK” that I don’t even read it anymore.

Do I need to change the NIC to a 1 gig card? I can ping all the computers and they ping back at <1ms, so I don’t see how that would possibly be the problem. In one of your previous posts you mentioned tables locking for tape backup. We are backing up to tape on a nightly basis, but we’re backing up the flat files that MS SQL creates. I don’t know enough about SQL Server to know if it would be locking the tables during that process.

I don’t think it’s necessarily the speed of the network or the load that’s the problem.

One thing that might help with troubleshooting would be to use node ranking, so that you know which server is normally supposed to be master. Set that machine to 0, and set the other one to 1. Then you should be able to always connect to the first machine, and it should always be master.

It won’t help with the database timeouts, but should at least give you a solid point of view as to who the master should be. Then you can just check the logs in the master to see when it is being demoted and for how long, to perhaps correlate it with some database activity.
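If the log entries can be exported with timestamps, the demotion periods could be pulled out with a short script like the sketch below. The export format (timestamp, tab, message) and the sample times are assumptions invented for the example; only the message texts are taken from the log viewer output quoted earlier in this thread.

```python
from datetime import datetime

# Hypothetical export format: "YYYY-MM-DD HH:MM:SS<TAB>message".
# The timestamps are made up for illustration.
LOG = """\
2024-01-01 02:00:05\tResponsibility has changed to: Backup
2024-01-01 02:00:05\tProject state has been changed to: Ready
2024-01-01 02:03:40\tResponsibility has changed to: Master_Negotiated
2024-01-01 02:03:41\tProject state has been changed to: Active
"""

PREFIX = "Responsibility has changed to: "


def demotion_windows(log_text):
    """Return (start, end) pairs for periods this node spent as backup."""
    windows = []
    demoted_at = None
    for line in log_text.splitlines():
        stamp, _, message = line.partition("\t")
        if not message.startswith(PREFIX):
            continue  # ignore project-state and node-removal messages
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        role = message[len(PREFIX):]
        if role == "Backup" and demoted_at is None:
            demoted_at = when
        elif role.startswith("Master") and demoted_at is not None:
            windows.append((demoted_at, when))
            demoted_at = None
    return windows


for start, end in demotion_windows(LOG):
    seconds = (end - start).total_seconds()
    print(f"{start} -> {end} ({seconds:.0f} s as backup)")
```

The resulting windows can then be compared against the times of the nightly backup or other scheduled database jobs.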

Regards,

I just enabled redundancy on FactorySQL. It seems to be working, but I’m not sure if I set this up right. I have two physical servers (A and B), each running a virtual server with a mirrored database. I set up another two virtual servers (one on A and the other on B), both running FactorySQL. I set up redundancy on the first FactorySQL (A) pointing to the database on A, and on the second FactorySQL (B) pointing to the database on B. From the second FactorySQL (B) I connected to the service on the master, which is (A). I don’t know how to establish the cross connection.

    DB (A)                DB (B)
      |          X          |
FactorySQL (A) ------ FactorySQL (B)