Database Connections "Stuck" Open

Hey guys,

I came in this morning and opened up a client to look at a trend and it would never display the trend, I went into the database connections page and saw that there are 7 database connections open that refuse to close out. I closed all of the open clients and they still would not close. I’m not sure what the problem is here. Has anyone run into this before? (I’m using MSSQL 2005)

I checked the settings in the database and it seems like they are set to only allow a maximum of 5 idle connections but it seems like more are sitting there doing nothing. Not sure what to check next.


EDIT:

I checked the number of open connections on the database from Ignition at it is stuck at 27! Why are all of these database connections getting stuck open?


There are really two possibilities here: 1) The queries are taking FOREVER or 2) Something is happening and the connection isn’t being closed by Ignition (and thus the last query used is being reported. Considering that you’re querying the latest month, and the query is the “value query” which isn’t normally slow, I would say it’s likely #2, but we could probably confirm this with a thread dump.

So, if Ignition is still in this state, go to System>Console>Threads in the configuration section, and click the button to download the thread dump. You can also do this from the GCU. Then, attach it here, or email it to support.

While we haven’t seen this happen “in the wild”, we did recently observe a way this could happen in testing. I just want to confirm that it is indeed what’s happening in your case. If so, it will be fixed in 7.2.9.

Regards,

I have included the thread dump file. After some additional playing and testing I came up with some more information:

  1. When the Ignition service is stopped then all of the connections close at the SQL server (probably obvious)

  2. This only seems to happen while viewing trend pages in “Real Time” mode. If I have the database connection page open and then enter a trend that displays in real time I can see the list of database connections start to get longer and longer and they never close.

  3. Ignition was recently moved from a 32-bit Windows XP machine to a 64-bit Windows Server 2008 R2 virtual server a few days ago, after this move this behavior started to happen. When the server was on the old Windows XP machine I never had a problem with real time trends.

Hopefully this information is helpful, if I find anything else out I will let you know.
thread-dump 20110816.txt (50 KB)

We actually see this with some regularity in one of our history profile/trending combinations. Here is what we see on two comparable gateways (in terms of trending load). Gateway 1 has a local MySQL database being used for the sql tag history. Gateway 2 has a SQL Server database on a different server. Gateway 1 has never had the problem, but Gateway 2 does. I was going to switch Gateway 2 to the same setup as Gateway 1, but I’m a little stubborn and I figured I could find a solution to make it work.

I thought it was related to total records in the table and/or a locking issue. I used the partitioning (1 week to 3 days) to reduce and still have seen the issue. Sometimes it’s fine with 7 days of data in a table, sometimes its hours. When you say queries taking FOREVER, what kind of time do you mean? It sure seems like the issues is speed related, but I doubt that the queries are taking forever. Most of the trends in question are only looking at 2 hours of data.

One thing I have found that seems to prevent the problem from occurring. When running the query in the SQL server profiler it suggested creating a clustered index on t_stamp and tagid. When I create this index (I keep the non-clustered indexes created by ignition), the problem doesn’t occur. It would be nice if this happened automatically.

It would be nice to know definitively know if this is indeed a database performance issue or if something else was causing this to occur.

Well, from the thread dump it looks to be more of a problem inside of Ignition- I suspect the one that we recently discovered. Examining the log files should show some sort of error occurring when those queries start, but the main point is that while there are 7 queries reported, there aren’t 7 threads in the middle of running them.

Now, in regards to Rockman’s post: It’s possible you’re running into a combination of things. On one hand, you might be running into this issue, but it’s more likely that you’re hitting a more common problem of database performance. This is especially likely if changing the index to clustered helped. We have found that a clustered index on the time stamp helps a lot, and have indeed set this up by default for 7.3. We’ve also introduced a few other features to help improve performance- a new value change detection algorithm (or what most historians mean by “compression”), and a new option to pre-summarize the data for larger windows- which helps when performing larger queries.

When I say “forever”, I’m referring to the times he had of multiple hours. This really shouldn’t happen, especially when querying the current partition. When people complain about history queries, it’s generally on the order of 1-3 minutes, which is made much worse by the fact that the client will time the operation out at 60 seconds. This timeout is changeable (from client properties in the designer’s resource tree), but we’re looking at other ways to deal with it as well.

Regards,

Would you like the log files, or do you think the next version will solve this problem?

The error log might be somewhat useful to see what error was actually happening, but I’m fairly confident that the problem leading to the connections not being closed has been fixed for 7.2.9.

Feel free to zip up the “wrapper.log.*” files in the install directory and email them to support, if you have a chance. Otherwise I don’t think it’s crucial.

Regards,