MSSQL Plugin Secondary Replica Database sizes

CMK version:2.3.0p17

I have deployed the new MSSQL Server plugin to my SQL Servers via Agent Bakery.
However, all database size checks on my secondary replicas go to Unknown because the secondary replicas are set to unreadable.
What could be done here? Since the secondary nodes can become primary at any time, I have to roll out the plugin to them as well.
Unfortunately, it is not possible for me to set all secondary replicas to Read Only.

Maybe you can use the labels at the host level to apply the bakery configuration accordingly? For the not needed sections those will be not queried from the secondary node depending on the label

It might also be an idea to set up a cluster object in Checkmk and make the services in question clustered. You can then set the aggregation mode to “best”, so only the best result will be used.

But if I set the labels, they would no longer fit as soon as a failover happens, because then the primary node would become the secondary node

On your recommendation I tried this, but the plugin keeps trying to retrieve data from the secondary replica, which causes a lot of errors on the SQL server and thus clogs the log.

The target database, ‘TestDB’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.

Yeah, there is no way around that out of the box.

Explicit labels won’t help at all.

Just a rough idea.
You can assign labels dynamically using some script via a local check on the agent side of each participating node of the cluster. The script can create a label depending on which one is primary and secondary. Since you have already created a MSSQL agent rule with these labels then depending on the value of the label the rule will be assigned to the appropriate node.

Clustering can help you to combine the services together but you can’t stop the agent plugin on the secondary node spamming the log files.

But if I do that then I have to bake the agent every time a failover happens.

I also found this


But i dont really get how i can edit the SQL Statments

Hi @Riegerj !

We have published a possible solution for this in the Knowledge Base.
Even though it is not an out-of-the-box thing, it should make it possible to manually resolve your question: https://checkmk.atlassian.net/wiki/x/AQD1Hg

Hope this helps!

@Sara
Thanks for your reply!
However, after reading the linked KB article, I don’t really see how that solution would apply to my situation.

The article you mentioned addresses backup detection on secondary replicas (by adjusting the backup.sql query to include non-primary replicas).
My issue, however, is with database size checks going to UNKNOWN on secondary replicas because those databases are set to unreadable.

So this isn’t related to backups at all.
Enabling “Readable Secondary” is unfortunately not possible in this environment, so I was hoping for a way to handle or suppress those UNKNOWNs for unreadable databases within the plugin itself.

Additionally, the current behavior is causing a large number of failed login entries in the SQL Server logs every time the Checkmk agent runs, which makes the logs quite noisy.

Hi @Riegerj ,

Thank you for clarifying.
We will look into this.

1 Like

Hi Riegerj,

In order to suppress the UNKNOWN’s , as those databases are not readable, can you simply exclude the databases from the rule itself for your secondary replicas and see if this resolves the UNKNOWN’s and the failed login entries?

Hi @chauhan_sudhir

The secondary replicas can change at any time, so it wouldn’t be practical to statically exclude databases in the rule. Checkmk would need to dynamically identify which replica is currently secondary and which databases are not readable, and then automatically exclude those from the check.

Could you please clarify how exactly these databases should be excluded within the rule configuration?

I was referring to

The secondary replicas can change at any time, so it wouldn’t be practical to statically exclude databases in the rule. Checkmk would need to dynamically identify which replica is currently secondary and which databases are not readable, and then automatically exclude those from the check.

Unfortunately, this is not easily possible with the existing SQL queries in the mk-sql and will require a detailed analysis.

This is also an issue for me, and I suspect many others. Licensing of MS SQL Server generally results in secondary replicas from being set as non-readable. There are use cases for having readable secondary replicas but they come at a financial cost.

My suggestion is that the ms-sql plugin needs to written to include the appropriate checks of the system tables or data management views (preferrable). These are able to inform as to whether a connection to a secondary database is possible or not. Handling of this common scenario should be handled gracefully by the plugin rather than causing multiple log entries to occur.

3 Likes

Please create and vote for such a feature at our ideas portal:

https://ideas.checkmk.com

I would have thought this was a case of making the product fit for purpose rather than being a new idea. To me it is a deficiency within the current plugin. MS SQL Server Always On Availability Groups have been a feature of MS SQL Server since SQL 2012, they are not new.

1 Like

Hi everyone,

We have the same issue. Since deploying the new MSSQL plugin, the logs on our secondary nodes are flooded with error-messages.

This was not a problem with the old plugin.

Our secondaries are not readable for the same reason as mentioned by the others in this topic: licensing. Changing this setting would effectively double our MSSQL license cost.

Thanks for looking into this!

Sincerely,

Pieter-Jan

3 Likes

Hello Pieter-Jan (and everyone else :slight_smile: ) !

For the product team to see and consider additional functionality, please, consider adding/ voting for it in the ideas portal – it is the main platform where one can directly convey their needs when it comes to features (that includes expanding the functionality of an existing feature).

We found an idea that seems to be similar to what could resolve your problem: https://ideas.checkmk.com/suggestions/675502/cluster-failover-behaviour-for-mssql-instance

Please take a look and vote, if you find this to be correct. Or create a new one.
To prioritise some functionality compared to another it is important for us to see that we are solving a problem that is important to the users, and the ideas portal is a unified place for that.

Thank you for understanding

1 Like