Custom plugin issue!

CMK Version: Check_MK 1.5.0p21
OS Version: Windows Server 2016+

Issue Description: MSSQL monitoring custom BAT plugin issue.

Scenario:
Due to hardening policies, we had limited options for implementing standard plugins, which are typically written in PowerShell, Python, or other executable scripts. To address this, we developed an SQL query plugin, which is called via a BAT plugin. Essentially, the SQL query output is formatted for Check_MK, and the BAT plugin executes the SQL run.

Issue:
We have multiple BAT files and SQL files for each service check (plugin). If any SQL query fails due to a typo, all service checks are removed. Although the “check_mk_agent.exe test” output shows the service status, all services disappear from the UI if one SQL plugin is faulty.

From my understanding, one plugin’s output should not affect the results of other plugins. However, in this case, it does.

I am attaching reference BAT scripts and SQL queries (only 1). To replicate you can copy same twice and check. If one SQL plugin is deployed incorrectly, both checks are removed from the UI, but they are visible in the “check_mk_agent.exe” command output.

I need to find solution of this behavior
Yes, I know I am using old check-mk plugin, but we need to find work around with this plugin itself.

**Snip: **
When running the “check_mk_agent.exe test” command, the output shows an error if there is a typo in the SQL query. This error causes all plugins to disappear from the UI.

and same one error plugin script, i see in UI below error.


BAT SCRIPT

@echo off
set SQL_QUERY_FILE="<PATH>active-session.sql"

REM Run the sqlcmd command and capture the output
for /f "delims=" %%i in ('sqlcmd -S localhost -U "user" -P "passwrd" -i %SQL_QUERY_FILE%') do (
    echo %%i
)

SQL QUERY SCRIPT

SET NOCOUNT ON;
DECLARE @warningThreshold INT = 10
DECLARE @criticalThreshold INT = 20

-- SQL query to count active user sessions
DECLARE @query NVARCHAR(MAX) = 
'SELECT COUNT(*) AS ActiveUserSessions
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
AND status <> ''sleeping'''

-- Execute the query and retrieve the result set
DECLARE @activeUserSessions TABLE (
    ActiveUserSessions INT
);

INSERT INTO @activeUserSessions
EXEC sp_executesql @query;

DECLARE @status INT = 0  -- OK
DECLARE @statusText NVARCHAR(50) = 'OK'

-- Determine status based on thresholds
DECLARE @currentActiveUserSessions INT

SELECT TOP 1 @currentActiveUserSessions = ActiveUserSessions
FROM @activeUserSessions
ORDER BY ActiveUserSessions DESC

IF @currentActiveUserSessions >= @criticalThreshold
BEGIN
    SET @status = 2  -- Critical
    SET @statusText = 'CRITICAL'
END
ELSE IF @currentActiveUserSessions >= @warningThreshold
BEGIN
    SET @status = 1  -- Warning
    SET @statusText = 'WARNING'
END

-- Format output for Checkmk local check format
DECLARE @output NVARCHAR(MAX) = 
CASE @status
    WHEN 1 THEN '1 MSSQL_Active_Sessions - ' + @statusText + ': ' + CAST(@currentActiveUserSessions AS NVARCHAR(10)) + ' active sessions'
    WHEN 2 THEN '2 MSSQL_Active_Sessions - ' + @statusText + ': ' + CAST(@currentActiveUserSessions AS NVARCHAR(10)) + ' active sessions'
    ELSE '0 MSSQL_Active_Sessions - ' + @statusText + ': ' + CAST(@currentActiveUserSessions AS NVARCHAR(10)) + ' active sessions'
END

-- Output result as a single line
PRINT @output

I would argue that it is more detrimental to the security of your systems when you try to implement something by yourself instead of using the tested implementation of the Checkmk team. Which is open source and can be audited when necessary.

Easy solution → include some error handling in your script that calls the SQL statements.
Only if the SQL statements give a proper return code then output something.

I did applied debug and that helped me to fix sql related error.
but this is something weird behavior with check-mk.

Senario:
there are 4 bat plugin which call 4 sql query.
Now output of 4 check work fine, and now 1 check is failed for output print (due to human error)
and when i run check_mk_agent.exe command, it print output for 3 checks in (cli) correctly with one having error. but due to one bad check all working 3 checks getting into error. (which has nothing to do with error)

The problem is your one bad check crashes the complete local section of the agent output.
If you work with local checks you have to ensure that the output is without errors.

If your failed check would be the last in the local section then all the checks at the start are shown correctly.

3 Likes

Fair point!!!
got some Glims and will modify checks accordingly :slight_smile: