Issue with the Nutanix Prism special agent

I have an issue with the special agent for Nutanix Prism (see below).

The Nutanix cluster is a brand new setup with only one non-system VM.
I managed to get the agent to probe one of the CVMs successfully, but I assume that is not the intended implementation.

I have the special agent running successfully against the Prism of a demo appliance deployed on VMware, so I guess the issue might be related to the version or configuration of the Nutanix cluster.

Can anyone point me in a direction to get the agent working against the PrismCentral server?

Information of the setup that fails:

Nutanix versions:
Version pc.2024.1.0.2
NCC Version: 5.0.1
LCM Version: 3.0.1.1

CMK version: Cloud Edition 2.3.0p15
OS version: virt1-1.7.2

Error message: [special_prism] Agent exited with code 1: ERROR 2024-09-25 09:11:38 agent_prism: HTTP error: 412 Client Error: PRECONDITION FAILED for url: https://[HOST_IP]:9440/PrismGateway/services/rest/v2.0/protection_domains(!!)

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

Checkmk version 2.3.0p15
+ FETCHING DATA
  Source: SourceInfo(hostname='PrismCentral', ipaddress='[HOST_IP]', ident='special_prism', fetcher_type=<FetcherType.SPECIAL_AGENT: 6>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f2b8fc84560]
Read from cache: AgentFileCache(PrismCentral, path_template=/omd/sites/[SITE_ID]/tmp/check_mk/data_source_cache/special_prism/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (does not exist)
Calling: /omd/sites/[SITE_ID]/share/check_mk/agents/special/agent_prism --pwstore=6@0@/omd/sites/[SITE_ID]/var/check_mk/passwords_merged@uuidef242cbb-b88f-4144-a90f-65d369e59ec9 --server [HOST_IP] --username admin --password '[PASSWROD]' --no-cert-check
Get data from program
[cpu_tracking] Stop [7f2b8fc84560 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.3, children_system=0.04, elapsed=0.8799999989569187))]
  Source: SourceInfo(hostname='PrismCentral', ipaddress='[HOST_IP]', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f2b8fde3a10]
Read from cache: NoCache(PrismCentral, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
Piggyback file '/omd/sites/[SITE_ID]/tmp/check_mk/piggyback/PrismCentral/[NUTANIX_CVM]': Successfully processed from source '[NUTANIX_CVM]'
No piggyback files for '[HOST_IP]'. Skip processing.
Get piggybacked data
[cpu_tracking] Stop [7f2b8fde3a10 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7f2b90731400]
+ PARSE FETCHER RESULTS
<<<prism_vm:cached(1727248247,90):sep(0)>>> / Transition NOOPParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
  HostKey(hostname='PrismCentral', source_type=<SourceType.HOST: 1>)  -> Add sections: ['labels', 'prism_vm']
Received no piggyback data
Piggyback file '/omd/sites/[SITE_ID]/tmp/check_mk/piggyback/PrismCentral/[NUTANIX_CVM]': Successfully processed from source '[NUTANIX_CVM]'
No piggyback files for '[HOST_IP]'. Skip processing.
[cpu_tracking] Stop [7f2b90731400 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[special_prism] Agent exited with code 1: ERROR 2024-09-25 09:11:38 agent_prism: HTTP error: 412 Client Error: PRECONDITION FAILED for url: https://[HOST_IP]:9440/PrismGateway/services/rest/v2.0/protection_domains(!!), [piggyback] Successfully processed from source '[NUTANIX_CVM]', execution time 0.9 sec | execution_time=0.880 user_time=0.000 system_time=0.000 children_user_time=0.300 children_system_time=0.040 cmk_time_ds=0.540 cmk_time_agent=0.000

Where does this [HOST_IP] come from? It’s neither a valid variable nor a valid hostname. Make sure that you have configured a valid IP address in your host configuration, or leave the field empty (a DNS lookup will then be performed).

I see your point. The reason is that I obfuscated the original IPs and site names to prevent data leakage. I should have mentioned that in my initial post.

After some digging around, I changed the config to point at the Prism Element VIP instead of Prism Central, and everything works.

Lesson learned :slight_smile:

Hi Everybody,

Same problem here.

[special-prisma] Agent exited with code 1: ERROR 2024-10-16 14:10:38 agent-prisma: HTTP error: 412 Client Error: PRECONDITION FAILED for url: https://ip:9440/Prismateway/services/rest/v2.0/protection-domains (.)

But with this change we only get the alerts from this one cluster. We have 2 clusters and 1 Prism Central.
If I change the IP, we lose the alerts from Prism.

Is there a solution for this issue?

Regards

Version CMK 2.3.0p12
Ubuntu 22.04

The problem is that when I built the agent, I had no Prism available with multiple clusters defined.
Solving problems like this requires direct access to the Prism VM.

To test whether this is only a problem with “protection_domains”, you can edit “agent_prism.py”. Starting at line 155 you will find this code:

    prism_objects: GatewayData = {
        "containers": session_manager.get(f"{base_url_v1}/containers"),
        "alerts": session_manager.get(
            f"{base_url_v2}/alerts",
            params={"resolved": "false", "acknowledged": "false"},
        ),
        "cluster": session_manager.get(f"{base_url_v2}/cluster"),
        "storage_pools": session_manager.get(f"{base_url_v1}/storage_pools"),
        "vms": session_manager.get(f"{base_url_v1}/vms"),
        "hosts": hosts_obj,
        "protection_domains": session_manager.get(f"{base_url_v2}/protection_domains"),
        "remote_support": session_manager.get(f"{base_url_v2}/cluster/remote_support"),
        "ha": session_manager.get(f"{base_url_v2}/ha"),
        "hosts_networks": hosts_networks,
    }

Comment out the line starting with “protection_domains”.
Also comment out line 222, which contains:

    output_entities(gateway_objs["protection_domains"], "protection_domains")
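
As an alternative to commenting lines out, a failing endpoint could be tolerated at fetch time. The following is only a minimal sketch of that idea, not the agent's actual code: `fetch_sections`, `OPTIONAL_SECTIONS` and the `fetch` callable are hypothetical names. In the real agent the callable would correspond to `session_manager.get`, and the caught exception would probably be narrowed to the HTTP error type.

```python
from typing import Any, Callable, Dict, Iterable, Mapping, Optional

# Hypothetical: sections that may fail without aborting the whole agent run.
OPTIONAL_SECTIONS = frozenset({"protection_domains"})


def fetch_sections(
    fetch: Callable[[str], Any],
    urls: Mapping[str, str],
    optional: Iterable[str] = OPTIONAL_SECTIONS,
) -> Dict[str, Optional[Any]]:
    """Fetch every URL; store None for optional sections whose request fails."""
    optional_names = set(optional)
    results: Dict[str, Optional[Any]] = {}
    for name, url in urls.items():
        try:
            results[name] = fetch(url)
        except Exception:
            # A failure in a mandatory section still aborts the run.
            if name not in optional_names:
                raise
            results[name] = None
    return results
```

The caller would then have to skip the output step for every section whose value is `None`.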

But I see one problem here: your error message shows “protection-domains” at the end of the URL.

But the real URL should be:
https://ip:9440/PrismGateway/services/rest/v2.0/protection_domains
Not “-” but “_” in the last part of the URL.

Thanks Andreas,

We upgraded again, from 2.2.p17 to 2.3.p012.
Info: in 2.2 we had the MKP nutanix_prism 5.0.7 installed.

When we edit the agent with vi lib/check_mk/special_agents/agent_prism.py, we have to comment out protection_domains, remote_support, ha and hosts_networks.

After that we have this in our CMK

The info about the crash

Exception

ValueError ('\n' not allowed in 'summary')

Traceback

  File "/omd/sites//lib/python3/cmk/base/checkers.py", line 716, in get_aggregated_result
    check_result = check_function(**item_kw, **params_kw, **section_kws)
  File "/omd/sites//lib/python3/cmk/base/checkers.py", line 496, in __check_function
    return _aggregate_results(consume_check_results(check_function(*args, **kw)))
  File "/omd/sites//lib/python3/cmk/base/checkers.py", line 554, in consume_check_results
    for subr in subresults:
  File "/omd/sites//lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 91, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites//lib/python3/cmk/base/plugins/agent_based/prism_alerts.py", line 87, in check_prism_alerts
    yield Result(
  File "/omd/sites//lib/python3.12/site-packages/cmk/agent_based/v1/_checking_classes.py", line 403, in __new__
    state, summary, details = _create_result_fields(**kwargs)  # type: ignore[misc]
  File "/omd/sites//lib/python3.12/site-packages/cmk/agent_based/v1/_checking_classes.py", line 447, in _create_result_fields
    raise ValueError("'\n' not allowed in 'summary'")

Local Variables

{'details': None,
 'name': 'details',
 'notice': None,
 'state': <State.OK: 0>,
 'summary': "Last worst on 2024-10-10 03:00:23: 'Detailed license expiry info: "
            '[info]\n'
            "[info]'",
 'var': None}

We also tried it without the MKP installed, but the problem is the same. It only happens with Prism Central; the clusters are OK. The only issue is the crash in the alerts check.
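
The crash above is raised because the alert message contains line breaks and ends up in the `summary` field, which rejects `'\n'`. One possible way around this, shown here only as a hedged sketch and not as the actual fix in `prism_alerts.py`, is to put the first line into the summary and the full text into the details, which as far as I know may span several lines. `split_summary` is a hypothetical helper name.

```python
from typing import Tuple


def split_summary(message: str) -> Tuple[str, str]:
    """Return (summary, details): first non-empty line, and all non-empty lines."""
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if not lines:
        return "", ""
    summary = lines[0]          # single line, so it never contains '\n'
    details = "\n".join(lines)  # multi-line text is kept for the details view
    return summary, details
```

A check function could then pass the two values separately as `summary` and `details` when it yields its result.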

Thanks and Regards.

With 2.3 installed you don’t need any mkp for the Nutanix checks anymore.
I would recommend the following approach.

  • create a host for your PrismCentral
  • configure the special agent for this host without any modifications
  • go to the command line
  • run cmk -D PrismCentral
  • run the special agent command line shown there, with “--debug” added
  • you can send me the output as a PM if it contains too much sensitive data to be posted here

Please run these tests inside a clean, empty site so that they do not affect the production system.
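
The last steps can be sketched in Python: take the special agent command line printed by `cmk -D` and append `--debug` before running it by hand. `with_debug` is a hypothetical helper, and the command in the usage note is a placeholder, not output from a real site.

```python
import shlex
from typing import List


def with_debug(command_line: str) -> List[str]:
    """Split a shell-style command line and append --debug if it is missing."""
    argv = shlex.split(command_line)  # keeps quoted arguments such as passwords intact
    if "--debug" not in argv:
        argv.append("--debug")
    return argv
```

For example, `shlex.join(with_debug("agent_prism --server 192.0.2.10 --username admin --no-cert-check"))` returns a string that can be pasted back into the site user's shell.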

Hi all,

as another customer is facing this issue, I have created a draft for a possible solution:

It basically skips the affected endpoints when a Prism Central is queried.

Do you have any thoughts around that?
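
To illustrate the idea (this is not the draft itself), a minimal sketch might look as follows. The set of endpoints is an assumption based on the sections that had to be commented out earlier in this thread, and `endpoints_to_query` as well as the `is_prism_central` flag are hypothetical names. How the agent would detect a Prism Central is not covered here.

```python
from typing import Iterable, List

# Assumption: endpoints reported in this thread as failing against Prism Central.
PRISM_ELEMENT_ONLY = frozenset(
    {"protection_domains", "remote_support", "ha", "hosts_networks"}
)


def endpoints_to_query(endpoints: Iterable[str], is_prism_central: bool) -> List[str]:
    """Return the endpoints to fetch, dropping element-only ones for Prism Central."""
    if not is_prism_central:
        return list(endpoints)
    return [name for name in endpoints if name not in PRISM_ELEMENT_ONLY]
```

With this approach, a Prism Element target keeps all of its sections, while a Prism Central target only loses the ones it cannot answer.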

The change looks OK. I cannot test it live at the moment, as there is no “free” time before the end of the year :frowning:
With the current AOS version, the “ha” section also causes problems if you query a Prism Element rather than Prism Central. Here, too, I need a little time to inspect the system. For now I have only commented out the “ha” section, and the agent works again with this AOS version.
