Hi @hroberts65616,
Since you don’t know which process is causing the slowness yet, here’s a structured approach:
1. Make all processes visible in Checkmk
You can discover all running processes at once. Go to Setup > Services > Discovery rules > Process discovery and create a rule for the affected hosts:
- Process Name: .* (matches everything)
- Process User: leave empty (all users)
- Service Description: Process %u %c (shows user and command)
- Thresholds: don’t set any for now — just discover
Then run a service discovery on the affected machines. This will create a lot of services, but that’s fine for the diagnosis phase. You’ll be able to see virtual memory, resident memory, and CPU usage percentage per process in the service details and graphs.
However, keep in mind: this shows you which process uses resources, but not necessarily why the application feels slow. CPU and RAM per process might all look normal while the real bottleneck is somewhere else.
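Regarding the pattern in step 1: once you have a suspect, you will probably want to narrow the rule down again. Here is a small Python sketch of how the patterns behave. It assumes the pattern is anchored at the start of the process command line (as far as I know that is how Checkmk applies it, but please double-check in the rule's inline help). The process names are made up.

```python
import re

# Hypothetical process names, only for illustration.
processes = [
    "svchost.exe",
    "MyApp.exe",
    "MyAppUpdater.exe",
    "chrome.exe",
]

def discovered(pattern, names):
    """Return the names the pattern would pick up when anchored at the start."""
    regex = re.compile(pattern)
    # re.match only matches at the beginning of the string (prefix match).
    return [name for name in names if regex.match(name)]

everything = discovered(r".*", processes)          # all four processes
by_prefix = discovered(r"MyApp", processes)        # MyApp.exe and MyAppUpdater.exe
exact = discovered(r"MyApp\.exe$", processes)      # only MyApp.exe
```

So `.*` is fine for the diagnosis phase, and something like the last pattern keeps the number of services small afterwards.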
2. Look beyond CPU/RAM — common causes for “app is slow but system looks fine”
Since your overall CPU, memory and disk metrics are not showing issues, the problem is likely not raw resource exhaustion. Common hidden causes:
- Network latency — Is this a client-server application? If only one department is affected, the network path between that department and the app server could be the bottleneck (bad switch port, congested VLAN, different subnet routing)
- DNS resolution — Slow or failing DNS lookups can make applications feel extremely sluggish while not showing up in any system metrics
- Disk I/O latency — Overall disk throughput can look fine, but per-operation latency might be high, especially on spinning disks or overloaded SANs. Check the “Disk IO” service in Checkmk — look at the latency metric, not just throughput
- Network shares — Does the application access files on network drives? SMB latency issues are invisible in local system metrics
- Antivirus — On-access scanning can massively slow down applications that do many file operations, especially with older AV engines
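To check the first two points (network latency and DNS) quickly from an affected PC, something like the following Python sketch could help. It only uses the standard library. The function names are my own, and the host and port you pass in are whatever your application server actually uses, which I don't know.

```python
import socket
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result

def dns_lookup_time(hostname):
    """Seconds the system resolver needs to resolve hostname."""
    elapsed, _ = timed(socket.getaddrinfo, hostname, None)
    return elapsed

def tcp_connect_time(host, port, timeout=5.0):
    """Seconds needed to open a TCP connection (includes DNS if host is a name)."""
    elapsed, sock = timed(socket.create_connection, (host, port), timeout=timeout)
    sock.close()
    return elapsed

def report(host, port):
    """Print both timings in milliseconds, or the error if a probe fails."""
    try:
        print(f"DNS lookup : {dns_lookup_time(host) * 1000:.1f} ms")
        print(f"TCP connect: {tcp_connect_time(host, port) * 1000:.1f} ms")
    except OSError as exc:  # covers resolver errors, timeouts and refused connections
        print(f"Probe failed: {exc}")
```

Call `report("appserver.example.local", 443)` (placeholder values) once from a PC in the affected department and once from a PC that is fine. A clear difference between the two points to the network path or DNS rather than the application itself.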
3. Use Windows diagnostic tools to find the actual root cause
For initial troubleshooting, these Windows tools are better suited than Checkmk because they show real-time, per-process I/O and wait states:
- Process Monitor (ProcMon) from Sysinternals: Shows every file, registry and network access per process in real time. Filter by your application’s process name and you’ll immediately see what it’s waiting on (slow file reads, network timeouts, registry queries)
- Resource Monitor (built-in, run resmon.exe): Shows CPU, Disk, Network and Memory broken down by process, including disk response times and network latency per connection
- Windows Performance Recorder (built-in, run wpr): For deeper trace analysis if the above tools don’t reveal the issue
My recommendation: Start with Resource Monitor on an affected PC while the slowness is happening. Look at the Disk and Network tabs filtered by the application’s process. That will likely point you to the root cause.
Once you’ve identified the bottleneck, you can then set up targeted Checkmk monitoring for ongoing alerting — for example, process-specific checks, network interface monitoring on the switches, or active HTTP/TCP checks against the app server.
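As one possible way to do the ongoing part, here is a rough sketch of a Checkmk local check that measures the TCP connect time to the app server. Note that this is a local check run by the agent on the client, not the built-in active TCP check mentioned above. I chose it because it measures from the affected department's side. The function name, service name, metric name and thresholds are my own assumptions, and you need a Python interpreter on the machine that runs it.

```python
import socket
import time

def local_check_line(host, port, warn_ms=200.0, crit_ms=500.0, timeout=5.0):
    """Build one line of local check output: status, "service name", metrics, text."""
    service = f"App TCP connect {host} port {port}"
    start = time.perf_counter()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Status 2 = CRIT; "-" means no metrics are reported.
        return f'2 "{service}" - Connection failed: {exc}'
    ms = (time.perf_counter() - start) * 1000
    sock.close()
    # Status P lets Checkmk derive OK/WARN/CRIT from the metric's warn;crit levels.
    return f'P "{service}" connect_ms={ms:.1f};{warn_ms};{crit_ms} Connected in {ms:.1f} ms'
```

The script would end with `print(local_check_line("appserver.example.local", 443))` (placeholder host and port) and go into the agent's local directory. After the next service discovery it shows up as a normal service with a graph.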
Hopefully this points you in the right direction.
For further investigation, we would need some more information about how the app works.
Greetz Bernd