Recently we had a call for insights about a new Checkmk SaaS product.
That topic prompted some feedback that related not to SaaS but to concerns the community has about the development of Checkmk – specifically about prioritising bug fixes.
As it was, indeed, an off-topic discussion, I have decided it would be great if we had a topic specifically dedicated to your concerns. Not bugs, but the general ways we do things. And here it is: a topic where you can share your concerns, so we could all take a look at them and hopefully find a way to improve whatever we can improve together.
After the aforementioned discussion from my side:
I talked to the product and development teams. It became much clearer to me why and how we do things. Together with the teams we realised that it would be great to talk more about some of our processes: for example, some of our developers offered to talk about how we fix things at the next Checkmk Conference. Some of the concerns raised are already on the development team’s radar as concerns of their own.
I will try to gather some information about bug fixes in the last year to share with you.
But the main thing I want to do is to create a space for constructive criticism, discussions and being more open with feedback.
So please, if you have concerns, suggestions, that you do not see as a separate topic in the feedback category and still want to share them or discuss them – here is a place for it.
My biggest issue with CMK/Tribe29 development/bugfixing is that there’s no public bug tracker.
Just about every major project/company in the OSS world - be it Red Hat, Apache or whoever - has a public bug tracker, where one can report bugs, search for known bugs and see a bug’s current status.
This helps both users and developers, as users can quickly see if a (potential) bug is already known and if there’s already some fixing work in progress.
It also helps the devs, as the same bug is not reported over and over again through different channels.
Also users can provide feedback and fixes/workarounds in a way that is much more practical than a plain forum.
I will start with this post just about the features themselves – PART 1 (maybe I will post another part later, or hopefully someone else will have addressed it by then)
Let’s try with the new “feature” portal - and yes the quotes are on purpose.
Starting with some numbers
nearly 600 feature requests as of today.
6900 votes! (without the ones you deleted because co-workers voted from the same office - shows another known problem)
3!!! implemented = 0.5% (I think it’s 5+, but it’s not my job to tell tribe29 what’s implemented - shows another problem)
2% implemented+done features (I credit you 5 implemented )
Wow! Great numbers for a start, what do you think?
There was already a big announcement at another conference some years ago about the forum, and that we, the community, your customers, now have a “feature” section in the forum where we can get in touch with you and talk about features we are looking for which would make CMK an even better tool. You heard us back in the day and we asked for something like it. But sadly, not much happened.
Except for 3 of your developers who occasionally looked into it and answered: silence.
I don’t remember the numbers (I did some analysis in the past), but the percentage was much better than in the new “feature” portal, which is supposed to be the better solution for us!
And because of this, let’s call it “not so great (bonding) experience”, many of us already knew after the announcement of the new “feature” portal at the conference where it was heading - and we have not been disappointed!
Business as usual, except that tribe29 has a new platform and to many, especially new community members and customers, it feels as if they might be involved.
And of course, it’s much easier for you to maintain and direct people to, and not to forget the marketing part out of it.
But some other reasons/motivations for it were also already clear to many before the announcement at the conference, as it had been set “behind the curtains”: moving away from the credit feature model because, as originally commented by one of the tribe29 staff - “then we don’t need to argue if and why we remove something in the future” - boom!
That did not only happen to us; we already know of some of your other customers with the same experience.
Upgrading to a new version, and a PAID feature is suddenly not there anymore. Implemented for, and paid for by, the customer itself.
That’s going to be fun in the future! How do you think those who heard this feel about the future?
But back to the “feature” portal. There was already a discussion in the forum after many days in which nothing happened and nothing got done or prioritized in the “feature” portal.
Shortly after, we saw some requests being set to “planning”, but the funny part was that it was mainly not what the community and your main customers have been asking for;
it mainly felt like what was already on the roadmap (good point, we need to talk about that too!) from your perspective: some AWS and Azure related feature requests with just 2-5 votes (back in the day) were suddenly marked as “planned”.
Funny part: as of today, some of them still have just 6 votes - compared to those on the first 1-3 pages of your “feature portal” alone, where the lowest has 13 votes. Can you imagine how we feel?
How do you think that is going to feel as a customer IF you actually try to work with the portal and put a lot of effort/work into it? Especially after hearing this at the conference - quoting one of the tribe29 staff – “Just because it’s on the “feature” portal, doesn’t mean we going to implement it, even if there will be many votes” - Cool!
That sounds like it is worth every customer’s effort, what do you think?
600 features on 12 pages! - how many people do you think will click on page two, and how many will click on page seven, to read and check some of the really good and needed features?
Just jump over to Google and read all the publicly available data about user behavior that Google found - why is there no advertising on page 2 anymore?
So we have a “feature” portal, people complained in public within the first weeks about it (Zoom Call - RAW vs “Enterprise” and voting power), people complained about nothing happening, still nothing is happening, the portal gets flooded with feature requests, still just 2.5% planned and 0.85% implemented, three out of the top ten features planned so far (one actually implemented but not flagged correctly)
What do you think will be the outcome out of the above?
And that’s just the “feature” portal, which from the customers’ perspective is the only way to give you feedback about what we, or the majority, might be looking for, missing, or wishing to see in CMK.
tl;dr: I find it exhausting to report complex issues, testing should be community
I cut this out from my other post…
In the below example it turned out the customer had a mess of support contracts / subscriptions without support etc. It was not easily solved (time/benefit fell through).
(*) that exact thing happened - a few years back:
check workers will be terminated if they reach memory limit
AD replication check rises in memory usage (likely bug or log messages)
there’s no builtin debug interface for offloaded checks, nor are the dumps preserved
checks are not load-distributed among workers by static hashing or similar
there’s no mechanism to dedicate a worker temporarily to a troublesome check (there’s no concept of that either)
rolling self-termination of check workers randomly distributed by who gets assigned AD check
randomly (due to co-location of checks in worker being terminated) this also kills long-running SNMP checks of core switches
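To make the "static hashing" and "dedicated worker" points in the list above a bit more concrete, here is a minimal sketch of what such an assignment could look like. This is not Checkmk code: the function names, the check IDs and the `dedicated` mapping are all hypothetical, and it only illustrates the idea that a stable hash keeps a check on the same worker while a troublesome check gets a worker of its own.

```python
import hashlib


def stable_hash(check_id: str) -> int:
    """Hash that stays the same across processes and restarts (unlike built-in hash())."""
    digest = hashlib.sha256(check_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_checks(check_ids, num_workers, dedicated=None):
    """Assign checks to worker indexes 0..num_workers-1.

    dedicated: hypothetical mapping of check id -> worker index that is
    reserved exclusively for that one check (e.g. a check with growing memory use).
    """
    dedicated = dedicated or {}
    reserved = set(dedicated.values())
    # Workers that are not reserved share all remaining checks.
    shared = [w for w in range(num_workers) if w not in reserved]
    if not shared:
        raise ValueError("at least one worker must remain shared")

    assignment = {}
    for check_id in check_ids:
        if check_id in dedicated:
            assignment[check_id] = dedicated[check_id]
        else:
            assignment[check_id] = shared[stable_hash(check_id) % len(shared)]
    return assignment


if __name__ == "__main__":
    checks = ["dc01;AD Replication", "core-sw1;SNMP Interfaces", "web01;CPU load"]
    print(assign_checks(checks, num_workers=4, dedicated={"dc01;AD Replication": 3}))
```

With something like this, a worker that has to be terminated because of the AD check would not take a long-running SNMP check down with it, because no other check is ever placed on the reserved worker.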
I was unable to report via the correct channels. Of course I also couldn’t report without those channels since most was under NDA. But don’t get stuck on the specifics of this…
I could also have annoyed some old contacts and handled this “auf dem kleinen Dienstweg” but honestly…
My brain time was already used up debugging / tracking the issues down; at that point I was already delayed in what I was going to do. I had a workaround that was sufficient.
I then imagined how it would proceed and how I’d spend time having to prove there was a problem while there’s not even a way to simply report the bug so you’ll be aware and would actually see it.
(sans ticket since it would have been another 6 months or so to have contracts sorted.)
In the end I just dumped this and a lot of other similar findings into a word file and sent it to an internal contact at the customer. Unlikely that it ever made its way over.
The oddity about this is the feeling that sometimes you make it hard for others to help make your work easier; to stay ahead of the curve.
Roadmap / Testing
there’s also an overlap with the roadmap transparency question - I want to put my focus on functionality that is stable. Often it’s unclear if I’m hitting a bug because a section of the software is still raw or already rotting (since the successor is being worked on already). Clarity would be important.
For me personally the roadmap question also arises around the CI / testing topics. In the past I’ve often enough just built my own tests, but it is such a waste. (there’s not enough focus on enabling users to do even simplest stuff, like linting check man pages(*))
This, too, is reinforced as a problem by the lack of insight into the roadmap.
And it’s not like I could successfully open a PR with a gitlab-ci.yml, right?
(*) yes, I know it’s simple(**)
(**) no, that’s not a valid reason
When I try to integrate CheckMK and other things, I need to look for solutions that share the way they solve problems. This has always been problematic, be it the lack of bulk updates of host objects in the API, or when you create Ansible playbooks without reviewing the dozens that already exist and without getting feedback on them from the existing community / authors. The archived ansible-checkmk was just like that. The Ansible collection, now with a contribution guideline and style guide, is a great step ahead, but it’s still not giving me the vibes that you’d expect the community to take the lead on the roadmap together with you.
It’s often kind of a ‘we know best’ approach - which IS TRUE in the CheckMK core, but for your interfaces a more considerate approach would sometimes be more productive.
This is in no way a comprehensive answer, but I am trying to do a quick summary for myself here, so I would need a little clarification.
I am not sure if I understood you correctly on what could be improved in the feature portal apart from the decision-making process. But from the 12 pages argument I believe that there are also some interface problems with the feature portal.
Could you please elaborate on what I understood to be a more interface/platform problem with the feature portal? Like what could be improved there technically? If I understood you incorrectly and it is only about the processes, it is also completely fine. I’m just trying to get a better picture.
This request is an example of what annoys me as a customer:
The comment of Thomas Lippert (Director Product Management) illustrates why, in my opinion, product development is moving in the wrong direction:
Just to be clear. This ticket is for filters. If boolean operations are required for rules, please create a separate feature request. Thanks!
Why is customer feedback not appreciated and used to further improve this feature? Why are customers instead snubbed with the usual sentence “This does not belong to this feature request, please create a separate ticket for it”? Why can’t something like this be approached and implemented holistically? Why do we customers have to make a separate request for every little part and face the uncertainty of when it will be considered?
And again months, if not years, pass before we customers get the full benefit of such a feature. And then it gets suddenly discontinued because it is allegedly not really used. But how are we supposed to really use it if it’s only implemented half-heartedly and the rest never has been implemented for the benefit of the customers?
Where is the far-sightedness and the understanding for the needs of the customers? Anyone who regularly works with Checkmk and has some imagination can think of where and how the labels could be of use. It is predictable that customers will want to use “AND, OR and NOT” everywhere where labels are used and not “only in the filters”, not “only in the search” and not “only in the rule conditions”. Since the labels have no predefined values, it is also obvious that it would be useful to be able to search/filter for specific values using regex. It is understandable that “relationship” and “regex” are two different requirements. But why the need to create separate requests for using the “relationships” in views, filters, rules or wherever labels are used?
When asked why Tribe29 didn’t implement it that way from the beginning, we quickly hear that at that time, it was not yet known that such a thing would be of use, they didn’t have the time, they didn’t have enough resources, it didn’t match the topics on the roadmap, they could never satisfy us, we always have something to complain about. We hear again and again why something could not be achieved. Rarely do we hear something like “good idea”, “we hadn’t thought of it that way”, “this is how we plan to provide the full potential of this function”.
Of course, you must develop the product further and tackle topics such as cloud, SaaS, etc… But please do not neglect the functionalities of the core product. They are what makes the product so great, unique and useful for us customers, and they form the basis for all other product variants.
I agree with all the posts above, and we are also disappointed about:
The feature portal and the state of implementations: the votes are not seen; features should not only be ranked by the highest vote count, they are customer needs
That we as customers have to pay for bug fixing
That we have to pay credits for automated ticket closing. 1 credit per auto close
That tickets are closed without customer request, without answering the open ticket questions, etc.
We’ve had a lot of discussions with the PM, but we don’t see any solution in which the customers’ needs are seen and implemented.
For example, the PM told us to use labels and not host tags, so we implemented all labels, but now we can’t use them because we can’t search for them. It is not possible the way it is with host tags. So why is the usage different, and why are there no consistent implementations of such parts?
Another example: the database implementations are totally different, with no consistency at all (names, rules, bakery, etc).
I hope that the posts are seen by those responsible at Checkmk, so that they can help us customers.
We customers pay for the software and the development.
Please help us and let us develop a useful and consistent tool together!
I hope that the posts, feedback, feature requests and information from us customers are seen as constructive criticism that enhances Checkmk and makes it better.
The customer needs are the features of the product. Hence the product benefits and grows through real usage, which makes it easier to sell and makes everybody happy.
Hello, and sorry for the delayed answer – it has been a rough couple of days.
For now, I want to get the whole picture of what members of the community are concerned about, see where we could adjust and generally have some estimation and understanding.
I am worried that if we disperse it at this stage into many topics nothing will get systematically addressed in the end. I would suggest going incrementally – first get the general idea, then address specific things that we could improve.
I hope it’s ok with you.
Sure, it’s ok - BUT it’s completely the opposite of what we are used to. (create another support ticket for every single “parameter” you want to have changed, as it’s another task)
And as you can see from the UX Design Feedback Thread, “some” quickly get lost tracking all the points and addressing them. If that’s not the case with you - super
I completely agree on all things decoupling and dividing into smaller tasks – always works best and gives great results.
The thing with this thread, though, is that it is for all the things that could not be said in other places, something more general. I created it because I saw that it might be needed – I believe that sometimes it is easier to speak out in response to a more open kind of question, to cover things that could not be covered in a more “inductive” way; maybe going through a more generalized view could work.
I will definitely divide it into smaller pieces once I have summarized and analyzed this thread’s replies a little and discussed them with other teams separately. The summary will be here, and you will have the chance to agree or disagree with me. Not that I have all the answers, but I am ready to try this, and I am actually very grateful – some suggestions here seem really constructive, and we could build on them.
Just to put this into perspective: The auto-close message in the screenshot is sent three weeks after the last activity of the customer. And even then they have two more weeks to get back to the ticket to reopen it. So effectively a ticket is closed irrevocably after five weeks without any interaction whatsoever from the customer.
I totally agree with Daniel. From an enterprise perspective the portal is a “nightmare” and NOT enterprise ready. It shows some good and useful ideas, but not more.
Many elements of how to handle requests from the community and from enterprises are still missing. Maybe a kind of crowdfunding for a special feature with support could be available in the future, etc.
First I want to say thank you for giving us a place where we can send you feedback. I will try to sort things a bit, because this posting will be a bit longer.
Release Engineering / QA
We started with 2.1, so I can’t say anything about earlier versions. Our experience with tribe29 is not so good. We started with 2.1b4, and I fully understand that beta releases are beta and I have no complaints here. But it didn’t really get better with the production releases. There have been things that just didn’t work after an update, like Werk #13932 or #14868 for example, that would have been discovered with proper testing beforehand. I understand that you can’t test all the cases that customers have in their setups, but I’d expect at least regression testing.
Feature Portal / PR Handling
Regarding the Portal I fully agree with the above posting from foobar. I wanted to add another thing: We learned that it is very unlikely that desired features will be developed (soon). So we did development on our own and submitted pull requests. They are lying around now and nothing really happens. This is quite annoying because a) we put effort in it and tribe29 and their other customers would benefit from it and b) as long as tribe29 does not implement, we would have to manually patch checkmk after every release.
Also I want to mention that IMHO you should distinguish between paying customers and open source customers. And maybe also on customer size. At the moment even non-checkmk users could vote for features. I could imagine, that bigger customers see an unfair treatment here.
One of our major reasons to go with checkmk was the builtin SLA reporting. But when using it there are many concerns, starting with there being no period longer than a month and ending with the PDF reports that we want to send to our customers looking really awful. And even if we wanted to develop our own PDF generator: we cannot, because there’s no SLA in the REST API. Also worth mentioning here is that setting downtimes subsequently is way too complicated.
Another thing is that the SLA calculation is quite questionable. There is the option to include past months in the SLA view and PDF reports. If you do that and the SLA is broken for one month, the summary assumes that the SLA of this month has been 0%. So if the SLA for jan-nov is 100% each month and dec is 98% and thus broken, the summary says ~91% for this year instead of 99.8%, which would be the correct value. I’m pretty sure that tribe29 does not treat this as a bug but as a feature. See next chapter.
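The two numbers above can be reproduced with a few lines. This is only a sketch of the arithmetic as described in the post, not of how Checkmk computes SLAs internally; the function name, the equal weighting of months and the 99% target are assumptions made for the example.

```python
def yearly_sla(monthly_percent, target, broken_as_zero=False):
    """Average monthly availability over the reporting period.

    monthly_percent: availability per month in percent.
    target: SLA target in percent; a month below it counts as "broken".
    broken_as_zero: if True, a broken month is counted as 0% (the behaviour
    the post describes); if False, its real value is used.
    All months are weighted equally, which is a simplifying assumption.
    """
    values = [
        0.0 if (broken_as_zero and p < target) else p
        for p in monthly_percent
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    months = [100.0] * 11 + [98.0]  # jan-nov at 100%, dec at 98%
    target = 99.0                   # hypothetical target that 98% misses
    print(f"real values:      {yearly_sla(months, target):.2f}%")
    print(f"broken counts 0%: {yearly_sla(months, target, broken_as_zero=True):.2f}%")
```

Counting December as 0% gives 1100 / 12, which is about 91.7%, while using the real value gives 1198 / 12, which is about 99.8%.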
Our service level has no “Bugfixing” included. Your terms and conditions say bugfixes are free. We learned that we have to use feedback@ when we want to have bugfixes for free. Out of 10 feedback@ mails we got one response. This is not acceptable. In your new service level contracts, bug fixing via the ticket system is included, but we are not allowed to change our model to the new one. Why not? And we also learned that bugs are often classified as “feature”. So even with a “bugfix” contract, we would be forced to pay for them because tribe29 classifies them as features.
One example is here: “Setup: show error message if rule changes don't get saved”. Tribe29 says that not saving AND not throwing an error message is not a bug. If checkmk does not save here, at least a notification to the user would be expected. And I guess most of your customers would agree that this is a bug and not an additional feature.
Cloud Edition (Formerly known as Plus Edition)
We signed our contract shortly before the checkmk conference. No one mentioned the plus edition. But cloud enhancements were promised. And looking at the editions on the website, it suggests that you get everything with enterprise. So I’m sure you understand our disappointment when we heard about the plus edition at the conference. And there is literally zero information yet on how it is licensed. All we can see are commits every few weeks that move features away to cpe.
To me it seems that tribe29 would like to treat checkmk similar to an open source project. They define what features customers need, what not, etc. If there are complaints from customers, tribe29 appeases and passes the problem to the customer. “You’re doing it wrong”, “This is not what the feature was designed for” or “Why do you use this feature, better use feature xyz.” But IMHO, tribe29 needs to understand that real world scenarios are different from their testing bubble. I guess most checkmk users know what they are doing, and most times there is a reason why they want to use “foo” or complain about “bar”. And if tribe29 really saw checkmk as an open source like project, then a public bug tracker (Github issues) etc. would be the least they could/should do. There are many projects that do it this way. Ntopng is one example here. I wish tribe29 would behave in a more customer friendly way in the future. We pay for your product, we are a customer and not just a random user of free software.
I want to add here that this is the personal view of me and my colleagues, based on what we read in the forums here and in the tickets that we’ve opened.
Thank you for opening this thread and giving us the chance to address our concerns.
I suggest to open a separate thread for each topic to provide details and discuss if needed.
We recently upgraded our Central Monitoring from 1.6 to 2.0 and it took us several weeks to develop a procedure which allows a reasonably hassle free migration. Finally we had “only” a few hundred false positive alerts.
Our Central Monitoring is not an exotic setup and we are mainly using more or less basic functions of CheckMK. Even though we started to test the migration with a late version (2.0.0p27), we had to open a bunch of tickets to let tribe29 fix bugs and develop special code to be able to migrate our configuration.
We pay for annual subscription, maintenance and break fix. These are the figures recognized by Tribe29, but we have the impression that it is not recognized that, in addition to that, it consumes a lot of resources on the customer side to help fix the negligence in the code. As a consumer we have to factor this into the economic calculation for the software.
Here are my main concerns for today:
Quality of the code is not at the expected level
Checks/Configurations changing without a proper way to migrate the check/configuration from current version to next version.
Check logic is changing without notification to the customer.
Changes are implemented without consulting the stakeholders/customers.
Customer view/need is not appreciated.
WERKS are not exhaustive or not understandable.
I now have to start developing the process to migrate our Local Monitoring, which is a highly distributed installation with customizations. I expect another few months of hard work to achieve this. SIGH
Regarding the new feature portal, I suggest opening another thread to discuss this. We are also not pleased with it.
I also agree with some of the concerns in this post. I know several perspectives: early enterprise customer in checkmk startup phase, consultant at different customers, …
The key points from my perspective are:
Public bug tracker
This would invalidate several points at once
Current state of an issue
Significantly less bug tickets for tribe, as already open issues are viewable
Active communication between different customers / users and Tribe29
Only downside: effort for moderation
Feature Voting Portal
Since the strong growth of tribe29, I can imagine that the “agile hands-on practice” from the startup phase is no longer practical. However, enterprise customers should be able to position features with a different weighting than someone (my mother, my son, …) who can open a browser and was told to please vote. Why then a feature with few votes is set to “planned” and other top voted features are still “under consideration” is also not transparent to the community. I can imagine that feasibility, effort and cost play a big role, but the decisions are not transparent, since there is not really a discussion / interaction between community / tribe29 about it.
Sadly, as we have experienced over the last three major upgrades, even with a lot of testing (why do we actually need to do this?!) there will always be big bugs that we encounter.
Because of that, we always wait for a certain number of patch releases before we upgrade, to avoid the majority of bugs (think about that in itself )
With the 2.0 → 2.1 release we did the same, and due to some “luck” we were forced to wait even a little longer, until release p16, even though some desperately needed features had been implemented.
And sadly, we again ran into some bugs that are critical for us, opened a lot of bug tickets again in the weeks after the upgrade, and again invested a lot of OUR time to make the product better
p16 = 416 bugs/security fixes
p19 = 534 bugs/security fixes
about 28% more bugs/security fixes in 3 releases, and that is not talking about 1-2 months after the major release
(all data filtered from your werks site)
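For anyone who wants to recheck the growth between the two patch releases, the calculation is a plain relative increase. The snippet below uses only the two counts quoted above; the function name is made up for the example.

```python
def relative_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    fixes_p16 = 416  # bug/security fixes up to p16, as quoted above
    fixes_p19 = 534  # bug/security fixes up to p19, as quoted above
    print(f"{fixes_p19 - fixes_p16} additional fixes, "
          f"{relative_increase(fixes_p16, fixes_p19):.1f}% more")
```

That is 118 additional fixes within three patch releases, a bit over 28%.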
Huge numbers, and questionable for some of our customers, especially concerning for enterprise companies
Also worth mentioning: our setup is of course hugely complex, using every major part of checkmk extensively (well known by tribe29). So it is impossible to test everything, but we also offered our help years ago to use this case for you to work on QA.
A couple of days ago we had a talk with @martin.hirschvogel, and IF what we talked about is really gonna be in 2.2, in addition to the SaaS version you are releasing, where you will be forced to actually work with CMK, we believe, NOW, there are good chances that the QA topic will drastically improve by the 2.3 version at the latest.
But never mind, what many customers feel is, they are actually doing the QA/testing for you and without being paid for it! (we all take credit vouchers )
Even if Tribe29 will send a credit note to my company it will not restore the many hours we spent for troubleshooting. These are all unplanned hours we don’t have available anymore for our internal projects.