Hi @jan.justus,
Thank you for taking the time to reach out and for pledging to improve upon the issues that have been raised.
In terms of code contributions, we have so far been a rather company-led project with smaller PRs. What is different in your projects is that you are planning bigger architectural changes of the agents. This is great and harmonizing the agents obviously has both customer and maintenance benefits. But it also meant that we need to get involved more to review larger PRs and think through the changes. This caused the process to get stuck on our side, since the team is currently focused on other priorities.
It seems to me that there’s been too much focus on my larger PR’s and using them as an excuse for this experience. Yes, I have contributed some big ideas and a few large PR’s, but I’ve actually contributed more smaller PR’s, and I’ve had mixed results with them.
If we look at, say, #227 (mine), we can see that a somewhat straightforward PR sat there for approx 152 days. It would have sat there longer if I didn’t prod.
#255 (not mine) was a slam-dunk PR IMHO. 297 days. It would have sat there longer if I didn’t prod.
#166 (mine) was a simple incremental improvement that just needed a little polish from Sven. 251 days. It would have sat there longer if I didn’t prod.
At present, the oldest open PR, #52 (not mine) suggests adding a single line to a file. This is at 757 days and counting.
When I closed #28, it was 782 days old. #52, a single line change, may soon be older than #28, a 3.1k line change. How stuck on other priorities are you guys that you can’t make a call on whether a single line is added to a file or not? I would like to repeat my earlier statement: this isn’t a good look for tribe29.
Looking at the queue now, it looks to me like 44 out of 50 open PR’s are a year old or more. When each of these was opened, it may or may not have proposed reasonable ideas and/or code. But because they’ve been ignored and left to stagnate for so long, most of them have likely been superseded, diverged too far from the current codebase, or otherwise become irreconcilable.
And this has been the case with the PR’s that I closed. For example, #167 sat unattended until different code covering the same goal was committed separately, 461 days after that PR was first opened. The net result was duplicated effort.
And that’s free effort that tribe29 is passively choosing to throw away.
It should be abundantly clear by now that PR’s are falling through the cracks, irrespective of size or complexity.
Solution discussion:
Far be it from me to tell you how to do your own jobs. I don’t have any idea what your internal culture is like or what your current processes are, so with that lack of context in mind, I would suggest something like:
- In the immediate term, maybe have a PR “spring clean” where the tribe29 team crunches through as many outstanding PR’s as possible.
- Going forward, set a maximum age that a PR can possibly be. Let’s say, 160 days to start.
- Put monitoring on the PR queue. I’m sure you can find a monitoring system somewhere.
- Set a Warning alert on aged PR’s with the threshold at something like 120 days. The idea is to bring ageing PR’s back to the forefront for whoever is looking after the PR queue, and the increased attention should ideally move the PR towards either merging or closing.
- Set a Critical alert on aged PR’s, with the threshold at something like 140 days. This should invoke immediate and prioritised intervention. You’ve got 20 calendar days to figure it out and get the PR done, one way or another.
- These hypothetical thresholds as described are simplistic and indexed from the point that a PR is opened. Eventually you may want to index them on the last update within the PR instead… Or just take that approach from the start. Or take both metrics into account. Your call.
- As this process gets properly bedded in, the thresholds can come down.
- Or, if dogfooding doesn’t sound appealing, maybe something like actions/stale may be useful.
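To make the threshold idea above a bit more concrete, here is a rough sketch of the classification logic. It only assumes the 120/140 day numbers I floated; the function name, the OK/WARN/CRIT labels and the optional "last update" argument are all made up for illustration and are not taken from any existing Checkmk plugin or GitHub API.

```python
from datetime import date

# Thresholds from the suggestion above; tune them down as the process beds in.
WARN_DAYS = 120
CRIT_DAYS = 140


def pr_age_state(opened, today, last_updated=None,
                 warn=WARN_DAYS, crit=CRIT_DAYS):
    """Classify a PR by age (hypothetical helper, not an existing API).

    Age is counted from `last_updated` when it is given, otherwise from
    the date the PR was opened. Returns a (state, age_in_days) tuple.
    """
    reference = last_updated or opened
    age = (today - reference).days
    if age >= crit:
        return "CRIT", age
    if age >= warn:
        return "WARN", age
    return "OK", age


if __name__ == "__main__":
    # A PR opened on 1 Jan 2022 and checked on 1 May 2022 is 120 days old.
    print(pr_age_state(date(2022, 1, 1), date(2022, 5, 1)))
```

Feeding it real data would mean pulling the open PR list and its dates from wherever you track them; that part is deliberately left out here.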
No matter what solution you end up with, the best time to engage with a PR is while it’s still fresh in its author’s mind.
CONTRIBUTING.md states:
If you would like to make a major change to Checkmk, please create a new topic under the Product Ideas category in the Checkmk Forum so we can talk about what you want to do. Somebody else may already be working on it, or there are certain topics you should know before implementing the change.
We love to work with community contributors and want to make sure contributions and time investments are as effective as possible. That’s why it is important to us to discuss major changes you might be planning in order to jointly agree on the best solution approach to the problem at hand.
So by posting this thread here (605 days ago), I was following the official guidance. Yet, much like GitHub, I’m not seeing much engagement from tribe29 here. This forum is not exactly flooded with posts reading “Great idea! We’ll put it on the roadmap!” or “That’s a good idea, but we’re going in a different direction because of xyz…” or “Interesting idea, have you considered abc…” etc.
Solution discussion:
I mean, this one is on you guys. Just like GitHub, you have a community here offering up ideas, code and assistance freely. The smart thing to do is to invest a bit of time figuring out how to leverage the community so that it’s picking up some of your workload.
For example, you guys obviously have an internal issue tracking system. Within that will be a bunch of issues. A subset of those will likely be commercially sensitive, but the rest will just be generic. Why not spend some time developing a way to mirror those issues into the GitHub issue tracker and see what the community contributes? Maybe send out free tribe29 merch to authors of exceptionally good or useful commits, and/or have an unofficial “community commit of the month”. I’m sure @fayepal would have some other great engagement ideas to wedge in - open up the opportunity and let her use her talents. Let the community pick up some of the load, and in doing so, free up more time for yourselves.
Now obviously that’s a rose-tinted, optimistic ideal. But it’s a goal at least; better than nothing, better than the status quo, and at least something that can be worked towards.
And then there’s this. So on the one hand I’m being effectively told that I should commit incrementally (something I demonstrably already have done), and on the other hand I’m being effectively told to not bother committing at all… and precisely at the point that I was about to open an incremental PR…
Solution discussion:
This one is easy. Change the definition of a bug to include “anything that annoys Rawiri”
But seriously, this should be solved by straightening out your PR handling processes and offloading some workload to the community, as described above.
And on top of all of that… One of the biggest issues when developing and submitting a PR or suggesting a major architectural change is that this is often done with virtually no visibility or context of checkmk’s development direction. You guys keep that locked away pretty tight, and it’s to your detriment.
Solution discussion:
Again, this is on you guys. It would be great to have access to something like a technical roadmap that lays out the forthcoming project goals. Product Ideas can be accepted from this forum and elsewhere into the roadmap - meaning that it’s a constantly evolving “living document”.
From my experience, the only ideas I have received about coding direction have come from passing comments on GitHub. If I had access to a reference document that defines what direction the codebase is pointed in, I could maybe contribute towards the roadmap’s goals.
Without that kind of knowledge-share between tribe29 and its community, we’re all just thrashing about in the dark. For example, what is the intent behind cmk-agent-ctl?
The *nix agent code has also diverged off into a direction that I probably wouldn’t have taken it, and, without wanting to disrespect any of the recent contributors at all or their work, it looks to me like it’s being coded into a bit of a corner. This means that any architectural change is going to be increasingly difficult and painful, and it’s what I was hoping to avoid with the primary suggestion that I made with this thread.
Solution discussion:
Well, to be clear: I am obviously not the be-all, *nix agent Super-Jesus, and I certainly don’t think or expect the agent code to be completely my way. The best I can do is contribute my ideas. For my part, I have to decide whether I want to reengage; you guys have to decide whether you’re going to
a. accept contributions from me and
b. open up and let the community know what direction you’re headed in and
c. up your PR processing game. If I reengage, I don’t want simple PR’s sitting there for hundreds of days. That blocks me from making further contributions and is a major disincentive. The same may or may not be true for other contributors.
If I do reengage, and you guys do manage to process your PR’s in a timely manner, then we can thrash through a bunch of incremental PR’s very quickly.
Finally, the agent scripts serve an important role within checkmk’s functionality. It has seemed to me that there has been limited interest from tribe29 in getting the agent’s fundamentals stabilised and then building the rest of the product from there. The agent code appears to have become a second-class citizen to more important things, like dark mode themes.
Solution discussion:
Well, this is totally on you guys to decide what you want to prioritise. FWIW I think that you’re 3-4 years behind on where the *nix agents should be.
What I would propose – if you are still interested to reengage – is to actually collaborate more closely initially to make the collaboration more effective:
Start this effort off with a virtual planning session (video or telco) with you and members of our development team.
- Align on the current state, changes in our master branch etc.
- Jointly sketch out how to evolve the agents architecturally and pitfalls to watch out for based on our development and support experience across different customer types
- Align on how to make the PR process productive for both sides (how to split them etc)
- Help us understand what is important for you as a major contributor
Once this step (probably a bit unusual for an open source project) is done, it is much easier to shift to asynchronous mode again.
I am still undecided about whether or not I want to reengage.
In the meantime, I will have to politely decline the offer for a planning session. Firstly: You guys are in Germany (I hope this isn’t news to you), and I’m in New Zealand. Our timezones just don’t map nicely in a way that I can factor in around my day-job and family time. Also, I suspect that this lengthy post might be taken a bit on the nose and you guys might not want to talk to me for a while, if ever.
Secondly, this isn’t (and shouldn’t be) about me: I don’t expect special treatment. In my view, and as I stated above, the best outcome here is for tribe29 to leverage its community in a way that takes some of the workload off tribe29. Instead of trying to bring me closer, try being closer to your community.
Cheers
Rawiri