As our North American colleagues (and those who have made the big trip over there) head into the governance sprint now seems like a good time to recap on product maintenance in CiviCRM. Product maintenance, as I discuss, is the monthly routine processes we do to incorporate patches & contributions into the CiviCRM product. This blog is kinda long & weedsy - so if it’s not for you then take a look at this baby octopus instead.
Since 2016 we have been running an approach to managing CiviCRM called LeXIM. I can’t explain LeXIM better than Tim did so I have plagiarised the following from his blog
If a change represents a major leap forward in functionality, it should be developed as an extension. Leaping in this way also encourages modularization.
If a change represents an incremental fix or improvement, it should develop iteratively -- as a carefully-reviewed patch for the existing code. We should take concrete, upfront steps to check the quality of the change and ensure that it still meets the original requirements. This review process should follow a clear schedule so that multiple stakeholders have the opportunity to participate in a timely way.
Switching to this approach was a significant change to our previous approach of dropping large significant code changes in the core product once or twice a year. Instead developments like the v4 api, the new contact layout editor and flexmailer have been done in extensions.
Less obviously, we have done a lot of work in core to support much smaller changes being done in extensions. A good example of this is the progression of ACL work. The original work on financial ACLs was approved prior to the LeXIM approach and was built into core. Later work was sponsored on ACLs for Cases and a generic hook approach was developed. When a new round of work on making financial acls work in reports started we were able to agree that this should be done in an extension and where reports had already been altered in core to support financial acls we started removing that code again.
One thing we do wish to achieve with extension is the ‘leap’ part and this can be challenging. A new CiviCase extension is nearly complete, but we are still making changes to case in core. The line item editor extension creates a technically much more sustainable way to manage line items on the contribution form but as part of LeXIM we dictated it should be done in an extension - meaning we are still dealing with changes in core AND in the extension that relate to line items on forms. This is just to say LeXIM is a process not a line we crossed.
Supporting functionality in extensions doesn’t mean we don’t change core - our core code was not written from the ground up to support extensions and one of the main reasons over the years people have wanted to make changes in core is because it is so hard to make the changes in extensions. Some specific things we have been working on to support extensions :
Custom data on any entity. Since CiviCRM 5.0 most entities support custom data via the api (the ones listed as not working are a mixture of ones that really don’t work & ones with test coverage issues). We have been extending this support to being exposed in front end form. Currently the forms for editing relationship type, price sets, membership types and financial type forms can support custom data fields for extensions. This has involved a lot of form standardisation and also makes the forms more amenable to other forms of altering by extension (as the template variables are defined & consistent)
Cleaning up BAO methods to support hooks correctly
Restructuring templates to be more alterable. This applied to the forms converted to support custom data and in some cases search forms
Increasing metadata driven forms. We have increasingly switched forms to add fields using the metadata-driven addField method rather than ‘the thing I hard-coded today’
A typical monthly release consists of a mixture of the following
- extension support fixes
- performance improvements
- maintenance & code improvements (code quality, tests)
- bug fixes
- compatibility improvements (e.g support php 7.2, mysql 5.7 & 8.0, Drupal 8, Wordpress and also various server configurations)
- user facing improvements (read write relationship perm, recurring contribution tab)
Product maintenance is the IM part of LeXIM. Over the past year I’ve been trying to carve out a concept of product maintenance that is separate from the generalised product management part of looking after CiviCRM and very much focuses on the ‘iterate by month’ part of the equation.
The cornerstones of product maintenance from my point of view are
Continuous integration - every change that is made is put through a https://test.civicrm.org/ test suite before it is merged. (Over the 4.7/5.x cycle we have increased from the 4453 unit tests in 4.6 to 7043 unit tests). Managing the CI suite & related issues is a significant part of the work the core team does.
Consistent release cycle - on the first Wednesday of each month the monthly release is made available. On the same day we also put out the release candidate for the following month. That rc release is available for people to download and test prior to the release. Any regressions identified will be fixed prior to the release. Other changes will not be made to the rc release (with some rare exceptions).
Identifying and fixing regressions. While we cannot prevent ever hitting a regression with a complex and aging codebase that has an incredible number of obscure features, we can mitigate the impact of them. This means that whenever we find there is a regression in the latest release we fix it quickly and ensure that the downloadable release at any moment in time is the most reliable release we have.
Actively reviewing incoming changes
You will note that I have NOT included bug fixing, except where it relates to regressions. There is an implicit assumption that if a bug has been in the 4.7/ 5.x series for a year or more then it has not been important enough for anyone to work on / sponsor a fix on and that speaks to its importance in general. By contrast the review queue represents bugs and other issue that HAVE been important enough for someone to put time into and supporting that effort is more important than addressing non-regressive bugs.
A couple of the items above deserve more discussion
Regressions happen. They happen in CiviCRM and they also happen in my multi-billion dollar Saas accounting software (as I experienced). However, if they are identified and fixed quickly we should be able to limit how many people are affected by the regressions and the impact on those who do hit them.
On the fixing side the answer is pretty obvious and the main thing that has changed since we moved from the 4.x series to the 5.x series is that we will do point releases within a month to address any regressions that are hit as soon as possible. In general if we release 5.5 and there is a regression in 5.5 we will put out 5.5.1. If the regression was in 5.4 or earlier then we will ‘probably’ put out a patch release version but we may wait for the next monthly code drop (in this example 5.6) after considering the perceived risk of both the bug and the fix, how long the bug has been in core (the more recent the more urgent we see it as being so a 5.4 regression is likely to get a 5.5.x code drop but a 5.0 regression would normally wait for a 5.6 drop) and how long until the next drop.
On the identifying side the keys are
For partners and other contributors to test the rc and do their own local development against the rc or master branch.
Regular monitoring of stackexchange for potential issues
Consistently asking questions for bugs to identify whether this is something that used to work in a recent version
‘Rewarding’ users who test the rc, are early adopters of new releases or generally keep their sites up to date with support. There are sites that make the business decision to let others find any regressions. Those who do stay up to date are helping us to identify any regressions and we should mitigate any downside for them by actively helping them. For example I may fix a bug that someone hits on the latest version whether or not it is a regression but if they upgrade from 4.6 and something used to work in 4.6 but doesn’t in the latest then I see them as having made a business decision not to stay up to date and the cost of fixing the issue is factored into their business decision.
This is probably one of the most controversial parts of my attitude to product maintenance. Basically I see the review queue as a reviewers to-do list and we should try to corale reviewers to review the things that in that queue. This approach means the PR queue needs to be triaged and culled and cannot be the place where things go to die. (Every software project needs a place where things can slide into oblivion if no-one is actively pushing them and I believe that is gitlab not github).
One of the key questions here is ‘do we need to review all the stuff that comes in - or should we just review those things that bubble to the top / that we care about’. My personal belief is that part of being open source is tending to contributions. Yes, every time we review something we take on risks. On the other hand when the review queue has been overrun I’ve later discovered multiple fixes for the same bug in there, even in one case for a regressive bug, so we are not making ourselves safer by ignoring people’s efforts to submit code.
The problem is - it’s a lot of work! Even if the ONLY thing the core team did was review incoming contributions, the time involved would exceed the funds the core team has. Code review is a time-consuming job that involves dealing with an issue you likely have no interest in but which transfers onto you some of the burden of dealing with any fall out. My rule of thumb is that by the time someone has submitted a pull request to submit a fix or change to core approximately 50% of the work has been done. So, if we accept that review is important we also have to accept that reviewer time has to be treated as being of high value.
While I wrote a longer document in the past and there is a description on this board the short version of how we place value on reviewer time is by triaging the PR queue to only have things that are review ready (including unit tests) in it and closing other PRs to be tracked in gitlab until they really are review ready (including if reviewer feedback is not responded to in a timely manner). As of writing the review queue is a bit out of control (105 PRs) and I’m finding that when I have time to do review I’m spending as much time finding something appropriate to review as to actually reviewing.
One issue that has come up as we have tightened up on product maintenance is how does product development fit. There are many features that other CRMs have that we don’t and there is a large amount of technical debt and problematic code that make it hard for us to improve and in many cases to attract and keep developers (one developer who moved on from CiviCRM specifically cited a particularly bad piece of code as one of his reasons :-().
My personal belief is that if we can get the flexible form builder off the ground then we can leap by extension past many of the technical and user experience issues that are holding us back. Deprecating large chunks of legacy code and code that was never written for extension intervention will significantly reduce the time cost of product maintenance.
In the meantime I believe we need a partially CT, partially volunteer group working on product maintenance according to the 5 pillars I listed above
Established monthly release cycle
Find and fix regressions quickly.
Learn from regressions
Treat the review queue as a triaged actionable to-do list