Earlier today at approximately 6:00 am NZT (yes, we are early birds), we released CiviCRM v2.0-beta4. Around 9:00 am, I saw a post from aaron reporting that some files were missing from the release. I downloaded the release, confirmed what aaron had reported, and realized we had messed up big time :( Our release czar, piotr, was offline and unreachable, so Michal and I had to dive into the release code and figure out what was happening.
We soon figured out that generation of the DAOs (PHP files, auto-generated from an XML schema description, that we use to talk to the database) was failing. However, the script did not exit at that stage, which is a bug: the process is sequential, and an error early on should abort it. It took us a couple of hours to work out the fix and test it. Testing was not easy, since we had to simulate the release process without actually releasing the code. One thing I had not realized: svn operations are much faster when run locally on the server than from a remote machine, even for a URL-to-URL copy.
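To make the "an early error should abort the process" point concrete, here is a minimal Python sketch. It is not our actual release script; the function name `run_steps` and the step names and commands are made up for illustration. The idea is simply to run each step in order and stop at the first one that returns a non-zero exit status.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order and return the names that completed.

    subprocess.run(..., check=True) raises CalledProcessError as soon as a
    command exits with a non-zero status, so later steps never run.
    """
    completed = []
    for name, cmd in steps:
        print(f"[release] {name}")
        subprocess.run(cmd, check=True)
        completed.append(name)
    return completed


# Hypothetical steps: the first one simulates a failing DAO generation.
steps = [
    ("generate DAO", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("build tarball", [sys.executable, "-c", "print('tarball built')"]),
]

try:
    run_steps(steps)
except subprocess.CalledProcessError as err:
    # A real release script would exit here with a non-zero status.
    print(f"release aborted: step exited with status {err.returncode}")
```

With this structure the "build tarball" step is never reached when DAO generation fails, which is the behaviour we were missing.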
So while we were in the process of removing and re-creating beta4 multiple times to test, my itchy fingers managed to instruct svn to delete the v2.0 branch. That was an "oh my god" moment: what do we do now? We quickly remembered that svn is a version control system, so the branch was still somewhere within the repository. A quick search for other folks who had made the same mistake gave us the command we needed (svn copy -r REV BRANCH_URL BRANCH_URL). We ran this on the server and we were back in business. We made sure our latest commits were in there. Things went smoothly after that, and we managed to get beta5 out at around 11:30 am. Not the fastest fix time, but we did manage to fix another beta issue (CRM-2776) in between. Dave was kind enough to do a manual download and test of the release.
A few things we need to do from this experience, to prevent this from happening in the future:
- We need to implement a test-and-validate procedure BEFORE pushing a tarball to SourceForge. We should initially hide the release.
- Ideally we should be able to run our unit and web unit tests on this release and all tests should succeed before the release.
- We should implement checksum validation on the tarball, and also download and test the release after it is uploaded to SourceForge.
- After the above, we can unhide the release for folks to download.
- We should check the return status of the various scripts and abort as needed. We also need to investigate what happens to an svn commit if the post-commit hook fails.
- It did help that Michal and I knew the release code fairly well and could quickly debug it. Always helps to have redundancy in the team :)
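As a rough idea of what the checksum step in the list above could look like, here is a small Python sketch. The function names are hypothetical and the choice of SHA-256 is my assumption; we have not settled on a hash. It compares the tarball we built with the copy downloaded back from the mirror.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(built_path, downloaded_path):
    """Return True if the downloaded copy is byte-identical to the built one."""
    return sha256_of(built_path) == sha256_of(downloaded_path)
```

A matching checksum only tells us the upload was not corrupted. It would not have caught today's problem, where the tarball itself was missing files, so the download-and-test step is still needed.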
Overall it was a valuable experience for us. I wonder how other open source projects manage the above process?