We recently had one of our groups report that merging data was resulting in data loss. Specifically, when they merged two records, they noticed that the contribution records on the record that was deleted were not carried over to the record that remained.
I investigated and found the culprit: we were missing a foreign key constraint between the contribution table and the contact table. In fact, we were missing a lot of foreign key constraints in this database.
The reason the mising foreign keys caused data loss is because CiviCRM quite cleverly uses the existance of foreign keys to determine what data should be merged. With this method, the merge code doesn't have to be re-written every time a schema change is introduced. This approach strikes me as smart even though it had bad side effects in our case.
So - how did our schema get out of sync? This problem only affects our four oldest databases and I suspect these databases were ones I had to convert from myisam to innodb and I suspect I did not manage that properly. Or I may have powered through an upgrade without properly reviewing errors.
As I started researching how to fix this problem, I was quite pleased to find the mysqldbcompare utility which promises to identify differences between two databases and presents SQL statements to alter the second database so it will match the first. Perfect tool for the job. I created a fresh CiviCRM installation to use as my reference and got to work.
Then I wasted nearly two full days. Problems with this tool included:
- It outputs AUTO INCREMENT changes. If I run the ALTER statements it gave me, I would have set the AUTO INCREMENT counter to 1 on all my civicrm tables
- It fails to include ON UPDATE and ON DELETE clauses for the foreign key constraints. That means it is nearly useless for my purposes.
- It includes key drops, key additions and foreign key constraints on the same ALTER statement - yet MySQL seems to need to key changes to be complete before the foreign key constraints can be added.
I actually worked through most of these problems with ugly hacks before I discovered that if you muck with your table definitions too much you can cause the innodb engine to crash by simply executing an ALTER TABE statement. At this point, two full days in, I decided to change course dramatically.
There are a few proprietary tools that promise this functionality, but I wanted a free solution. I ended up writing a rather dramatic sledge hammer to do the job. It's a little nerve racking to run, but it does the trick.
Once we got our schemas in line, we werent' done. We already had a database full of orphaned records, however, now the orphaned records were officially recognized by the database. This condition led to SQL errors left and right (for example, the smart group that pulls in all records with a valid email address in civicrm_email blew up because it tried to populate civicrm_group_contact_cache with a contact_id that is not present in civicrm_contact). Immediately after fixing our schema, we had to run code that identified and deleted (or set to NULL) orphaned records.
That script is also included in our git repo.
The moral of the story is: if you vaguely remember converting your database from myisam to innodb many years ago, you may want to check to make sure you have your foreign keys in place. It's not longer just a good idea: missing keys will result in database loss.
Comments
There is also the Ensuring Schema Integrity on Upgrades wiki page with details on a different method of making sure all details of your schema are correct.
Thanks for the tip! I'll follow up and make some additions to that page.
Also - I've updated the git repo so now, if you are using drush, you can automatically updated the foreign keys only (without having to dump and re-create your database).
Thanks for the nice post and very nice contribution ...
We have to review so much old/unmaintained/incoherent databases when migrating customers to our service that I often wondered about a script (or an extension) that would automate this task by downloading an 'official' blank database from github and compare its structure with the current civicrm db. There are a few gotchas along the lines of custom fields, multilingual, and history logs but apart from that seems mostly feasible.
Could what you have already done on FK's be reused/expanded for the above?
Yes - definitely. What I've created is a start - but there are a lot of corner cases to sort out. I would welcome any patches you have to contribute.
Getting the "official" blank database, particularly in a format useful for comparisons, is a lot trickier than I thought, given the corner cases (like you've mentioned) as well as annoyances like AUTO INCREMENT counters and such.
You might also be interested in the php code that only handles foreign key constraints - which are very important to have in order to avoid the data loss.
I ran your utility on a copy of my database and found some inconsistencies that are fairly minor.
But I note the utility drops the triggers on the database. How can these be regenerated?
Jamie,
I just answered my own question. I regenerated the triggers by enabling an extension. Iguess one has to beware of enabling/disabling extensions with side-effects,
That's a mistake in the script. It should re-create the triggers. I'll work on a fix. Thanks for the feedback.
I just added instructions for rebuilding the triggers. I also wonder if deleting the triggers is over kill and not actually necessary.
For WordPress users of CiviCRM you should remember to use a backup solution that will preserve foreign keys.
BackupBuddy by iThemes is one of the most popular backup solutions for WordPress and needs to be properly set up to backup the foreign keys.
Use Commandline as the backup method rather than PHP since the PHP backup does not preserve the foreign keys.