Migrating a Legacy Kintera Database to CiviCRM

Published
2011-12-25 20:34
Written by
fen - member of the CiviCRM community

Similar to @annaleevk's story (Tales from a Blackbaud Kintera Conversion) earlier this year, I was tasked with migrating a 60K-contact Kintera database to CiviCRM. To make matters more "interesting", the client had a home-grown database with mixed information, some defining new contacts and some adding information to the Kintera contacts. I will not talk about this second database-merge issue any further (perhaps I just want to put it all behind me).

After project definition and kick-off, 50% of our time was spent creating data dictionaries and mappings between the incoming tables and CiviCRM, while the other 50% was spent on script creation and data import. Oh, and then there was the 50% spent on data cleaning and another 25% on re-mapping and merging that we had not planned for to anything like that extent at the start of the project. Fortunately, as it became evident that data cleaning was going to be a major part of this project, our client recognized it as well and asked how they could help. I introduced them to Google Refine - a wonderful tool for working with messy data - which they used extensively to make the data more coherent. A valuable lesson learned in this process: when importing data from legacy systems into CiviCRM, expect that the data will need significant cleaning, and engage the client whenever possible to help with the process.

The import process had several basic steps:

  • Get the data out of Kintera (they don't make it easy, but it is possible from their password-protected and somewhat hard-to-find export page).
  • With help from the client, create data dictionaries of the CSV files output by Kintera.
  • Start working on the mapping into CiviCRM [see KinteraCreateTables.sql].
  • Pass the files to the client for initial cleaning with Google Refine.
  • Import the CSV files into a MySQL holding database (I called this "kintera") [see KinteraLoadTables.sql].
  • Run the tables through some additional data cleaning [see KinteraCleanTables.sql].
  • Create tables that match exactly what needs to be imported (don't do joins, etc. in the import GUI); a rough sketch of these staging steps follows this list.
  • Ask the client to validate the data import.
  • Didn't work right? Lather, rinse, repeat.

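The heavy lifting for the middle steps was in the .sql scripts referenced above; purely as an illustration of the staging idea (the table names, columns, paths and credentials below are invented, and this PHP/PDO wrapper is not one of the actual scripts), the load-and-clean sequence looks roughly like this:

```php
<?php
// Rough illustration of the staging steps only; the real work was done in
// KinteraCreateTables.sql, KinteraLoadTables.sql and KinteraCleanTables.sql.
// Table names, columns, paths and credentials here are invented.

$pdo = new PDO(
  'mysql:host=localhost;dbname=kintera',           // the "kintera" holding database
  'import_user', 'secret',
  array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)      // needed for LOAD DATA LOCAL INFILE
);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1. A staging table shaped like the Kintera CSV export.
$pdo->exec("
  CREATE TABLE IF NOT EXISTS kintera_contacts (
    kintera_id  VARCHAR(32),
    first_name  VARCHAR(64),
    last_name   VARCHAR(64),
    email       VARCHAR(128),
    postal_code VARCHAR(16)
  )
");

// 2. Bulk-load the CSV that came back from the client's Google Refine pass.
$pdo->exec("
  LOAD DATA LOCAL INFILE '/tmp/kintera_contacts.csv'
  INTO TABLE kintera_contacts
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
  LINES TERMINATED BY '\\n'
  IGNORE 1 LINES
");

// 3. Additional cleaning, e.g. trimming whitespace and dropping junk emails.
$pdo->exec("UPDATE kintera_contacts SET email = TRIM(email)");
$pdo->exec("UPDATE kintera_contacts SET email = NULL WHERE email NOT LIKE '%@%'");

// 4. A flat table whose columns match the CiviCRM import mapping one-for-one,
//    so the import GUI never has to do joins or transformations.
$pdo->exec("
  CREATE TABLE import_individuals AS
  SELECT first_name, last_name, email, postal_code,
         kintera_id AS external_identifier
  FROM   kintera_contacts
");
```

The point of the final CREATE TABLE ... AS SELECT is that the import GUI is then handed columns it can map one-for-one, with no joins or transformations left to do at import time.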

Another lesson learned with a previous client: the mapping to CiviCRM would involve significant workflow changes from their Kintera/custom-database experience, but when migrating to CiviCRM it is important to embrace "The CiviCRM Way." Therefore, as the data to be imported into CiviCRM (after all the cleaning and mapping) fit nicely into CiviCRM's schema, my initial plan was to use the stock CiviCRM import GUI. The servers would time out after importing about 2K records, so I pushed the Apache & MySQL timeouts on my home server to the max and performed the initial imports there.
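
For anyone who does go the GUI route, these are roughly the knobs involved; the values and connection details below are illustrative rather than a record of my actual configuration:

```php
<?php
// One way to push the limits that make a long-running GUI import die on a
// stock LAMP install. Values are illustrative; SET GLOBAL needs the SUPER privilege.
//
//   php.ini     max_execution_time = 0      (no PHP time limit)
//               memory_limit       = 1024M
//   httpd.conf  Timeout 3600                (Apache request timeout, in seconds)

$db = new PDO('mysql:host=localhost;dbname=civicrm', 'civicrm_user', 'secret');
$db->exec('SET GLOBAL wait_timeout = 28800');     // idle-connection timeout
$db->exec('SET GLOBAL net_read_timeout = 600');   // slow-read timeout
```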

If I had to do it all over again, I'd definitely use more of the great data access methods offered by API v3, combined with simple drush-based scripts that can manipulate CiviCRM from the command line. In fact, I did have to do it all over again: before the site went live, we had to update the test databases with many months of updated or entirely new data. For this process I wrote several scripts, the most generic of which is import.drush. (Please excuse the lack of comments - it was written to be used once and thrown away.) Especially when lots of custom transformations must be made, this is the way to go. I'm making extensive use of drush's convenient testing, prototyping and logging facilities in another data migration project I'm currently working on - I'll be discussing one aspect of that in a future blog post.
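
import.drush itself is in the SVN directory linked at the end of this post, so I won't reproduce it here; purely as a sketch of the approach (the column names, CSV path and script name below are made up), a drush php-script that pushes rows through the v3 Contact API looks something like this:

```php
<?php
// Minimal sketch, not the actual import.drush script. Run from the Drupal
// site root with:  drush php-script kintera-contacts.php
// Column names and the CSV path are invented for illustration.

civicrm_initialize();   // bootstrap CiviCRM on top of the Drupal bootstrap drush provides

$fh     = fopen('/tmp/kintera_contacts.csv', 'r');
$header = fgetcsv($fh);                       // first row holds the column names

while (($row = fgetcsv($fh)) !== FALSE) {
  $data = array_combine($header, $row);

  // Custom transformations happen here, in plain PHP, before the API call.
  $params = array(
    'version'             => 3,               // CiviCRM API v3
    'contact_type'        => 'Individual',
    'first_name'          => trim($data['first_name']),
    'last_name'           => trim($data['last_name']),
    'email'               => strtolower(trim($data['email'])),
    'external_identifier' => $data['kintera_id'],   // keep the legacy key
  );

  $result = civicrm_api('Contact', 'create', $params);
  if (!empty($result['is_error'])) {
    drush_log("Kintera id {$data['kintera_id']}: {$result['error_message']}", 'warning');
  }
}
fclose($fh);
```

Because it runs under PHP's command-line interpreter, a script like this sidesteps the web-server timeouts entirely, and each re-run against a refreshed export is just another drush invocation.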

Diving into a large legacy-database-to-CiviCRM migration is not for the faint of heart - I sometimes liken it to untangling a five-mile bundle of Christmas-tree lights. But it's also rewarding to help a worthy client extricate themselves from proprietary databases and ad-hoc record-keeping systems, replacing those with the open source freedom and ever-expanding feature set that CiviCRM embodies.

All of the scripts mentioned above are in a public SVN directory at https://svn.civicactions.net/repos/civicrm/scripts/loadcsv

(This article is a repost of an article posted on the CivicActions website with a few changes for the more CiviCRM-centric audience here.)
 

Comments

I think both the "story and learnings" as well as the script examples are really helpful for others when they need to travel these paths! In particular, the recommendation to engage the client in the data cleaning process seems super useful.

Thanks for this - you pushed me to get my own blog up. Data cleansing is everyone's bugbear.

 

I'm pleased to see you got on with some of the more advanced features of the API in your code.