Published
Sunday, January 2, 2011 - 11:29
Written by

 

We recently completed a project with San Francisco Baykeeper, a pollution watchdog in the Bay Area. They asked us to help them move from an expensive yearly contract with Blackbaud Raiser's Edge to the libre and "free kittens"* CiviCRM.
 
Never having worked directly with Raiser's Edge (RE), we relied on Eliet Henderson, Baykeeper Development Director, to help us get our head around all of their data. She exported the RE database as CSV files, grouped by Individuals and Organizations. Each grouping then had subgroups consisting of relationships, gifts, recurring contributions, notes, address information, etc. With all of the export files generated we began to configure CiviCRM and build the tools that would handle the import.
 
Here's what we were dealing with:
  • 35: Number of custom fields
  • 42: Number of CSV files
  • ~19,000: Number of contribution records
  • ~42,000: Combined columns in all CSV files
  • ~123,000,000: Combined size (in bytes) of all CSV files

 

At the beginning, we picked a group of contacts that had unique characteristics. Some had multiple addresses, some had recurring contributions, some had a detailed activity history. By picking a subset of contacts, we could follow them from start to finish and help validate accuracy of the import. To be able to compare one-to-one RE and CiviCRM.

 
We used our colo VPS (Proxmox) to manage all of our sandboxes. With the ability to tear down and rebuild a sandbox in a matter of minutes, it gives us and the client a living, breathing development environment. Changes are pushed to the sandbox with git and a few scripts to load the database. During a one-week sprint, we might reload the sandbox dozens of times. We also use a similar set of tools when cloning production environments to a sandbox, except that process is automated with rsnapshot and drush.
 
While prototyping our import code, one approach we pursued was to coalesce all of the CSVs into one massive file and provide mapping rules for the columns. Our language of choice for prototyping, especially when data munging, is Python with its built in REPL. The code from the prototype is still available for anyone that wants to take a peak but it's a bit messy. There's nothing special about it: it simply reads a text file that defines a key/value pair (column name/description) and builds a series of dictionaries in memory reading through each of the files. 
 
We got as far as importing contacts before we abandoned this approach in favor of writing a one-off import tool. It became too tedious trying to build a single map for that many columns. Even with a mostly automated script to generate the map, we spent a lot of time troubleshooting, tweaking and running into memory constraints trying read/write a single CSV that size.
 
The solution we settled on was to write a modular, albeit Baykeeper specific, tool to import by topic (i.e., contact information, locations, activities, etc.) reading in the relevant CSV files, importing, and saving a snapshot. The process went something like this:
  1. Import basic contact information, locations (addresses, phones), activities, notes, tags, groups and custom fields for individuals and organizations
  2. Import relationships individuals and organizations
  3. Import pledges (recurring, one-time gifts, soft credits)
  4. Import contributions
 
At each step, we took a snapshot of the database with mysqldump so we could pickup where we left off if there was a problem or if we needed to repeat a step during testing.
 
For a majority of the tasks, we were able to use the CiviCRM Public API:
civicrm_activity_create()
civicrm_contact_create()
civicrm_contact_relationship_get()
civicrm_event_get()
civicrm_group_add()
civicrm_group_get()
civicrm_location_add()
civicrm_note_create()
civicrm_relationship_create()
civicrm_tag_create()
civicrm_tag_get()
 
For the data where we couldn't use the API, we used a combination of BAO calls:
CRM_Contribute_BAO_Contribution::create()
CRM_Pledge_BAO_Pledge::create()
 
and straight SQL:
CRM_Core_DAO::executeQuery()
for things like soft credits, custom fields and pledge payments.
 
RE can track an incredible amount of detail for contacts, relationships and contributions. Determining the best way to map these details into CiviCRM was difficult, and by the far the most challenging import we've worked on.
 
This is a subset of features in RE that Baykeeper wanted to replicate in CiviCRM:
  • Relationships to non-constituents
  • Multiple soft credits on pledges and gifts (automatic or manual)
  • Multiple payments and installments (regular and irregular frequencies)
  • Third party pledge payments and anonymous contributions
  • Storing change of address information
  • Constituent relationships
 
Storing a relationship to a non-constituent doesn't necessarily make much sense and is not something CiviCRM supports "out of the box." Our recommendation to Baykeeper was to create contacts for the non-constituents and add them to a non-constituent group, then set the relationship. It was ultimately decided to simply add them to the notes field. This works out well for Baykeeper, because the non-constituent relationship data is only for reference, not something used for data-driven decision making. A perfect use case for the notes field. They've also standardized around other aspects of notes, where the title of the note can be one of "Account Management," "Biographical," "Non-constituent Relationship," and the note body populated with who, what, when.
 
The CiviCRM schema allows for multiple soft credits but only one will appear in the interface. We didn't have time to completely spec it out during this project. At a cursory glance it seems like it wouldn't be too difficult. This has been mentioned previously in the forums and would be a good community feature to work on, possibly through Make it Happen.
 
Recurring pledges, payments, and installments was tricky and took us a few sprints to get through. In Raiser's Edge, we identified 4 different installment frequencies: "Single Installment," "Monthly," "Quarterly," "Annually". Each of those were mapped to the CiviCRM frequency: "day," "month," "year" and adjusting the interval as necessary. Adding frequency and interval along with a slew of other parameters can all be passed to CRM_Pledge_BAO_Pledge::create(). Once the pledges payments have been calculated, they can be updated with actual amounts and dates.
 
Third party pledge payments and anonymous contributions were both handled with custom fields. We opted for a contact reference field for third party payments and a checkbox for anonymous contribution/pledge.
 
Constituent relationships was also problematic for us. RE can have multiple, independent inverse relationships. For example, it is perfectly valid to have an A->B relationship of "Father" and "Son" and a B->A relationship of "Youngster" and "Father". CiviCRM will only allow "Father" to be used in one relationship. We assembled a lookup table of converted (and simplified) RE relationships. In the example above, we created one A->B relationship "Parent" -> "Child". We don't view this as a limitation of CiviCRM, in fact it makes it easier to manage common relationships. One way to represent Parent -> Child in CiviCRM, rather than an unlimited number of ways in RE. It's also important to make sure RE is exporting the relationship from the view of the contact (A) not (B). The first few attempts at importing relationships resulted in odd mismatch of relationship types, since RE exports something like:
1, John Smith, Son, Joe Smith, Daughter, Jane Smith
 
John Smith is obviously the parent of Joe and Jane, but depending on who was imported first the relationship will get reversed. This isn't a bug in CiviCRM, but relies on the code to pass in the correct relationship and type. By telling RE to export the relationship from the point of view of the contact, it reduces the amount of juggling you have to do in code. So the new exported row would look like:
1, John Smith, Parent, Joe Smith, Parent, Jane Smith
 
Now that we had relationships, contributions and pledges imported, we spent the final week reviewing the data and making a few tweaks to custom fields and reporting. The Drupal site was installed and configured by one of their staff and we moved them to Linode for hosting where they could take advantage of our Drupal/CiviCRM optimized OS and LAMP template. One week later, the DNS switch was made and they started the next iteration of their 20+ year old organization: online donations, events, campaigns, content, user management from an integrated CMS/CRM.
 
A whole book could be written about the import process we used to move from Raiser's Edge to CiviCRM. We also wrote a blog post on our site outlining the project, as we do for all of our client work. We're happy to answer any questions or provide more detail. This post just touches on a few of the issues we ran into and the approach we took. Without the powerful CiviCRM core, functional API, and community of developers and users this project would not have been possible on the same scale.

 

Comments

Thanks for sharing!

Interesting, I recently did a similar project with recurring payments (considered a backbone of any UK fundraising database and surprisingly missing in CiviCRM).

I looked at pledge, but I didn't like the way it worked, importantly, it didn't allow open-ended pledges ("here's £5/month from now on"), and the UI for seeing payments inside a pledge is troublesome when you're talking a hundred or so payments for a long standing supporter.

I ended up using the civicrm_contribute_recur table. Basically I used the processor_id and trxn_id field (both long varchars) to store information that would be needed to identify incoming payments. So for Direct Debits, this is a unique reference number; for Standing (Banker's) Orders, this is a sort code, account number and possibly a reference. In those fields also I could store other information that would be used as the template for civicrm_contribute records.

I wrote code for a new Recurring Payments tab, and reconciliation routine that takes monthly .csv files from the bank and Direct Debit service provider and matches them up, creating contribution records as it does it.

This gives my client an easy way to import the hundreds of transactions each month, and a rational, paged, list of contributions on each contact's record, that can be searchable in the normal way.

Anyway, good to hear other people doing similar hacks! And good to hear Civi' taking on Raiser's Edge as it's considered "industry standard" here in UK but I've not been that impressed with it (compared to its price).

Hi Rich,

Do you know of a direct debit provider in the UK that has a decent API that you could use in recurring payements?

Would be great if you could share the code you used for the direct debit integration.

Am working on a similar integration for DD here using a custom import: https://github.com/michaelmcandrew/gp/tree/master/modules/custom/gpew_cu.... It requires manual feeding of DD files like yours.  It uses custom data table rather than civicrm_contribute_recur (nice idea).  Definitley a stop gap at the moment.  I'm interested in developing a proper DD integration.  Happy to pool resources on that if you have ideas / time / money.  You want to get in touch / post your contact details?

Michael

Hi,

 

We tend to use the api as soon as we have a "not completly trivial" import. Just for the sake of being able to run it several time without the pain induced by the import wizard, if nothing else ;)

 

Think eileen has been working on a pledge API. In general, when you see a missing api, please discuss it on the forum, either someone is working on it already, or your contribution would be more than welcome.

 

X+

X,

 

When you use the word "you", we assume you mean "community at large." :)

 

During this project we found bugs in the code base, reported them and supplied patches for most. We always report bugs when we see them and *almost* always post a patch. Being active in the forums or writing an API isn't the only way to get involved and isn't always practical. The resources of the person/team/organization for client work must be balanced with their commitment to open source; do what they can knowing the "tangible" contribution won't be the same for everyone.

 

Yep - I think that's the right interpretation of the word 'you'. We just don't like to miss a chance to try to tell people about how to use the API :-)

 

Great Blog BTW - I think I forgot to mention that

Indeed, thanks for the clarification. my comment was

1) to point out to other readers that patches welcome, and that adding a new api is not difficult.

2) to promote Eileen pledge api

3) And to thanking you (as in the dude typing on the keyboard, not the community at large :) of your post.

 

Err. the 3) might have been missing it, so:

 

Great Blog BTW, thanks for sharing ;)

Yep - the pledge API is OK to use - I think we decided to post it into v2 of the API once I've written some doco (must do that). I actually used a pledgePayment api to link the payments too. The contribute wrapper in civimigrate allows you to pass in a pledge ID (or membership ID) when you map up a contribution & the two get linked and the payment status gets calculated.

 

I used drush w views & migrate module with an extra civimigrate module which I wrote for the purpose but quite a bit in common between the 2 imports.

 

Also note you can add soft credits using contribute api

$params['soft_credit_id']

CivicActions is embarked upon a similar migration project with YouthBuild USA, a youth and community development program that simultaneously addresses several core issues facing low-income communities.  They've asked us to migrate their Kintera and proprietary SiteView databases to a Drupal/CiviCRM environment. I would greatly appreciate any additional suggestions/advice you may have, as well as a peek at the Python (and any other) code you developed that may help us help YouthBuild get on their way to a free-software-based alternative to Kintera.
 
Our current plan is to create a series of Drush scripts to help with data import, invoking the CiviCRM Public API as well as any needed direct database interactions. We will (of course) contribute wherever we can during the process and certainly plan to blog and/or create a case study to both highlight CiviCRM and help others that follow.
 
Kintera has been less than helpful in supplying database dumps, but they do have a 'contact export' feature from which I've pulled a set of 15 CSV files that contain about 500K rows of data.  I have created a few quick-and-dirty (are there any other kind?) Perl scripts and expect to be using the very cool, BSD-licensed Google Refine tool to help clean up the data.  (I just downloaded this app earlier today after watching the videos; this looks like an excellent tool to clean and massage "dirty data" - check it out!)

Fen,

 

Google Refine is certainly worth looking into... it crossed our radar a couple of months ago after reading about it on HN. Being able to load an instance of CiviCRM somewhere in Google space using GAE would be pretty sweet too. :)

 

Happy to share our code. Good luck with the Kintera migration.

We are about to begin a migration for a small Educational group with around 10,000 clients currently in BB RE.  

This is so helpful and a tribute to OS.  

Thanks very much