recently completed a project with San Francisco Baykeeper
, a pollution watchdog in the Bay Area. They asked us to help them move from an expensive yearly contract with Blackbaud Raiser's Edge to the libre and "free kittens"* CiviCRM
Never having worked directly with Raiser's Edge (RE), we relied on Eliet Henderson, Baykeeper Development Director, to help us get our head around all of their data. She exported the RE database as CSV files, grouped by Individuals and Organizations. Each grouping then had subgroups consisting of relationships, gifts, recurring contributions, notes, address information, etc. With all of the export files generated we began to configure CiviCRM and build the tools that would handle the import.
Here's what we were dealing with:
35: Number of custom fields
42: Number of CSV files
~19,000: Number of contribution records
~42,000: Combined columns in all CSV files
~123,000,000: Combined size (in bytes) of all CSV files
At the beginning, we picked a group of contacts that had unique characteristics. Some had multiple addresses, some had recurring contributions, some had a detailed activity history. By picking a subset of contacts, we could follow them from start to finish and help validate accuracy of the import. To be able to compare one-to-one RE and CiviCRM.
We used our colo VPS (Proxmox
) to manage all of our sandboxes. With the ability to tear down and rebuild a sandbox in a matter of minutes, it gives us and the client a living, breathing development environment. Changes are pushed to the sandbox with git and a few scripts to load the database. During a one-week sprint, we might reload the sandbox dozens of times. We also use a similar set of tools when cloning production environments to a sandbox, except that process is automated with rsnapshot and drush.
While prototyping our import code, one approach we pursued was to coalesce all of the CSVs into one massive file and provide mapping rules for the columns. Our language of choice for prototyping, especially when data munging, is Python
with its built in REPL
. The code from the prototype is still available for anyone that wants to take a peak but it's a bit messy. There's nothing special about it: it simply reads a text file that defines a key/value pair (column name/description) and builds a series of dictionaries in memory reading through each of the files.
We got as far as importing contacts before we abandoned this approach in favor of writing a one-off import tool. It became too tedious trying to build a single map for that many columns. Even with a mostly automated script to generate the map, we spent a lot of time troubleshooting, tweaking and running into memory constraints trying read/write a single CSV that size.
The solution we settled on was to write a modular, albeit Baykeeper specific, tool to import by topic (i.e., contact information, locations, activities, etc.) reading in the relevant CSV files, importing, and saving a snapshot. The process went something like this:
Import basic contact information, locations (addresses, phones), activities, notes, tags, groups and custom fields for individuals and organizations
Import relationships individuals and organizations
Import pledges (recurring, one-time gifts, soft credits)
At each step, we took a snapshot of the database with mysqldump so we could pickup where we left off if there was a problem or if we needed to repeat a step during testing.
For the data where we couldn't use the API, we used a combination of BAO calls:
and straight SQL:
for things like soft credits, custom fields and pledge payments.
RE can track an incredible amount of detail for contacts, relationships and contributions. Determining the best way to map these details into CiviCRM was difficult, and by the far the most challenging import we've worked on.
This is a subset of features in RE that Baykeeper wanted to replicate in CiviCRM:
Relationships to non-constituents
Multiple soft credits on pledges and gifts (automatic or manual)
Multiple payments and installments (regular and irregular frequencies)
Third party pledge payments and anonymous contributions
Storing change of address information
Storing a relationship to a non-constituent doesn't necessarily make much sense and is not something CiviCRM supports "out of the box." Our recommendation to Baykeeper was to create contacts for the non-constituents and add them to a non-constituent group, then set the relationship. It was ultimately decided to simply add them to the notes field. This works out well for Baykeeper, because the non-constituent relationship data is only for reference, not something used for data-driven decision making. A perfect use case for the notes field. They've also standardized around other aspects of notes, where the title of the note can be one of "Account Management," "Biographical," "Non-constituent Relationship," and the note body populated with who, what, when.
The CiviCRM schema allows for multiple soft credits but only one will appear in the interface. We didn't have time to completely spec it out during this project. At a cursory glance it seems like it wouldn't be too difficult. This has been mentioned previously in the forums and would be a good community feature to work on, possibly through Make it Happen
Recurring pledges, payments, and installments was tricky and took us a few sprints to get through. In Raiser's Edge, we identified 4 different installment frequencies: "Single Installment," "Monthly," "Quarterly," "Annually". Each of those were mapped to the CiviCRM frequency: "day," "month," "year" and adjusting the interval as necessary. Adding frequency and interval along with a slew of other parameters can all be passed to CRM_Pledge_BAO_Pledge::create(). Once the pledges payments have been calculated, they can be updated with actual amounts and dates.
Third party pledge payments and anonymous contributions were both handled with custom fields. We opted for a contact reference field for third party payments and a checkbox for anonymous contribution/pledge.
Constituent relationships was also problematic for us. RE can have multiple, independent inverse relationships. For example, it is perfectly valid to have an A->B relationship of "Father" and "Son" and a B->A relationship of "Youngster" and "Father". CiviCRM will only allow "Father" to be used in one relationship. We assembled a lookup table of converted (and simplified) RE relationships. In the example above, we created one A->B relationship "Parent" -> "Child". We don't view this as a limitation of CiviCRM, in fact it makes it easier to manage common relationships. One way to represent Parent -> Child in CiviCRM, rather than an unlimited number of ways in RE. It's also important to make sure RE is exporting the relationship from the view of the contact (A) not (B). The first few attempts at importing relationships resulted in odd mismatch of relationship types, since RE exports something like:
1, John Smith, Son, Joe Smith, Daughter, Jane Smith
John Smith is obviously the parent of Joe and Jane, but depending on who was imported first the relationship will get reversed. This isn't a bug in CiviCRM, but relies on the code to pass in the correct relationship and type. By telling RE to export the relationship from the point of view of the contact, it reduces the amount of juggling you have to do in code. So the new exported row would look like:
1, John Smith, Parent, Joe Smith, Parent, Jane Smith
Now that we had relationships, contributions and pledges imported, we spent the final week reviewing the data and making a few tweaks to custom fields and reporting. The Drupal site was installed and configured by one of their staff and we moved them to Linode for hosting where they could take advantage of our Drupal/CiviCRM optimized OS and LAMP template. One week later, the DNS switch was made and they started the next iteration of their 20+ year old organization: online donations, events, campaigns, content, user management from an integrated CMS/CRM.