Published
Wednesday, July 28, 2010 - 03:29
Written by

When working in setting up and developing websites (among other things) we often have to choose between getting stuck in and getting stuff done the known way or trying to invest time in coming up with a more automated way or efficient way of doing things. Usually at some point we become aware of whether we made the right choice – and I find that moment is normally accompanied by a certain sinking feeling.

I recently made the decision to try to follow a new path to do a CiviCRM data import by writing an extension for the Drupal Migrate module. I wrote a blog when I started and although the feedback wasn’t entirely encouraging I decided to give it a go anyway – probably pig-headedness.

As the meat of the migration was 90,000 pledges with 140,000 associated contributions I focussed on these first and set up an automated import (drush) that allowed me to import the contacts, addresses, pledges and contributions (many times over as we tried to figure out the significance of the various fields in the original database). I also imported a table of 27,000 ‘CODES’ as tags against the relevant contact.

After many variations on the import the customer said that, yes, we had the pledges ‘right’ but we needed to look at the ‘CODES’. Each line in the codes table had a contact id from their old system and two other fields. The customer told us that of 184 different combinations of values in the CODES table only 34 needed to be imported but that they should be imported as a mixture of activities, custom data, tags, relationships and core fields. That was the moment when I didn’t get a sinking feeling – I was able to add these requirements to my automated installation without writing a single word of code.

Here are the steps in the import as I am currently running it (still tweaking & at this stage our data is a month out –of-date and will be purged before we re-import the final dataset)

1) import 135 organisations
2) import 27600 main contacts. Fill in current employer field with id from orgs in previous step if relevant
3) Import 5000 secondary contacts. Fill in current employer field with id from orgs in previous step if relevant
4) Create 5000 relationships between primary & secondary contacts
5) Create employer relationships between primary and secondary contacts and orgs where relevant
6) Import 361 workers as contacts.
7) Import 3000 additional addresses against first contact
8) Import 23000 phone numbers against primary contacts (I did this separately to the main import for two reasons – so I could use a hook to format the phone numbers and so I could prevent duplicates being created as the Location add api is a bit special)
9) Import 90,000 pledges. This included 40,000 that were really one off payments but we brought them as pledges as they got lost amongst the contributions otherwise
10) Import 100,000 pledge payments from ‘Billed’ table. Soft credit to related contact. Custom field reference to worker
11) Import 40,000 pledge payments from pledges table (one-off pledges). Soft credit to related contact. Custom field reference to workers created in previous step
12) Import 112 payments that are anomalies (are basically in the wrong table in the original data)
13) Import 463 tags based on ‘major donor’s table
14) CODES table – set custom field X =1 where code X exists
15) CODES table – set contact to ‘do not phone’ where CONTACT = NP
16) CODES table – set contact to ‘do not email’ where CONTACT = NM
17) CODES table - set contact to ‘do not email and don’t email’ where CONTACT = NC
18) CODES table – add ‘annual’ tag where CONTACT=NA
19) CODES table – add 4 different tag variants based on fields
20) CODES table where WORKER=xxx create relationship with worker with that code
21) CODES table - create activities for contacts with 'helped' field - populate activity's custom fields based on value
21) Another 10 variants on CODES table as relationships, tags, custom fields – you get the picture
22) Run an update to cancel 1000 pledges based on a field in the customer's data

Performance
This method is suitable for this size of import if you have a reasonable server. It is probably not appropriate for really big imports. For smaller imports you would be able to use this method through your browser without needing drush.

Contribution imports are the slowest and slow down as they go along to about 500 per minute (or even slower!) by the end of the import. I presume this is in part due to re-calculating the pledge status each time. Faster items go in at about 2000 per minute – but that still means an hour or more to bring in 100,000 pledges. The whole import takes about 6 hours I think and sometimes I have to restart it once (when I get up!). Often when I'm tweaking I just run updates or re-run parts of it. To start it I run

drush migrate-import –all

Performance on deletions is appalling & I generally just truncate all the tables in the DB instead.

Custom Code

In order to do all this I wrote about 30 lines of custom code over & above the civimigrate module and the pledge APIs (both ALPHA):

1) Hook to re-format phone numbers
2) Hook to fill in ‘first_name’ & last_name fields as ‘not supplied’ after applied some rules
3) Hook to set the soft-credit to go the employer rather than related contact if one exists

So what is Civimigrate?
Civimigrate is basically a bandaid between Views and the CiviCRM api. Using the combination of Table Wizard, Views and Migrate modules (Schema also required) CiviMigrate allows you to map fields from Views to the $params array which is passed to the various CiviCRM apis. You are able to run hooks before or after the api is called. Migrate module manages a bunch of useful things like exposing it to drush, retaining mappings & error messages from the import and allowing you to delete & update previously imported items (you need to pass in an id to do an update or deletion – ie. For a contact you need to pass in the contact ID if you decide you want to update all your contacts with an extra field – e.g. I had to do an import to add the ‘Plus 4’ codes.

So how does it work?

Prerequisites
- Table Wizard, Views Migrate, and Schema modules. Preferably drush
- You need to understand views because I am going to assume that you do here
- Civimigrate module – still very ALPHA – here https://svn.fuzion.co.nz/repos/fuzion-code/trunk/drupal/modules/custom/
- Data table/s to import from in your MySQL database WITH a primary key. Table Wizard allows you to upload a csv to create a table but if you do this you must add a primary key and empty your views cache. Really long table names can upset things (about 32 characters)
- If you want to do pledge imports you need the pledge APIs – There aren’t any in Core yet but I have Alphas here: https://svn.fuzion.co.nz/repos/fuzion-code/trunk/civicrm/core/CRM/api/v2/

Table Wizard & relationships – what you need to understand

Make sure this makes sense to you before continuing. It is somewhat confusing but it is crucial if you can wish to run sequential imports that are connected to each other – ie. If you wish to run a contact import & then import contributions against the created contacts. (or in my case pledges against the contacts & contributions against the pledges)

Table Wizard automatically creates a view off any table in your MySQL database. It also allows you to expose fields from that table as foreign keys. Table wizard then allows you to create a relationship from this key to another table. I’m going to assume that we are dealing with a ‘old_contacts’ table with a unique key ‘personID’. And that there is an ‘old_contributions’ table which holds references to this ‘personID’

When you do a migration using migrate module a table is created called migrate_[migration name]_map. This map holds a ‘sourceid’ and a ‘desid’. Once you have run the import migrate will populate this table with mappings of your old ‘personID’ to your new civicrm contact_ids.

Using table wizard you can create a relationship between the personID in the old contributions table and the source id in the migrate map. Now that this relationship exists you can go to the view of your contributions table and add in the ‘destid’ field. This field is your contact_id and you should call it that. Now when you import your contributions your dataset (your view) will hold the contact_id the contribution should be matched against.

You will also wind up creating a relationship from your old_contacts table to the migrate map so you can add the contact_id to your view for updates & deletions.

I used these sorts of relationships also to create relationships in CiviCRM – e.g. with the case worker based on the ‘CODES’ table and to fill out soft-credit fields and custom data contact reference fields.

How do you do the migration
You're going to hate me but I'm not going to tell you - at least not right now. This is already REALLY long and there are some great blogs out there on the migrate module

It all just sounds like a bit of a pain
Yes, but actually migrations are a bit of a pain. Moving around the front end of drupal to create views & relationships is slower for some developers than writing code but it’s a lot easier to leave and come back to then code is because it’s visual and you don’t have to get your head into it in the same way. Also, you are often creating front end civi structures at the same time so it kind of works from that point of view

Why not just use the Civi Import
The Civi import is great. It’s much more intuitive than this method. But you have to prepare the data before you import it. They both have different places. This is a good option if you are going to do lots of trial runs to get it right. Also, you can mix & match as this will sync with existing contacts (it uses the dedupe code).

Is it re-creating the wheel?
Not really – it’s just linking the Civi API with drupal tools.

Caveats
Many! This code is extremely alpha – it only works as much as I needed it to. I have no idea whether the participant, member, groupcontact & event imports work – I just did a copy & paste on them while I was creating the activity import. They represent about 10 minutes work between them (but the activity one did work off the bat so maybe...)

Comments

this is great!!

"There is no plan for future development of Table Wizard - it is being deprecated going forward in favor of the Data module - Data provides the same basic functionality as Table Wizard, plus additional schema manipulation APIs."

FYI

Ergh - something else to get my head around - when I looked data didn't seem to have a UI but I only looked quickly

So that is how you have been doing it - and I sit just across the table from you ;-)

This is really great, Eileen. As D7 is coming out soon with significant improvements in its data layer, I'd been thinking of trying to do something similar at that point.

Thanks so much.