Another migration story - Using CiviMigrate 2 for Drupal 6 or 7

Published
2011-12-27 22:38

This month we did a fairly complex migration of about 90,000 contacts into CiviCRM for a customer. I have been using the Migrate module for migrations for a while now, but this was the first time I used version 2 of the module. I put up a how-to blog post on our site just before Christmas, but Fen has inspired me to get on with sharing it more widely.

 

Migrate lets you process a specific number of items and roll back or update items, and it is primarily intended to be run from drush. Data can be adjusted on the way in if you add a 'prepare' function, and you can use the results of one migration in another migration job. The example module included in CiviMigrate shows contacts being created and contributions being linked to the resulting contacts.

 

Compared to my Migrate module v1 effort, the v2 CiviMigrate is extremely lightweight and does very little extra of its own. This has much to do with the fact that it uses API v3 rather than v2.

 

As in Fen's experience, our customer was keen and able to do considerable data cleanup, but unlike Fen I did very little MySQL data manipulation and used the Migrate module to do all the heavy lifting. I wrote a lightweight module called CiviMigrate, which basically exposes the CiviCRM API to the Migrate module. One nice thing about the Migrate module is that it also lets you create users, so we were able to import 15,000 users as well with very little extra effort (Migrate has a mechanism for dealing with duplicate user ids if your pattern isn't always unique).

 

The thing I really liked about the Migrate module is that it provides a really good basis for discussing the data and working through it with your customer. We were working with data already in a MySQL database, although Migrate supports other sources such as CSV, XML and Oracle. The Migrate UI lets you see which fields you have mapped and which you haven't, and lets you categorise other fields, e.g. 'To discuss'.

 

You can get more information on the source & destination fields in the above picture by clicking on the link on your site. It will show you which field is mapped to which, which are unmapped, and any comments or defaults you have added. There are more screenshots on our blog, and the CiviMigrate module contains an example module (the source of the screenshots), so I won't add all the screenshots or go into much technical detail here.

 

The destination fields displayed are those advertised by the relevant entity (i.e. by the API getfields function). For the source fields I tended to include all the fields from the base table relevant to a particular entity and add in fields from other tables / conditions as appropriate. Where there were large numbers of mappings, e.g. 25 different membership type abbreviations mapping to 10 different memberships, I created special tables for that, but for one or two I used the migrate prepare function and PHP-based if/then clauses.
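For a handful of values, the in-code variant of that mapping might look roughly like the sketch below. It uses a lookup array rather than chained if/then clauses, which is a swapped-in simplification, and every name and value in it (the function names, the abbreviations, the membership type labels) is hypothetical rather than taken from the actual migration.

```php
<?php
// Hypothetical lookup: source abbreviations on the left, membership type
// names on the right. All values are invented for illustration.
function example_membership_type_map() {
  return array(
    'FM' => 'Full Member',
    'FULL' => 'Full Member',
    'STU' => 'Student',
    'ST' => 'Student',
    'LIFE' => 'Life Member',
  );
}

/**
 * Returns the membership type for a source abbreviation, or NULL if unmapped.
 */
function example_map_membership_type($abbreviation) {
  $map = example_membership_type_map();
  // Normalise case and whitespace so ' fm ' and 'FM' are treated alike.
  $key = strtoupper(trim((string) $abbreviation));
  return isset($map[$key]) ? $map[$key] : NULL;
}

// In a prepare step this could be applied to each source row, for example
// (the property names here are assumptions):
// $row->membership_type = example_map_membership_type($row->mem_code);
```

Returning NULL for unknown abbreviations leaves it to the caller to decide whether to skip the row or flag it for the 'To discuss' pile.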

 

The mappings are written in your own add-on module and look like this:

 

    $this->addFieldMapping('first_name', 'names_name'); 
    $this->addFieldMapping('contact_type')->defaultValue('Individual');

 

In this particular migration we had 4 different import jobs for relationships, because the customer decided that rows from their source activities table, their main data table and their relationships table, plus the results of a somewhat complicated query, all held data that should be displayed as relationships.

 

The migration itself did wind up taking a while (contacts went in at only 600 per minute, whereas the contributions were more like 2000 per minute). I found I had to add the DAO->free(); line to CiviMigrate, so obviously there are still leaks we haven't yet found, and even with that it was still memory intensive. I have run the import on the production site on one of our servers, but our latest experience was that the production server had limited memory and lots of resource throttling, so I had to run it on our development server.

 

Overall the process was a real success because the customer was able to see where the source fields were mapped to and took a very active role in creating new custom fields and deciding where to map them. I created a small UI for the mappings (which are otherwise mostly written in code) and the customer was able to adjust the mappings through it. I haven't added this to the main migrate module yet & it's fairly limited.

 

I did get a bit caught out when I found the address API didn't convert state abbreviations, and I didn't realise they were being skipped. I also think the speed could be improved within the API (i.e. by optimising some of the error checking, which is something we've started on). CiviMigrate has its own check for existing contacts based on the external identifier, but I suspect I didn't actually turn dedupe off, and that won't have helped the speed.
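One way to avoid being caught out like this is to expand abbreviations yourself before the data reaches the API and to keep a list of anything that could not be mapped. The sketch below assumes the target wants full state names, which may not match what the address API expects on your version; the function names and the three-entry table are hypothetical.

```php
<?php
// Hypothetical abbreviation table; a real import would need the full list
// for the countries involved.
function example_state_names() {
  return array(
    'CA' => 'California',
    'NY' => 'New York',
    'TX' => 'Texas',
  );
}

/**
 * Expands a state abbreviation and records any value that cannot be mapped,
 * so that skipped states show up instead of disappearing quietly.
 */
function example_expand_state($value, array &$unmapped) {
  $names = example_state_names();
  $key = strtoupper(trim((string) $value));
  if ($key === '') {
    // Nothing to convert and nothing to report.
    return NULL;
  }
  if (isset($names[$key])) {
    return $names[$key];
  }
  // Unknown value: remember it for review rather than dropping it silently.
  $unmapped[] = $value;
  return NULL;
}

$unmapped = array();
$states = array();
foreach (array('ca', 'NY', 'ZZ') as $source) {
  $states[] = example_expand_state($source, $unmapped);
}
// $states now holds 'California', 'New York' and NULL; $unmapped holds 'ZZ'.
```

Reviewing the unmapped list after a trial run gives the customer something concrete to clean up before the real import.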

 

But overall I found it a flexible approach that allowed for a lot of trial and error, and it also made me focus my fixing efforts on improving the API rather than on writing one-off code.

Comments

While it's pretty easy to create custom API V3-based scripts for import, I like what I'm seeing in the CiviMigrate module.  Thanks for the case study and the pointer to your blog write up - I'll def look into using this next time.

And I agree that the address API needed some love.  Unfortunately (for the CiviCRM team and users) one of our local team had written some useful state name conversion routines which I reused rather than looking into how I might extend the API to do what I needed.  I'll work to be more cognizant of how I can support using/extending API routines in the future.

Hi,

 

I think things like the address API need to stay simple & the functionality should be at the BAO layer, but we need to work to add it where it is missing. The API just calls the BAO, but not nearly as consistently as we might hope. I added some notes on this ticket:

http://issues.civicrm.org/jira/browse/CRM-9386

Does CiviMigrate have any support for de-duping on import? This has been one of my biggest issues, as I may receive multiple databases to import, one with first_name=Fred and another with first_name=Frederick (with identical last names and email) which I want to map into the same contact.  And perhaps there is already a "Frederic" (mis-spelled in the initial DB) which I would want to map both of those into.

I've done some work to do this manually, but I guess my question is: can CiviMigrate access the de-dupe facilities in CiviCRM on a per-contact import basis?  (Or put another way, is there an API front-end to de-dupe?)

Yes, it can, in that it uses the API, which can. I didn't use de-duping as I prefer to get the customer to do it after the import where possible.

 

There is a param accepted by the contact API

 

    $params['dupe_check'] = TRUE;

 

Which you can set in your.. It will use the default rule. You can't choose which rule as yet.
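As a rough illustration of where that parameter sits, the sketch below builds a parameter array for a contact create call with 'dupe_check' switched on. The helper name and the sample values are hypothetical, and the actual API call is left as a comment because it only works on a site where CiviCRM is bootstrapped.

```php
<?php
/**
 * Builds parameters for a contact create call with duplicate checking on.
 */
function example_contact_params($first_name, $last_name, $email) {
  return array(
    'version' => 3,
    'contact_type' => 'Individual',
    'first_name' => $first_name,
    'last_name' => $last_name,
    'email' => $email,
    // Ask the contact API to check against the default dedupe rule.
    'dupe_check' => TRUE,
  );
}

$params = example_contact_params('Fred', 'Smith', 'fred@example.org');

// On a site with CiviCRM available the array would then be passed on, e.g.
// $result = civicrm_api('Contact', 'create', $params);
```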

Anonymous (not verified)
2012-01-28 - 02:24

Thanks for this, it's been a great help. However, I found that rollback did not work: there are a couple of typos in the bulkRollback() method of CiviMigrate's civicrm_api.inc, namely $uids on line 82 and the instrument string on line 83.

fixed the things pointed out by chaps2

This is an old blog post, but I had written something quite similar. Eileen, you actually helped me out while writing it by answering some API related questions. It has specific destinations for different entity types -- contacts, contributions, etc, and includes shortcuts and helpers for mapping fields not found with get_fields() as well as nested/related entities.

Sorry the documentation sucks! I used this for a salsa migration most recently, and I should be able to post that code as an example almost in its entirety.

http://drupal.org/project/civicrm_migrate