Civi-migrate - Proof of Concept

Published
2010-06-05 02:31
Written by
So, amongst all the discussion of import methods lately, I just wanted to flag another possible approach: creating a CiviCRM hook module for the Drupal migrate module.

There are a bunch of great blogs out there on how to use the table wizard module with the migrate module to import data from various MySQL tables or views into Drupal nodes / users / taxonomies / content types - for example: http://www.lullabot.com/articles/drupal-data-imports-migrate-and-table-wizard

The migrate module has a bunch of hooks that allow you to use it for other forms of migration. I rattled up the module code pasted at the end of this blog in a couple of hours as a proof of concept for using this approach for CiviCRM imports. The code I threw together just offers up the civicrm_contact table fields, but I think there must be some clever ways to use existing import tools rather than this rudimentary approach.

The really nice thing about this approach is that it constructs an array or object (in this example the $params array) based on the mappings configured in the front end, and then additional hooks have an opportunity to re-factor $params (e.g. re-parsing address fields) before it is passed to the contact create API.

I think there is lots of potential here, especially since the migrate module:

- already interacts with drush
- allows you to specify how many contacts to import at a time
- lets non-developers make changes without needing the code to change (this can be a problem with a scripted solution if you have both developers & non-developers involved)
- allows migrations to be reversed or updated as you tweak them
- provides error reporting
- potentially allows you to use related tables as your source data, i.e. it should be possible from what I understand to create an import that brings in contacts with associated contributions from multiple tables - http://drupal.org/node/591776#comment-2107050 - I haven't worked through this yet

If we wrote a really good migrate hook module then all we would need to do to customise our imports is write hooks to massage aspects of the data. Obviously this is a Drupal-centric approach but I think most of the people looking at big scripted migrations are doing it in Drupal.

To get this code working you will need the modules:

- table wizard (tw)
- schema
- migrate
- views
- views ui (recommended)

You will also need a source MySQL table with a primary key. You need to add this table in table wizard and analyse it before you can go to the migrate module & create a content set. The blog by Lullabot or the documentation on migrate should help here.

NB - I struggled to find a useful place to post this as a zip as I didn't seem to be able to add a file to the wiki page.

Code for module civicrm_migrate

migrate.info
****************************
; $Id$
name = Fuzion Migrate CiviCRM
description = Add on for Migrate module to migrate into CiviCRM
core = 6.x
package = AA Fuzion
version = 0.0
dependencies[] = "migrate"
dependencies[] = "civicrm"
************************************ migrate.module **************************************
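<?php
// NB: the opening lines of this file were lost when the post was published.
// The function header is a best guess, reconstructed from the hook naming
// used by the other functions in this module.
function civicrm_migrate_migrate_types() {
  $types = array('contact' => t('CiviCRM Contact'), 'contribution' => t('CiviCRM Contribution'));
  return $types;
}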

function civicrm_migrate_migrate_import_contact($tblinfo, $row) {
  civicrm_initialize();
  require_once 'api/v2/Contact.php';

  // Section copied from the migrate example.
  $params = array();
  // Initially populate the new object according to the mappings;
  // this is a standard bit of code from the example.
  foreach ($tblinfo->fields as $destfield => $values) {
    // Braces make the dynamic property lookup unambiguous.
    if ($values['srcfield'] && $row->{$values['srcfield']}) {
      $params[$destfield] = $row->{$values['srcfield']};
    }
    else {
      $params[$destfield] = $values['default_value'];
    }
  }

  // Give other modules a shot at manipulating the params.
  $errors = migrate_destination_invoke_all('prepare_contact', $params, $tblinfo, $row);

  $success = TRUE;
  foreach ($errors as $error) {
    if ($error['level'] != MIGRATE_MESSAGE_INFORMATIONAL) {
      $success = FALSE;
      break;
    }
  }

  if ($success) {
    $result = giantrobot_civicrm_contact_add($params);

    // Call completion hooks, for any processing which needs to be done
    // after the contact is saved.
    $errors = migrate_destination_invoke_all('complete_contact', $params, $tblinfo, $row);

    // Only record a mapping when the API handed back a contact ID
    // (the v2 contact API is expected to return it as 'contact_id').
    if (!empty($result['contact_id'])) {
      $sourcekey = $tblinfo->sourcekey;
      migrate_add_mapping($tblinfo->mcsid, $row->$sourcekey, $result['contact_id']);
    }
  }
  return $errors;
}

function giantrobot_civicrm_contact_add($params) {
  $params['dupe_check'] = TRUE;

  $contact = civicrm_contact_add($params);

  if (!civicrm_error($contact)) {
    // for clarity
    return $contact;
  }
  else {
    // NB: this early return switches off the duplicate handling below, so
    // any contact the API rejects (duplicates included) is simply skipped.
    // Remove it to update the first matching contact instead.
    return;
    // let's see if we have multiple matches
    if (stristr($contact['error_message'], 'Found matching contacts')) {
      // if so, we'll get the lowest contact ID and update them
      $contact_ids = explode(',', $contact['error_data']);
      sort($contact_ids);
      $contact_id = array_shift($contact_ids);
      if ((int) $contact_id > 0) {
        $params['contact_id'] = $contact_id;
        $params['dupe_check'] = FALSE;
        $contact = civicrm_contact_add($params);
        // for clarity
        return $contact;
      }
      else {
        // some unlikely civicrm_error which gave us a non-numeric
        // contact_id
      }
    }
    else {
      // not multiple duplicates - some other civicrm_error
    }
  }
  // we didn't handle update to first dupe, so what is left at this
  // point is a civicrm_error

  return $contact;
}





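function civicrm_migrate_migrate_fields_contact($type) {
  // Offer every column of civicrm_contact as a destination field.
  // This relies on the civicrm_contact table being reachable through
  // Drupal's database connection.
  $fields = array();
  $contactFields = db_query("SHOW COLUMNS FROM civicrm_contact");

  while ($field = db_fetch_array($contactFields)) {
    $fields[$field['Field']] = $field['Field'];
  }

  return $fields;
}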
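
For anyone wondering what the "massage the data" step might look like, here is a minimal sketch of a module responding to the 'prepare_contact' call made in the import function. It is only an illustration: the module name "example" and the source column full_name are made up, the hook name is inferred from how the other hooks in this module are named, and it assumes migrate hands $params to the hook by reference so that changes stick.

```php
<?php
/**
 * Hypothetical helper: split "Jane Smith" into first and last name.
 * Everything after the first run of whitespace is treated as the last name.
 */
function example_split_full_name($full_name) {
  $parts = preg_split('/\s+/', trim($full_name), 2);
  return array(
    'first_name' => $parts[0],
    'last_name' => isset($parts[1]) ? $parts[1] : '',
  );
}

/**
 * Sketch of a prepare hook (name assumed, not taken from the migrate docs).
 * It fills first_name / last_name from a single source column when the
 * front-end mappings did not already supply them.
 */
function example_migrate_prepare_contact(&$params, $tblinfo, $row) {
  $errors = array();
  if (empty($params['first_name']) && !empty($row->full_name)) {
    $params = array_merge($params, example_split_full_name($row->full_name));
  }
  // Returning an empty array means "nothing to report".
  return $errors;
}
```

The point of doing it in a hook is that the mappings stay in the migrate UI while the fiddly parsing lives in a few lines of code.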


Comments

NB - I should note I'm not suggesting this approach is a substitute for the CiviCRM import tools but rather a tool in the toolkit for complicated migrations - i.e. it does part of the job that you might otherwise write a script to do & gives a good interface (a hook) for intercepting & modifying the import.

"NB - I should note I'm not suggesting this approach is a substitute for the CiviCRM import tools"

I would. Well almost. Great integration with Migrate/Table Wizard would really kick ass and be far better than any of the existing import options. However there are two challenges that I see:

1. You'd have to break the DRY principle.
http://en.wikipedia.org/wiki/Don%27t_repeat_yourself
You'd need to duplicate the logic of the internal CiviCRM API within your table wizard hooks to account for things like adding entries to the log tables, firing CiviCRM hooks, and performing any special logic that normally happens when you create a contact/contribution/etc.

2. Last time I used the Table Wizard/Migrate modules (which admittedly was a while back) there was a big limitation where all relationships had to be one-to-one (e.g. if you have a row in your import that has multiple taxonomy terms you can only import one of them, and you need to come up with another solution, probably some custom hooks, for the rest). Not sure if this limitation has been overcome since.

I'm kind of hoping that some of the internal Civi functions could be called from the migrate hook rather than re-writing it all from scratch. Maybe the API would benefit from having a 'log' option when you action things using it?

Re the relationships - I got the impression that you could do many-to-one from what I read but I haven't got it working myself yet. The link I posted in the body seemed to say you can but it is a bit confusing and I was planning to sit down & work through it soon.

I think I'm going to have to import pledges & I suspect the migrate hook will be the quickest way to get that up & running

Thanks for this useful post, Eileen.

Until the migrate functionality pushes the import implementation down from PHP processing of each row into SQL that operates on all records to be imported, it won't be able to handle large data volumes with any kind of reasonable performance. I can imagine using either the civicrm_mapping table or something in Drupal's schema to dynamically create the appropriate query/queries.

I really like Dalin's reiteration of the DRY principle. As we move to 4.0 I think that it should be kept front and centre as we consider frameworks and architectures. It's a useful additional way of looking at making the code more comprehensible and modular.

On another note, I believe the wiki does allow attaching documents, but the tiny paperclip on the top left of the page that is used to access them is non-intuitive and should be reworked. Almost everyone looks around at the bottom of the page for attachments, since that is the common standard, and the paperclip is so small it is hard to find even when you are looking for it.

Hi,

I'll keep playing with this for a bit longer but I guess I'm looking at this for the same reason that Lobo wrote his own line-by-line import parsing script a couple of blogs earlier - i.e. to manipulate the data before importing it. Because otherwise we wind up exporting, grooming in Excel & re-importing, which is OK if you only have to do it once. Or else we have to write a script to do it - in which case Migrate seems to reduce the amount of script required.

I mostly develop custom code using the API and run it from the shell. It processes enough contacts per second that I'm pretty sure a typical import is over by the end of the coffee break. Even if you had 100,000 contacts, it would be ready by the next morning if you launched it late at night, as long as you got through 3 contacts per second (and to be that slow, you need a lot of tests and data massaging, and probably a lot of redundant lookups for every contact).

Definitely not fast, but good enough for imports that you mostly run one time, isn't it?
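
As a rough check on those figures, the arithmetic can be sketched as below. The function name is made up for illustration, and the inputs are simply the numbers quoted in the comment above.

```php
<?php
/**
 * Hypothetical helper: hours needed to import $contacts rows at a
 * steady rate of $contactsPerSecond.
 */
function example_import_hours($contacts, $contactsPerSecond) {
  $seconds = $contacts / $contactsPerSecond;
  return $seconds / 3600;
}

// 100,000 contacts at 3 per second.
$hours = example_import_hours(100000, 3);
printf("%.1f hours\n", $hours);
```

That comes out at a little over nine hours, which fits the "launch it late at night, ready by morning" estimate.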

Yes, I don't think scalability is a problem for us. I ran through a very simple contact import for 30,000 contacts fairly easily. I did find that running the import through a browser it stopped every 6000 or so. This is a configurable setting but it means that you don't run into the problems you do with the Civi GUI import where you lose sight of what the outcome is when you over-do it.

The main thing I'm working on is importing pledges & associated contributions, so most of my effort has gone into a (partial) API for pledges. I've got it working but I need to play with the relational side of it - i.e. I'm using table wizard relationships / migrate to feed the contact ID created in the contact import into the pledge import, and the pledge ID into the contribution import (made harder by the need to use a combined key for the relationship between pledges & contributions).
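
To make the chaining idea concrete, here is a small stand-alone sketch. It does not use migrate or table wizard at all: plain arrays stand in for the source-to-CiviCRM ID mappings, and every column name, key and ID in it is invented for illustration.

```php
<?php
/**
 * Hypothetical helper: build one lookup key from several source columns,
 * for the case where no single column identifies a pledge.
 */
function example_build_key(array $row, array $columns) {
  $parts = array();
  foreach ($columns as $column) {
    $parts[] = $row[$column];
  }
  return implode('|', $parts);
}

/**
 * Hypothetical helper: find the ID created by an earlier import step,
 * or NULL when that source record has not been imported.
 */
function example_resolve_id(array $map, $sourceKey) {
  return isset($map[$sourceKey]) ? $map[$sourceKey] : NULL;
}

// Step 1: the contact import left us a map of source key => contact ID.
$contactMap = array('legacy-17' => 202);

// Step 2: the pledge import looks up its contact, then records its own
// mapping under a combined key (donor reference + pledge number).
$pledgeRow = array('donor_ref' => 'legacy-17', 'pledge_no' => 3);
$contactId = example_resolve_id($contactMap, $pledgeRow['donor_ref']);
$pledgeMap = array();
// 9001 stands in for whatever ID the pledge create call would return.
$pledgeMap[example_build_key($pledgeRow, array('donor_ref', 'pledge_no'))] = 9001;

// Step 3: the contribution import rebuilds the same combined key to find
// the pledge it belongs to.
$paymentRow = array('donor_ref' => 'legacy-17', 'pledge_no' => 3, 'paid' => 100);
$pledgeId = example_resolve_id($pledgeMap, example_build_key($paymentRow, array('donor_ref', 'pledge_no')));
```

The order of the imports matters here: each step can only resolve IDs that an earlier step has already recorded.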

I do note that the point someone made about the Civi import doing logging is a bit of a red herring, as the API does that too.