Wednesday, September 28, 2011 - 10:00

Here are a few notes and an overview of how I, as a freelancer, solved a client's contact consolidation problem when the data came from external sources:

My Use Case
- The client had run events and conferences over several years, which meant there were databases and mailing lists all over the place (many of which overlapped).
- I set up ongoing JSON exports of the contact data from the external sites, which could be queried by date range to get the latest changes.
- I had a contact consolidation script on the CiviCRM end that took these JSON exports and loaded them into CiviCRM via the API.
- About 5 different source sites created about 1200 contacts.
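
As a rough illustration of the date-range idea above, here is a minimal sketch in Python (used for brevity; the actual script was PHP). The payload, the field names such as `modified`, and the function `changed_between` are all assumptions made for the example, not the real export format.

```python
import json
from datetime import date

# Hypothetical export payload; field names are assumptions for illustration.
SAMPLE_EXPORT = """
[
  {"first_name": "Bob", "last_name": "Jones",
   "email": "bob@example.org", "modified": "2011-09-01"},
  {"first_name": "Ann", "last_name": "Lee",
   "email": "ann@example.org", "modified": "2011-06-15"}
]
"""


def changed_between(export_json, start, end):
    """Return records whose 'modified' date falls within [start, end]."""
    records = json.loads(export_json)
    return [
        r for r in records
        if start <= date.fromisoformat(r["modified"]) <= end
    ]


# Only pick up contacts changed in August or September 2011.
latest = changed_between(SAMPLE_EXPORT, date(2011, 8, 1), date(2011, 9, 30))
print([r["email"] for r in latest])
```

In a real setup the filtering would more likely happen on the exporting site, with the date range passed as query parameters, so only the changed records travel over the wire.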

A Few Challenges
- The documentation can be quite scattered, and it was sometimes difficult to find the docs for the right version. I really pity whoever has to deal with this challenge though!
- It took me much longer than it should have to learn how to activate the API from a standalone Drupal PHP script. The key was simply:

Import Process
Since I wanted my solution to run as automatically and intelligently as possible, I wrote a lot of custom code to identify duplicates and new contacts, and I sidestepped the DeDupe engine. My process was something like this:

1) Data Cleaning
- I had to clean up various fields like email addresses with spaces and other characters.
- Country / Province matching was a bit of a challenge. The incoming data would have "Ontario" and "Canada", and CiviCRM wanted the IDs from the database. So I had to do textual matching by either name or abbreviation. I also used a bit of fuzzy matching with Levenshtein distance to help with typos like "Ontorio" matching "Ontario". This actually worked pretty well and went a long way for the data I had.
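
The cleaning steps above could be sketched roughly as follows, again in Python rather than the PHP of the actual script. The lookup tables, the IDs, the `max_distance` cutoff, and the function names are assumptions for illustration; in CiviCRM the province IDs would come from its own database.

```python
import re

# Hypothetical lookup tables; real IDs would come from the CiviCRM database.
PROVINCES = {"Ontario": 1, "Quebec": 2, "Alberta": 3}
ABBREVIATIONS = {"ON": "Ontario", "QC": "Quebec", "AB": "Alberta"}


def clean_email(raw):
    """Remove whitespace and stray wrapping characters, then lowercase."""
    email = re.sub(r"\s+", "", raw)
    return email.strip("<>,;\"'").lower()


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_province(text, max_distance=2):
    """Return the ID for a name, abbreviation, or near-miss spelling."""
    text = text.strip()
    name = ABBREVIATIONS.get(text.upper(), text).lower()
    for known, province_id in PROVINCES.items():
        if known.lower() == name:
            return province_id
    # Fall back to the closest known name within the allowed edit distance.
    best = min(PROVINCES, key=lambda known: levenshtein(known.lower(), name))
    if levenshtein(best.lower(), name) <= max_distance:
        return PROVINCES[best]
    return None


print(clean_email(" Bob @Example.org "))
print(match_province("Ontorio"))
```

The cutoff matters: set too loosely, short names start matching each other, so anything beyond the cutoff is better left unmatched than guessed at.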

2) Check if contact already exists
- Use educated guesses to determine whether it's the same contact, e.g. when the phone, address, or email address is the same. Sometimes multiple contacts use the same email (husband/wife), and I counted those as two contacts.
- Deal with name variations, e.g. Bob vs. Robert. I had a small database of common name synonyms, so Bob Jones would match Robert Jones, but probably only if they were at least in the same province. Other factors were weighed in as well.
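
To make the weighing idea concrete, here is a minimal sketch in Python. The synonym table, the weights, and the threshold are all assumptions chosen for the example; the post does not give the actual values or factors used.

```python
# Hypothetical synonym table standing in for the small name-synonym database.
NAME_SYNONYMS = {"bob": "robert", "rob": "robert", "bill": "william"}


def canonical_first_name(name):
    """Map a nickname to its canonical form, e.g. 'Bob' -> 'robert'."""
    key = name.strip().lower()
    return NAME_SYNONYMS.get(key, key)


def match_score(a, b):
    """Score how likely two contact dicts describe the same person."""
    same_first = (canonical_first_name(a["first_name"])
                  == canonical_first_name(b["first_name"]))
    if not same_first:
        # Different first names sharing an email or phone (e.g. spouses)
        # are treated as separate contacts.
        return 0
    score = 0
    if a["last_name"].strip().lower() == b["last_name"].strip().lower():
        score += 2
    # Assumed weights: contact details count more than a shared province.
    for field, weight in (("province", 1), ("phone", 2), ("email", 2)):
        if a.get(field) and a.get(field) == b.get(field):
            score += weight
    return score


def is_same_contact(a, b, threshold=3):
    """Treat two records as one contact once the score reaches the threshold."""
    return match_score(a, b) >= threshold


bob = {"first_name": "Bob", "last_name": "Jones", "province": "Ontario"}
robert = {"first_name": "Robert", "last_name": "Jones", "province": "Ontario"}
print(is_same_contact(bob, robert))
```

With these weights, a synonym name match alone is not enough; it needs one more piece of agreeing evidence, such as the province.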

3) Merging/Incoming Data
- If newer contact data was found on the external sites, it would overwrite some of the contact's data, but only if no manual updates had been made to that contact within CiviCRM. If there had been manual updates, I planned to log the incoming update for later manual review instead.
- Each import made a "note" entry, which logged most of the raw incoming data at the time of import, whether or not it got merged in. This turned out to be handy for seeing exactly where a contact came from, and it provides a way to version the data in a sense (see old phone #s, addresses, etc).
- I made static groups for each external site, and had them by year. So for example, a continuing education site (ongoing) had groups such as: CE_2009, CE_2010, CE_2011. The external sites dictated the name of the group for a contact in the incoming JSON feed, and the group was automatically created through the API if it didn't yet exist. This seemed to be a decent way to partition the contacts for mailings, etc, and to quickly understand where they came from.
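
The merge rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `manually_edited` flag, the dict-based records, and the function names are hypothetical, and the real script would write notes and groups through the CiviCRM API rather than to in-memory lists.

```python
def group_name(site_code, year):
    """Build a per-site, per-year group name such as CE_2011."""
    return f"{site_code}_{year}"


def apply_incoming(existing, incoming, review_log, notes):
    """Merge incoming fields unless the contact was manually edited."""
    # Always keep the raw incoming data as a note, merged or not.
    notes.append({"contact_id": existing["id"], "raw": dict(incoming)})
    if existing.get("manually_edited"):  # hypothetical flag
        # Leave the record alone and queue the update for manual review.
        review_log.append({"contact_id": existing["id"],
                           "pending": dict(incoming)})
        return existing
    merged = dict(existing)
    # Only non-empty incoming values overwrite what is already stored.
    merged.update({k: v for k, v in incoming.items() if v})
    return merged


review_log, notes = [], []
contact = {"id": 7, "phone": "555-0100", "manually_edited": False}
updated = apply_incoming(contact, {"phone": "555-0199", "city": ""},
                         review_log, notes)
print(updated["phone"], group_name("CE", 2011))
```

Writing the note before deciding whether to merge is what keeps the history complete: even a skipped update leaves a trace of what arrived and when.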

My thoughts on CiviCRM in general as it pertains to Drupal:

About CiviCRM (4 with Drupal 7)
- A CRM less focused on commercial companies (think Salesforce) and more on NGOs and non-profits (think Raiser’s Edge from Blackbaud).
- Primary functionality revolves around contact consolidation and management, and then loads of tracking from when you phone/email someone, to whether or not they're still alive, etc.
- Loads of additional modules and features, especially for NGOs, fundraisers, etc.
- CiviCRM works with either Drupal or Joomla, so it is quite abstracted away in a lot of ways, but it does integrate well with Drupal in a lot of cases, such as the user base and permissions system. It does have that "bolted on" feeling though.
- It uses Smarty for its templating system and doesn’t talk to Drupal’s theme layer very well, so don't expect to customize the look/feel too much (not that you necessarily need to) without working in Smarty.
- It maintains a MySQL database separate from Drupal’s (or at least it should). Contacts are separate from your Drupal users (I'm a bit fuzzy on these details though).
- It has loads upon loads of features, which is great, but I would recommend it for projects that need what it already does, as opposed to trying to customize it too much.
- It has a well-built API v3 for tapping into CiviCRM entities like Contact, Group, Tag, Event, etc.


It'd be great if you could share some of your code. That way, someone could add bits of it to CiviCRM core, which would benefit from doing what you describe (e.g. under step 2).