Keeping your DB clean: how about a hook_civicrm_dedupe?

Submitted by colemanw on September 4, 2011 - 14:50

Like many others who have no doubt gone before me, I found myself spending a large chunk of time this week finding and merging duplicate contacts. Civi's deduping UI has gotten a lot better lately, and I love having the ability to go through it more quickly than before. But deduping is only as good as the queries it runs, and of course prevention is better than cleanup.

Here are some observations of how all those dupes in my DB got created:

  • People treat nicknames and first names interchangeably -- Bob might call himself Robert on one form and Bob on another, and not even notice he's doing it.
  • People change email addresses like they change their shoes!
  • Ditto for phone numbers and addresses!
  • Phone number is currently just a text field, so deduping on phone is almost useless (555.5555 doesn't match 555-5555, 5555555 or 555 5555, and if you include area codes there are even more permutations)
  • A slightly mistyped name (it's amazing how many people can't spell their own name!) will result in a dupe that might never be discovered, since even your fuzzy rule won't catch it!
  • Setting your strict rule strict enough that you don't get any false positives results in way too many false negatives (creating loads of dupes in the DB)
  • Setting your fuzzy rule fuzzy enough to clean up all those dupes means sifting through thousands of "possible dupes" who are unquestionably different people.
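To make the strict/fuzzy trade-off concrete, here is a minimal PHP sketch of a weight/threshold rule. It assumes a simplified model in which each field carries a weight and two records count as duplicates when the weights of their exactly matching fields add up to the threshold. The function name, weights and sample contacts are hypothetical, and CiviCRM's real rules run as SQL rather than in PHP.

```php
<?php
// Simplified weight/threshold model (illustration only, not CiviCRM's code):
// each field has a weight; a pair is a "dupe" when the summed weights of the
// fields that match exactly reach the threshold.
function weighted_match(array $a, array $b, array $weights, int $threshold): bool {
    $score = 0;
    foreach ($weights as $field => $weight) {
        // Only count fields present and non-empty on both records.
        if (!empty($a[$field]) && !empty($b[$field])
            && strtolower(trim($a[$field])) === strtolower(trim($b[$field]))) {
            $score += $weight;
        }
    }
    return $score >= $threshold;
}

$bob    = ['first_name' => 'Bob',    'last_name' => 'Smith', 'email' => 'bob@example.org'];
$robert = ['first_name' => 'Robert', 'last_name' => 'Smith', 'email' => 'rsmith@example.com'];
$sam    = ['first_name' => 'Sam',    'last_name' => 'Smith', 'email' => 'sam@example.net'];
$weights = ['first_name' => 5, 'last_name' => 5, 'email' => 10];

// Strict (threshold 20 = every field): Bob vs Robert, likely the same person, is missed.
var_dump(weighted_match($bob, $robert, $weights, 20)); // bool(false)
// Fuzzy (threshold 5): a shared last name alone flags Bob vs Sam, a different person.
var_dump(weighted_match($bob, $sam, $weights, 5));     // bool(true)
```

Raising the threshold trades false positives for false negatives and vice versa; a single number cannot express "nickname OR first name" or "any one of email, address, phone".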

In order to solve these and other problems, our dedupe queries need to be able to:

  • Use OR logic (match nick_name to first_name and vice-versa)
  • Use SOUNDS LIKE or wildcards to catch misspellings
  • Intelligently choose fields to match based on what data is available (the current "weight threshold" system is too simplistic)
  • Group matching data by category (email, phone, and street address all help to locate a person, but none of those things uniquely identify them -- members of households or businesses frequently share all three)
  • Strip irrelevant characters such as phone number punctuation
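A minimal PHP sketch of three of these normalizations, for illustration only: digits-only phone numbers, a nickname map so first_name and nick_name can be cross-matched, and a sounds-like comparison using PHP's built-in soundex(). The nickname table and function names are hypothetical. PHP's soundex() always returns a four-character code, while MySQL's SOUNDEX() (which SOUNDS LIKE compares) can return longer strings, so the two may disagree on longer names.

```php
<?php
// NICKNAMES is a tiny hypothetical sample, not a real nickname table.
const NICKNAMES = [
    'bob' => 'robert', 'rob' => 'robert', 'bobby' => 'robert',
    'bill' => 'william', 'will' => 'william', 'liz' => 'elizabeth',
];

// Keep digits only, so "555.5555", "555-5555" and "555 5555" compare equal.
function normalize_phone(string $phone): string {
    return preg_replace('/\D+/', '', $phone);
}

// Map a first or nick name onto one canonical form for OR/cross-matching.
function canonical_given_name(string $name): string {
    $n = strtolower(trim($name));
    return NICKNAMES[$n] ?? $n;
}

// Rough PHP counterpart of MySQL's "a SOUNDS LIKE b" (SOUNDEX(a) = SOUNDEX(b)).
function sounds_like(string $a, string $b): bool {
    return soundex($a) === soundex($b);
}

var_dump(normalize_phone('555.5555') === normalize_phone('555 5555'));    // bool(true)
var_dump(canonical_given_name('Bob') === canonical_given_name('Robert')); // bool(true)
var_dump(sounds_like('Smith', 'Smyth'));                                   // bool(true)
```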

For example:

Because we have a few custom modules running for our school to register people for programs and such, we save a lot of contacts via the API. I've created my own wrapper that handles matching and saving contacts: before it passes data to the Civi API, it first runs a query to check whether that contact already exists in the database. The logic for the query goes like this:

IF first_name OR nick_name match (or cross-match)
AND last_name matches
AND email OR street address OR phone number match

Then we have a match!
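Here is the same logic as a PHP sketch operating on two flat contact arrays. The function names and the array shape are assumptions for illustration; a wrapper like the one described would run something similar against candidate contacts before handing data to the API. Values are compared case-insensitively but otherwise verbatim, so phone and address normalization would still need to be layered on top.

```php
<?php
// Case-insensitive equality that never treats two blank values as a match.
function same_value(?string $a, ?string $b): bool {
    return $a !== null && $b !== null && trim($a) !== ''
        && strcasecmp(trim($a), trim($b)) === 0;
}

function is_same_person(array $new, array $existing): bool {
    // first_name OR nick_name must match, including cross-matches
    // (new first_name vs existing nick_name and vice versa).
    $nameMatch = false;
    foreach (['first_name', 'nick_name'] as $x) {
        foreach (['first_name', 'nick_name'] as $y) {
            $nameMatch = $nameMatch || same_value($new[$x] ?? null, $existing[$y] ?? null);
        }
    }
    // ...AND last_name must match...
    if (!$nameMatch || !same_value($new['last_name'] ?? null, $existing['last_name'] ?? null)) {
        return false;
    }
    // ...AND at least one of email, street address or phone must match.
    foreach (['email', 'street_address', 'phone'] as $field) {
        if (same_value($new[$field] ?? null, $existing[$field] ?? null)) {
            return true;
        }
    }
    return false;
}

$existing = ['first_name' => 'Robert', 'nick_name' => 'Bob', 'last_name' => 'Smith',
             'email' => 'bob@example.org', 'phone' => '555-5555'];
$incoming = ['first_name' => 'Bob', 'last_name' => 'Smith', 'email' => 'BOB@example.org'];
var_dump(is_same_person($incoming, $existing)); // bool(true)
```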

It's a great query, and it almost guarantees no false positives or negatives. You could use it too, or write a better one (since it doesn't tackle the misspelling issue). You or I could also write a great fuzzy query that really nails those dupes without making us comb through thousands of false positives. But in order for that to happen outside my own little custom API function, we need...

hook_civicrm_dedupe()

Lobo and I were discussing this on IRC the other day, and came up with a couple possibilities. See what you think:

/**
* The "hijack and replace query" method
*/
function hook_civicrm_dedupe($contactType, $rule, &$query, $groupID = NULL, $params = NULL) {
  // In this version, the dedupe query has been built and is about to be run.
  // $contactType might be Individual, Household, etc.
  // $rule is an array or object containing the rule being invoked (rule ID, strict or fuzzy, name, label, is_default, etc.)
  // $query is the SQL string that we can modify or overwrite
  // $groupID limits the search to a particular group
  // $params will be empty for whole-database deduping, or will contain an array of user-entered data from a webform if we are matching a single contact (e.g. from a contribution page).

  // Return a replacement query; CiviCRM will run it instead of the one it was about to run.
  return 'SELECT some other query';
}

Another possibility would be for the hook to let you register a new dedupe rule programmatically, rather than just hijacking an existing one. That hook would be called when presenting the admin with dedupe options, so the new rule would show up as one that could be made, say, the site's default strict rule.

/**
* The "register query-builder" method
*/
function hook_civicrm_dedupe() {
  // Existing dedupe rules have numeric IDs, so we should probably use non-numeric keys to avoid conflicts.
  // CiviCRM will display these new rules as options which can be used or made default.
  // When actually running the query from this rule, it will load the specified include file and call the specified function, much in the same way as the first example (but without the $query param), and expect a query string in return.

  return array(
    'my_great_rule' => array(
      'level' => 'strict',
      'contact_type' => 'Individual',
      'name' => 'One Rule to Rule them All',
      'callback' => 'my_great_rule_query_builder',
      'file' => 'my_module_dedupe.inc',
    ),
    'my_other_rule' => array(
      // etc...
    ),
  );
}

I think this may be preferable for a number of reasons, namely portability and efficiency. It's more portable because you can define the rule, package it in a module, and it is not at all site-specific. It's also more efficient because we're not asking CiviCRM to build a query only to have it discarded and overwritten by the hijack-method hook.
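A sketch of what the core side of the register method could look like: collect rule definitions from each module's hook implementation, then run a chosen rule by loading its file (if any) and calling its callback with the same context as the hijack version, minus $query. Neither this hook nor the dispatcher exists in CiviCRM; every name here (mymodule_civicrm_dedupe, collect_dedupe_rules, run_dedupe_rule, the placeholder query) is hypothetical. Because PHP silently converts numeric-string array keys such as '42' to integers, an is_int() check on each key is enough to reject rule keys that could collide with numeric rule IDs.

```php
<?php
// Hypothetical module implementation of the proposed hook.
function mymodule_civicrm_dedupe(): array {
    return [
        'my_great_rule' => [
            'level'        => 'strict',
            'contact_type' => 'Individual',
            'name'         => 'One Rule to Rule them All',
            'callback'     => 'my_great_rule_query_builder',
        ],
    ];
}

// Hypothetical callback: same context as the hijack version, minus $query.
function my_great_rule_query_builder(string $contactType, array $rule, ?int $groupID = null, ?array $params = null): string {
    // Placeholder; a real builder would assemble the matching logic here.
    return "SELECT c.id FROM civicrm_contact c WHERE c.contact_type = 'Individual'";
}

// Gather rules from every implementation, rejecting numeric keys.
function collect_dedupe_rules(array $implementations): array {
    $rules = [];
    foreach ($implementations as $hook) {
        foreach ($hook() as $key => $rule) {
            if (is_int($key)) {
                throw new InvalidArgumentException("Rule key must be non-numeric: $key");
            }
            $rules[$key] = $rule;
        }
    }
    return $rules;
}

// Load the rule's include file (if any) and call its query builder.
function run_dedupe_rule(array $rules, string $key, ?int $groupID = null, ?array $params = null): string {
    $rule = $rules[$key];
    if (!empty($rule['file'])) {
        require_once $rule['file'];
    }
    return call_user_func($rule['callback'], $rule['contact_type'], $rule, $groupID, $params);
}

$rules = collect_dedupe_rules(['mymodule_civicrm_dedupe']);
echo run_dedupe_rule($rules, 'my_great_rule'), "\n";
```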

In either case, we need to be aware of the security implications of this hook, and avoid making it too easy for newbies to concatenate user-entered data directly into a query.
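One way to keep user-entered data out of the SQL string is to have a rule emit placeholders plus a separate list of values. The sketch below builds the first/nick/last-name cross-match query with %N placeholders, in the `array(N => array($value, 'String'))` shape that, as far as I know, CRM_Core_DAO::executeQuery() accepts, so quoting is left to core. The table and column names are modeled on CiviCRM's schema but are assumptions, and the function name is hypothetical.

```php
<?php
function build_match_query(array $params): array {
    $values = [];
    // Registers a value and returns its %N placeholder.
    $ph = function (string $value) use (&$values): string {
        $n = count($values) + 1;
        $values[$n] = [$value, 'String'];
        return "%$n";
    };

    // Drop blank names so '' can't match every contact with an empty name.
    $given = array_filter([$params['first_name'] ?? '', $params['nick_name'] ?? ''], 'strlen');
    if (!$given || empty($params['last_name'])) {
        return [null, []]; // not enough data to match on
    }
    $in = implode(', ', array_map($ph, $given));
    $where = [
        "(c.first_name IN ($in) OR c.nick_name IN ($in))",
        'c.last_name = ' . $ph($params['last_name']),
    ];

    // At least one locating field must match.
    $locators = [];
    foreach (['email' => 'e.email', 'street_address' => 'a.street_address', 'phone' => 'p.phone'] as $key => $column) {
        if (!empty($params[$key])) {
            $locators[] = "$column = " . $ph($params[$key]);
        }
    }
    if (!$locators) {
        return [null, []];
    }
    $where[] = '(' . implode(' OR ', $locators) . ')';

    $sql = "SELECT DISTINCT c.id FROM civicrm_contact c
  LEFT JOIN civicrm_email e ON e.contact_id = c.id
  LEFT JOIN civicrm_address a ON a.contact_id = c.id
  LEFT JOIN civicrm_phone p ON p.contact_id = c.id
  WHERE " . implode("\n    AND ", $where);
    return [$sql, $values];
}

[$sql, $values] = build_match_query([
    'first_name' => "Bob'; DROP TABLE civicrm_contact; --",
    'last_name'  => 'Smith',
    'email'      => 'bob@example.org',
]);
echo $sql, "\n";
print_r($values);
```

The hostile first name ends up only in $values, never in $sql.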


Comments

already exists

Submitted by lcdweb on September 4, 2011 - 15:22

this already exists (though slightly different from your suggestion):

/**
* This hook allows modification of the queries constructed from dupe rules.
* @param object $obj object of rulegroup class
* @param string $type type of queries e.g. table / threshold
* @param array $query set of queries
*
* @access public
*/
static function dupeQuery( $obj, $type, &$query ) {...

 

we use it extensively, and it works much like your initial suggestion -- the hook receives the rules defined in a rule group, and allows you to reconstruct/enhance them.

the one weakness is as you describe toward the end -- it uses the "hijack" method, which means there needs to be a rule group with at least one rule defined in the system. for our purposes, we will create a basic group that has a single rule with weight 1 and threshold 1. those values can be set within the hook, so they only exist in the db for the sake of registering the rule in the system so it can be modified in the hook.

we're running some nice algorithms in the hook, such as normalizing the street address (strip spaces, remove ordinals, condense street name suffixes (street > st, road > rd, etc.), remove punctuation, etc.) -- which has helped us pick up quite a few more dupes.
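A minimal PHP sketch of that kind of street-address normalization, done on the PHP side (for example on an import row). The suffix map is a small hypothetical sample and the function name is made up; the steps follow the list in the comment: lowercase, drop punctuation, strip ordinal suffixes, condense street-name suffixes, then remove whitespace.

```php
<?php
function normalize_street_address(string $address): string {
    $s = strtolower($address);
    // Remove punctuation (keep letters, digits and whitespace).
    $s = preg_replace('/[^a-z0-9\s]/', '', $s);
    // Remove ordinal suffixes: "42nd" -> "42", "1st" -> "1".
    $s = preg_replace('/\b(\d+)(st|nd|rd|th)\b/', '$1', $s);
    // Condense common street-name suffixes (sample list only).
    $suffixes = ['street' => 'st', 'road' => 'rd', 'avenue' => 'ave', 'boulevard' => 'blvd', 'drive' => 'dr'];
    $s = preg_replace_callback(
        '/\b(' . implode('|', array_keys($suffixes)) . ')\b/',
        fn($m) => $suffixes[$m[1]],
        $s
    );
    // Finally strip all whitespace.
    return preg_replace('/\s+/', '', $s);
}

echo normalize_street_address('123 W. 42nd Street'), "\n"; // 123w42st
echo normalize_street_address('123 W 42 St.'), "\n";       // 123w42st
```

Note that "West" and "W." still normalize differently; directionals would need their own map.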

there are a couple of gotchas with the hook, such as making sure it's not run on user account creation, and you need to construct the queries differently to account for when the rule is run during import vs. when the rule is run from the interface. but it gives you the access you're looking for.
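A hedged sketch of implementing the dupeQuery hook quoted above from a Drupal module, guarding on the rule group ID and skipping the parameter-based scenarios. Several details are assumptions rather than documented behavior: that $obj exposes id and params properties, that a non-empty params means supplied values are being matched (import or registration) rather than the whole database scanned, and that $query is a flat array of SQL strings. The module name, the rule group ID, and the SQL fragment being rewritten are all hypothetical; check the queries your CiviCRM version actually generates before relying on any of this.

```php
<?php
// Hypothetical ID of the rule group this module takes over. Keying on the ID
// rather than the admin-editable rule group name follows the thread below.
const MYMODULE_RULE_GROUP_ID = 7;

// Drupal-style implementation of hook_civicrm_dupeQuery() for a module
// named "mymodule" (hypothetical).
function mymodule_civicrm_dupeQuery($obj, $type, &$query) {
    // Leave every other rule group untouched.
    if ((int) ($obj->id ?? 0) !== MYMODULE_RULE_GROUP_ID) {
        return;
    }
    // Assumption: non-empty $obj->params means supplied values are being
    // matched (import, registration), whose queries are built differently.
    // This sketch only rewrites whole-database 'table' queries.
    if ($type !== 'table' || !empty($obj->params)) {
        return;
    }
    foreach ($query as $key => $sql) {
        // Hypothetical fragment: compare emails ignoring case and padding.
        $query[$key] = str_replace(
            't1.email = t2.email',
            'LOWER(TRIM(t1.email)) = LOWER(TRIM(t2.email))',
            $sql
        );
    }
}

// Usage with a stand-in object; CiviCRM would pass its rule group object.
$ruleGroup = (object) ['id' => 7, 'params' => null];
$queries = [
    'email' => 'SELECT t1.contact_id id1, t2.contact_id id2, 10 weight '
             . 'FROM civicrm_email t1 JOIN civicrm_email t2 ON t1.email = t2.email '
             . 'WHERE t1.contact_id < t2.contact_id',
];
mymodule_civicrm_dupeQuery($ruleGroup, 'table', $queries);
echo $queries['email'], "\n";
```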


sample code

Submitted by lcdweb on September 4, 2011 - 15:44

here is what we are doing with that hook:

https://github.com/nysenatecio/Bluebird-CRM/blob/master/modules/nyss_dedupe/nyss_dedupe.module


Great idea

Submitted by xavier on September 5, 2011 - 00:12

I like the second option (being able to introduce new rules) better as well, mostly because it means being able to choose it explicitly (e.g. in the import).

 

Modifying the dedupe query is a rather hidden feature: the rule as displayed isn't the one applied (e.g. you can change the threshold and add more matching fields without any effect). Isn't this likely to confuse users?

 

And it seems (according to Brian's example) that you match on the dedupe rule name to know when to apply it. Using something users can change as a key is a recipe for interesting issues ;)

 

Otherwise, the module introduces nice features (e.g. st == street) and looks great.

X+


i think the id is passed

Submitted by lcdweb on September 5, 2011 - 09:55

i think the rule group ID is passed in the object, and we should probably switch to using the ID at some point in our implementation.

i agree that it would be useful to have the flexibility to dynamically create a rule group, and perhaps change the interface so the rule group is not editable. but maybe that should be handled with a new hook, so one hook creates the rule group and a second alters the queries. the only gotcha with doing that is that the dedupe code handles three scenarios right now -- user account creation, import, and the find dupes interface -- each of which constructs queries in slightly different ways because of the different data being compared. the fact that you currently have to construct a dummy group/rule in the system before modifying it with the hook provides something of a fallback should you not account for all three scenarios.

another comment about our implementation --

on import (which was our primary usage), the dedupe compares the incoming file row (converted to an array) with the table in the db. applying data normalization to the array is fairly straightforward, but it's a bit of a pain on the mysql side. we accomplished it with repeated, nested REPLACE calls on the fields in question, which is fairly inefficient (though functional).

we've recently installed a regex library into mysql, with the goal of improving some of our algorithms further -- both in terms of flexibility and efficiency. might be nice to figure out a better way to structure some of the dedupe code so that we have more flexibility to perform the data conversion strictly in PHP.


Great Stuff

Submitted by colemanw on September 5, 2011 - 10:34

lcdweb: this is great stuff, I really appreciate you sharing your code. I'm going to give implementing the existing hook a try before spouting off any more about creating a new one. I do agree that having a query already constructed is nice, since your hook then has the option to do nothing.

I also agree about the pain of replacing strings in mysql. My phone number stripping query looks like this: "LEFT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phone.phone, '.', ''), ' ', ''), '-', ''), '(', ''), ')', ''), 10) = '%s'" yep, it's ugly. Let me know how your regex experiments go.
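Rather than writing the nesting by hand, the expression can be generated from the same list of characters the PHP side strips, so both sides stay in sync. A small sketch with hypothetical function names; on MySQL 8.0+ or MariaDB 10.0.5+, REGEXP_REPLACE(phone.phone, '[^0-9]', '') would do the same job without the nesting.

```php
<?php
const PHONE_PUNCTUATION = ['.', ' ', '-', '(', ')'];

// Wrap a column in one REPLACE(...) per character, innermost first.
function sql_strip_chars(string $column, array $chars): string {
    $expr = $column;
    foreach ($chars as $c) {
        // These are fixed, trusted literals; escape quotes/backslashes anyway.
        $lit = addslashes($c);
        $expr = "REPLACE($expr, '$lit', '')";
    }
    return $expr;
}

// The PHP-side equivalent, applied to incoming values.
function php_strip_chars(string $value, array $chars): string {
    return str_replace($chars, '', $value);
}

echo 'LEFT(' . sql_strip_chars('phone.phone', PHONE_PUNCTUATION) . ', 10)', "\n";
echo php_strip_chars('(416) 555-5555', PHONE_PUNCTUATION), "\n"; // 4165555555
```

The first line printed reproduces the hand-written expression above exactly.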

I wonder why Lobo didn't mention the existing hook when he told me to go write a blog article about this...


Lobo's getting old ..

Submitted by lobo on September 5, 2011 - 15:46

and his brain seems to have developed a few too many memory leaks :(

sorry about making u do the work, but on the flip side a good discussion was had by all!

lobo


And if we could end up with this hook documented in the wiki

Submitted by xavier on September 5, 2011 - 16:26

That would be another nice outcome of this discussion.

 

Might even possibly somewhat balance the terrible discovery that lobo isn't omniscient (or that marmite is his kryptonite ;)

 

X+


Ahh, that's why

Submitted by colemanw on September 5, 2011 - 20:11

Ahh, that's why I didn't know about it. OK, I don't feel so foolish now.

As soon as I get a better understanding of it, I'll be happy to add this hook to the wiki documentation.

Incidentally, are there any other useful hooks out there that are "unlisted" on the wiki?
