New dedupe for CiviCRM 2.1 – check it out on sandbox

Published
2008-05-26 01:46
Written by
The new dedupe engine and UI landed on trunk (the development branch of our code repository) last week, and we’d be more than happy if you gave it a try on our CiviCRM 2.1 sandbox and let us know how it works for you.

The new dedupe, besides the engine changes described earlier, sports a new user interface. Navigate to Administer CiviCRM → Find and Merge Duplicate Contacts and check out the new admin screens. As of 2.1, you are no longer limited to editing the three rule groups (one per contact type); you can also create your own. The change that lets you use various dedupe rule groups for a given contact type without having to redefine them is accompanied by two new properties, ‘defaultness’ and level, explained below.

Another change in CiviCRM 2.1 is that the dedupe engine is used for contact matching throughout the whole application. For this, two ‘fuzziness’ levels of dedupe rule groups were added: contact creation/edit uses the default fuzzy rule group (for the given contact type), while import, profile creation/edit and event/contribution registration use the default strict rule group.

The last (but definitely not least) important thing about the new dedupe engine is the performance improvement. Thanks to the users who generously donated their databases for profiling, we’re happy to announce that, given adequate hardware (and, perhaps, a custom database index, if a dedupe rule is based on a column that is not indexed by default), the new engine should ‘simply work’. On my laptop (Core 2 Duo 1.8 GHz, 3 GiB of RAM) and with a simulated cold MySQL start, the parametrised queries (used for contact matching) took around 0.01s even for a 62k contact database. The full dedupe scan with the default fuzzy rule for individuals (first name, last name and email matching) from a cold MySQL start takes about 2.5s on an 11k contact database, 7.5s and 27s on two 18k contact databases, and 68s on the 62k contact database mentioned earlier. Thanks to the results being cached, we believe the new engine is finally useful for larger contact databases as well.

Once again we invite you to try it and give us feedback on what you think.
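
To make the fuzzy/strict split described above a little more concrete, here is a minimal Python sketch of how an entry point might pick its rule group. It is an illustration only, not CiviCRM code: the names `RuleGroup`, `CONTEXT_LEVEL` and `pick_rule_group`, the context keys and the sample group names are all assumptions made up for this example. Only the mapping itself (contact creation/edit uses fuzzy; import, profiles and event/contribution registration use strict) comes from the post.

```python
from dataclasses import dataclass

# All names here are hypothetical; this is not CiviCRM's API or schema.
FUZZY = "Fuzzy"
STRICT = "Strict"

# Which 'fuzziness' level each part of the application uses, per the post.
CONTEXT_LEVEL = {
    "contact_edit": FUZZY,            # contact creation/edit
    "import": STRICT,
    "profile": STRICT,                # profile creation/edit
    "event_registration": STRICT,
    "contribution_registration": STRICT,
}


@dataclass(frozen=True)
class RuleGroup:
    name: str
    contact_type: str   # e.g. "Individual", "Household", "Organization"
    level: str          # FUZZY or STRICT
    is_default: bool    # the 'defaultness' property


def pick_rule_group(groups, contact_type, context):
    """Return the default rule group of the level that `context` calls for."""
    level = CONTEXT_LEVEL[context]
    for group in groups:
        if (group.contact_type == contact_type
                and group.level == level
                and group.is_default):
            return group
    raise LookupError(f"no default {level} rule group for {contact_type}")


# Sample data (made up): two defaults plus one custom, non-default group.
SAMPLE_GROUPS = [
    RuleGroup("individual_fuzzy", "Individual", FUZZY, True),
    RuleGroup("individual_strict", "Individual", STRICT, True),
    RuleGroup("individual_custom", "Individual", FUZZY, False),
]
```

With this sample data, `pick_rule_group(SAMPLE_GROUPS, "Individual", "import")` returns the strict default, while the `"contact_edit"` context returns the fuzzy default. The custom group is never picked automatically because it is not marked as default.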

Comments

Looks great. Very speedy.
One bug -- after merging two records, it dumps you back into the site homepage instead of returning you to the list of contacts to be merged.

Jeff Porter --> Foundation for Prader-Willi Research (www.fpwr.org)

Looks really good - great perf improvement over 2.0. Question, will you be able to dedup across contact types (e.g. Households --> Individuals) moving fwd (perhaps in 2.2?) based on this new approach?

We have not started thinking about what's part of 2.2, etc. I suspect a fair amount will be based on user feedback from 2.0/2.1.

Any specific use cases where cross contact type merging is needed?

lobo