Amending the Find and Merge Duplicate Contacts page

2012-09-23 08:19
Written by

As part of the Bristol Civi Sprint, a new layout has been proposed for the existing Find and Merge Duplicates page. The page is used to add and edit Duplicate Matching Rules for the Individual, Organisation and Household contact types. The proposal wouldn't change the way the Merge Rules work, but it would make the page easier to use and understand.


This would include replacing the confusing terms 'Strict' and 'Fuzzy' with 'Front End' and 'Back End' respectively. On-screen help text would explain what the terms mean and how to use the page features. The names of the Rules would also be altered so they reflect the fields used to identify a matching contact.


A mockup of the proposed changes can be found here;


Nice proposal, Oliver.

I'm wondering, however, whether the terms "front end" and "back end" are easier to understand. When it comes to the "default strict" and "default fuzzy" rules, then "front end" and "back end" are correct, but ...

  • Importing a contact is not a "front end" activity so using that label for a "strict" rule doesn't cover that case well.
  • Apart from the default rules, I'm not sure what value the distinction between "strict/front" and "fuzzy/back" has. Eg, when I'm importing contacts, I can use either a "strict" or a "fuzzy" rule.
  • It's unclear why some rules are called "optimized" in your mockup.
  • I note that some rules such as "Household Name and Email" are implemented both as "strict/front" and "fuzzy/back".
  • For me, there's still a semantic gap between "front end" and "rule used when a contact is created as part of (say) a contribution" that needs explanation. Your onscreen text goes a long way to providing that, but I have a sneaking suspicion I would need to refer to that text often to remind myself of the meaning.

Rather than have the concepts of "fuzzy" and "strict" and "default", could we replace that with 3 classes of rule ...

  • General - most rules are general (equivalent to non-default fuzzy)
  • Unsupervised - the rule used when unsupervised duplicate checking is required (equivalent to default strict)
  • Supervised - the rule used when supervised duplicate checking is required (equivalent to default fuzzy)

To minimise changes, this could be implemented in the UI only.
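
As a rough sketch of what a UI-only mapping could look like, assuming each rule is stored with a level ('Strict' or 'Fuzzy') and a default flag: the function name `usage_label` and the argument names are hypothetical, and treating a non-default strict rule as General is an assumption that the list above does not settle.

```python
def usage_label(level: str, is_default: bool) -> str:
    """Map a stored (level, default) pair to the label shown in the UI.

    Hypothetical helper: the names and the storage format are assumptions,
    not taken from the existing code.
    """
    if is_default and level == "Strict":
        return "Unsupervised"
    if is_default and level == "Fuzzy":
        return "Supervised"
    # All non-default rules. Showing non-default Strict rules as General
    # is an assumption; the list above only mentions non-default Fuzzy.
    return "General"


# Show the label for each combination named in the list above.
for level, is_default in [("Strict", True), ("Fuzzy", True), ("Fuzzy", False)]:
    print(level, is_default, "->", usage_label(level, is_default))
```

Because only the displayed label changes, the stored level and default values would stay exactly as they are today.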

I like the mockup, and agree it will help organize things better (particularly separating them by contact type). I partially agree with ken, but want to suggest an alternative approach.

Within each contact type we need to identify one rule that is used for "frontend/unsupervised" matching (event registration/contributions/profiles) and one that is used for "backend/supervised" matching (when the contact edit form is saved). Why don't we simply define those as checkbox selections on the rule edit form, and then list them in the grid? So rather than have columns for type and default, we have a single column for "special usage," and also allow a single rule to be used for both purposes. With households and orgs, it's not uncommon to have the same rule definition for both, which currently requires you to create two identical rules and mark one strict-default and one fuzzy-default.

Conceptually, all rule sets are created equal, and we don't really need to classify them as strict/fuzzy/front/back/etc. We just need a way to flag the two rules that have a special purpose within each contact type.

I'll do my best to respond, explain why the Mock Up was laid out as it was and make some amendments to the Proposal. It was clear, when I started looking at this, that complex amendments were beyond the scope of possible changes and would also affect the ability to update from previous versions. So the suggested Proposal changes have been restricted to layout and label changes only.


I'll address the points made in turn.

- Import data anomaly; When importing, the 'Front End' default rule is selected by default, and this is an anomaly (as importing is not a Front End action). A number of possible terms were discussed (strict/fuzzy, online/offline, public/admin, front end/back end, attended/unattended, supervised/unsupervised) and it was felt that Front End / Back End made the most sense, although none of these would be perfect.

- Supervised/Unsupervised; The terms 'Supervised/Unsupervised' would also be acceptable (as would 'Front End / Back End'). It would be good to get a consensus on which is the clearest.

- “Optimized” in the Rule name text; The term 'optimized' was used in the proposal's out-of-the-box Individual Rule names as all 3 are Reserved rules. A good adjustment might be to change the name text for all 3 from 'optimized' to 'reserved' and then explain why the rule is reserved when you try to edit it. I'll add this to the Wiki page.

- Repeated Rule Names; The current data structure means that there have to be 2 Rule types ('Front End' and 'Back End') with defaults for all 3 main contact types. This is still the case even when both of a contact type's default rules use the same fields to identify the duplicates. Because of this, some of the Rule names are repeated. Generally, though, I do think changing the Rule name convention to reflect the fields that are being used to identify the duplicates is a step forward.

- Explanatory on-page text; I agree, the text probably still isn't quite clear enough. I have amended the proposed text on the Wiki. Can you think of anything that will help further clarify the meaning?

- General Rule type; With the current technical setup all Rules have to be either 'Front End' or 'Back End' when created and this is then fixed when the Rule is saved. Because of this it isn't possible in this proposal to have a General Rule type.

- I totally agree with the comments made by lcdweb!!! This is exactly what we wanted to do for this proposal. The reason we can't is the technical implications of putting it in place, as we are restricted to layout and label changes only.


Based on Brian's feedback, we decided to combine the level and default in one column called usage. So, usage can have any one of the 3 values - Unsupervised, Supervised OR General.


Unsupervised and Supervised will be the only default rules, for frontend and backend respectively. All other rules that are created will be 'General'. Setting a new rule to 'Unsupervised' will un-default the earlier rule that was Unsupervised (the same goes for 'Supervised') and demote the earlier rule's 'Usage' value from Supervised/Unsupervised to General (which is what the current Default functionality does).
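
The behaviour described above could be sketched roughly as follows. This is an illustration only, not the actual CiviCRM code: the `Rule` class, the `set_usage` function and the example rule names are all hypothetical, and it assumes the single Unsupervised and single Supervised rule are tracked per contact type.

```python
from dataclasses import dataclass

USAGES = ("Unsupervised", "Supervised", "General")


@dataclass
class Rule:
    # Hypothetical in-memory model of a dedupe rule, for illustration only.
    name: str
    contact_type: str
    usage: str = "General"


def set_usage(rules, target, usage):
    """Set target's usage, demoting any earlier holder of that usage to General."""
    if usage not in USAGES:
        raise ValueError(f"unknown usage: {usage}")
    if usage != "General":
        for rule in rules:
            same_type = rule.contact_type == target.contact_type
            if rule is not target and same_type and rule.usage == usage:
                # Only one rule per contact type may hold this usage.
                rule.usage = "General"
    target.usage = usage


# Example rule names are made up for this sketch.
rules = [
    Rule("Email", "Individual", "Unsupervised"),
    Rule("First Name, Last Name and Email", "Individual", "Supervised"),
    Rule("Last Name and Postcode", "Individual"),
]
set_usage(rules, rules[2], "Unsupervised")
for rule in rules:
    print(rule.name, "->", rule.usage)
```

Any number of rules can be General, so setting a rule to General never demotes another rule.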


I think this approach incorporates Brian's idea and does not involve a lot of code changes, while being fairly obvious.

I have updated the mock up here


That looks great and much simpler to use and understand. - Oliver