Full-text Search for CiviCRM: Initial Thoughts ...

2006-11-17 12:47

It is important for CiviCRM to have a full-fledged unstructured search engine in addition to the current structured search. I don't think MySQL full-text searching (MFTS) is a good model, for two reasons. First, MFTS is restricted to MyISAM tables, and CiviCRM uses InnoDB tables. Second, MFTS is still a table-level search, and I don't think it can handle hierarchical data. CiviCRM contacts are hierarchical data sets.
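The table-level limitation can be made concrete with a small Python sketch. CiviCRM itself is written in PHP, so Python is used here only for illustration. The tables and column names below are simplified and hypothetical, not the real CiviCRM schema, and the whitespace tokenizer is deliberately naive. A query whose terms fall in different tables matches none of the individual rows. It does match once the contact and its related rows are flattened into a single document.

```python
# Hypothetical, simplified tables (not the real CiviCRM schema).
# One contact whose data is spread across three related tables.
contacts = {1: {"display_name": "Jane Smith"}}
locations = {10: {"contact_id": 1, "city": "Oakland"}}
emails = {100: {"location_id": 10, "email": "jane@example.org"}}


def tokens(text):
    """Lower-case whitespace tokenizer (deliberately naive)."""
    return set(text.lower().split())


def table_level_match(query):
    """Return True if any single row contains every query term (MFTS-style)."""
    terms = tokens(query)
    for table in (contacts, locations, emails):
        for row in table.values():
            row_text = " ".join(str(v) for v in row.values())
            if terms <= tokens(row_text):
                return True
    return False


def contact_document(contact_id):
    """Flatten a contact and its nested locations and emails into one text document."""
    parts = [str(v) for v in contacts[contact_id].values()]
    for loc_id, loc in locations.items():
        if loc["contact_id"] != contact_id:
            continue
        parts.append(loc["city"])
        parts.extend(e["email"] for e in emails.values() if e["location_id"] == loc_id)
    return " ".join(parts)


def document_level_match(query, contact_id):
    """Return True if the whole contact document contains every query term."""
    return tokens(query) <= tokens(contact_document(contact_id))
```

Here `table_level_match("smith oakland")` is `False` because "smith" lives in the contact row and "oakland" in the location row, while `document_level_match("smith oakland", 1)` is `True`.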

It would be great to integrate something like Lucene into CiviCRM. A potential workflow could be as follows:

1. Publish an XML specification of the CiviCRM data model. We have done a fair amount of this work for the Branner project, and we could extend and automate it quite nicely using our code generator. XML is also a good fit because it can represent hierarchical data.

2. Extend the logging functionality so that we are aware of all modifications to any part of a contact record. Our logging framework currently covers only changes to the civicrm_contact, civicrm_individual, civicrm_household and civicrm_organization records. We need to decide which tables are part of the "contact" data and modify the framework accordingly. For example, civicrm_location is directly connected to a contact, while civicrm_email is connected only indirectly via civicrm_location, so we need a fairly efficient way to record such changes as contact-level changes.

3. On a periodic basis (triggered by a cron job), incrementally update the XML entries of all contacts that have been modified since the last cron run, and reset their modification status.

4. Incorporate these changes into Lucene's search index.

5. Link CiviCRM search to Lucene search. We can use the Zend Framework's PHP port of Lucene to accomplish this.

6. Give users some mechanism to see where in the contact record the search criteria matched (possibly by displaying the contact's XML definition?).
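The workflow above can be sketched end to end in Python, purely as an illustration under stated assumptions. The tables and their parent-pointer columns are simplified and hypothetical. The function names (`log_change`, `contact_to_xml`, `reindex_dirty`, `search`) are made up for this sketch. A tiny in-memory inverted index stands in for Lucene, which would handle tokenization and ranking in a real deployment. The index records the XML element path where each term occurs, which is one possible way to approach step 6.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified tables: each child row points at its parent.
contacts = {1: {"display_name": "Jane Smith"}, 2: {"display_name": "Acme Inc"}}
locations = {10: {"contact_id": 1, "city": "Oakland"}}
emails = {100: {"location_id": 10, "email": "jane@example.org"}}

# Step 2: IDs of contacts whose records changed since the last cron run.
dirty_contacts = set()


def log_change(table, row_id):
    """Map a change in any contact-related table to the owning contact ID."""
    if table == "civicrm_contact":
        contact_id = row_id
    elif table == "civicrm_location":
        contact_id = locations[row_id]["contact_id"]
    elif table == "civicrm_email":  # indirect: email -> location -> contact
        contact_id = locations[emails[row_id]["location_id"]]["contact_id"]
    else:
        return  # table is not part of the "contact" data
    dirty_contacts.add(contact_id)


def contact_to_xml(contact_id):
    """Step 1: serialize a contact and its nested locations/emails as XML."""
    root = ET.Element("contact", id=str(contact_id))
    ET.SubElement(root, "display_name").text = contacts[contact_id]["display_name"]
    for loc_id, loc in locations.items():
        if loc["contact_id"] == contact_id:
            loc_el = ET.SubElement(root, "location", id=str(loc_id))
            ET.SubElement(loc_el, "city").text = loc["city"]
            for email in emails.values():
                if email["location_id"] == loc_id:
                    ET.SubElement(loc_el, "email").text = email["email"]
    return root


def iter_paths(el, prefix=""):
    """Yield (path, element) pairs, e.g. ("contact/location/email", <email>)."""
    path = f"{prefix}/{el.tag}" if prefix else el.tag
    yield path, el
    for child in el:
        yield from iter_paths(child, path)


# Steps 3-4 (stand-in for Lucene): term -> {contact_id: set of element paths}.
index = {}


def reindex_dirty():
    """Cron job: rebuild index entries for modified contacts, then reset their status."""
    for contact_id in dirty_contacts:
        for postings in index.values():
            postings.pop(contact_id, None)  # drop stale entries for this contact
        for path, el in iter_paths(contact_to_xml(contact_id)):
            for term in (el.text or "").lower().split():
                index.setdefault(term, {}).setdefault(contact_id, set()).add(path)
    dirty_contacts.clear()


def search(query):
    """Steps 5-6: AND-match all terms and report where each contact matched."""
    terms = query.lower().split()
    if not terms:
        return {}
    hits = set(index.get(terms[0], {}))
    for term in terms[1:]:
        hits &= set(index.get(term, {}))
    return {cid: sorted(set().union(*(index[t][cid] for t in terms))) for cid in hits}
```

For example, after `log_change("civicrm_email", 100)` and one `reindex_dirty()` run, `search("smith oakland")` returns `{1: ["contact/display_name", "contact/location/city"]}`: the email change was attributed to contact 1, and the result shows which parts of the record matched. Contact 2 was never marked dirty, so it is not indexed yet. Rebuilding a dirty contact's entries wholesale, rather than diffing them, keeps the incremental update simple at the cost of some redundant work.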

Please email us or get in touch if you have a better understanding of this issue and can help us design and develop this further.
