Expanding CiviCRM search ...

Published
2007-04-01 16:28
Written by
We made a few major changes to the v1.7 search interface that significantly improve performance. The first change was not to use a wildcard for the prefix: when a user searches on NAME, we now search only for 'NAME%', whereas older versions searched for '%NAME%'. This allows MySQL to use the index on sort_name, which is significantly faster than a full table scan. The second change was to stop searching the 'email' table when doing a search on 'name'. This avoids two very expensive 'LEFT JOIN' SQL statements and speeds up search significantly. You can get around these limitations by manually specifying the wildcard prefix (e.g. searching for '%NAME'). There is also a new text search box for email in advanced search.

This is probably not a very good user experience, and the community has started bouncing around ideas for improving it without sacrificing speed. We also want a full-text search engine for CiviCRM, and some of my earlier thoughts are here.

Based on a discussion on the mailing list, I decided to check out some technologies we could adopt into CiviCRM in the near future. After a few minutes of searching, I found Solr, an XML-based front end sitting on top of Lucene. Solr communicates with the rest of the world using HTTP and XML. Thanks to some prior work, we already have most of the code for spitting out contacts as XML (we use this route to interface with PDFlib). I went through the Solr tutorial, and it seemed fairly easy to add and delete XML documents. I just need to do a bit more research on how to represent hierarchical tables and on the features Solr offers us. I'm quite excited about this, and as with any new technology, I will probably spend some time over the next few days figuring out how best to integrate it with CiviCRM. I suspect this will give us a pretty good search solution for minimal effort. Both Solr and Lucene are Java-based projects, so folks will need a J2EE server to get this functionality when we do introduce it.
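To make the v1.7 name-search change concrete, here is a minimal sketch of the pattern-building logic in Python. It assumes a hypothetical toy `contact` table and a hypothetical helper `name_pattern()`. An in-memory SQLite database stands in for MySQL only to show which rows each pattern matches; it is not meant to demonstrate MySQL's index usage.

```python
import sqlite3


def name_pattern(term: str) -> str:
    """Build the LIKE pattern for a name search (hypothetical helper).

    Only a trailing wildcard is appended ('NAME%'). In MySQL such a pattern
    can be resolved with a range scan on the sort_name index, while a
    leading wildcard ('%NAME%') cannot use that index for a range scan.
    A user who wants the old substring search can type the leading '%'.
    """
    term = term.strip()
    return term if term.endswith("%") else term + "%"


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return matching sort_name values in sorted order."""
    rows = conn.execute(
        "SELECT sort_name FROM contact WHERE sort_name LIKE ? ORDER BY sort_name",
        (name_pattern(term),),
    )
    return [row[0] for row in rows]


# Toy data; SQLite stands in for MySQL only to show which rows match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, sort_name TEXT)")
conn.execute("CREATE INDEX idx_sort_name ON contact (sort_name)")
conn.executemany(
    "INSERT INTO contact (sort_name) VALUES (?)",
    [("Smith, John",), ("Goldsmith, Ann",), ("Smithers, Wayne",)],
)

print(search(conn, "smith"))   # ['Smith, John', 'Smithers, Wayne']
print(search(conn, "%smith"))  # ['Goldsmith, Ann', 'Smith, John', 'Smithers, Wayne']
```

With the default prefix-only pattern, 'Goldsmith, Ann' is no longer found; typing the leading '%' restores the older substring behaviour, at the cost of giving up the index range scan in MySQL.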
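The add and delete operations from the Solr tutorial are plain XML messages sent over HTTP. The Python sketch below builds such messages from contact records. The field names (`id`, `sort_name`, `email`) are hypothetical and must match whatever is declared in Solr's schema.xml. Flattening a child table such as email into repeated fields is only one possible answer to the hierarchical-table question, and it requires the field to be declared multiValued. The update URL in `post_to_solr()` is also an assumption; it varies with the Solr version and setup.

```python
import urllib.request
import xml.etree.ElementTree as ET


def add_contacts_xml(contacts):
    """Build a Solr <add> update message from a list of contact dicts.

    List values (e.g. several email rows for one contact) become repeated
    <field> elements, which Solr accepts only for multiValued fields.
    ElementTree escapes characters such as '&' and '<' in the values.
    """
    add = ET.Element("add")
    for contact in contacts:
        doc = ET.SubElement(add, "doc")
        for name, value in contact.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_contact_xml(contact_id):
    """Build a Solr <delete> message that removes one document by id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(contact_id)
    return ET.tostring(delete, encoding="unicode")


def post_to_solr(xml_body, url="http://localhost:8983/solr/update"):
    """POST an update message to Solr (assumed URL; adjust for your install)."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


contacts = [
    {"id": 42, "sort_name": "Smith, John",
     "email": ["john@example.org", "js@example.com"]},
]
print(add_contacts_xml(contacts))
print(delete_contact_xml(42))  # <delete><id>42</id></delete>
```

Added or deleted documents typically become visible to searches only after Solr receives a commit message, e.g. `post_to_solr("<commit/>")`.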
However, the J2EE requirement fits in nicely with the forthcoming release of CiviReport, which will also need a J2EE server. Some useful Lucene and Solr links:

Comments

Anonymous (not verified)
2007-04-05 - 17:02

Hi Lobo

This sounds great. One of the great aspects of CiviCRM is the speed at which you guys are building in new features.

From your blog I note that this often involves implementing other projects as part of CiviCRM. The downside of this is that it creates a bunch of dependencies to manage in order to ensure stability/security.

I would be interested in your thoughts on this issue: what do the developers do to deal with the risk and maintenance issues created by adding these dependencies?

Thanks for the great project!

I think it's a tradeoff between building and reusing. At our current stage, it does help to reuse as much as possible, especially high-quality open source packages. We spend a fair amount of time figuring out which packages we can use. Yes, this does increase the security burden on folks implementing and deploying CiviCRM and its dependencies.

Hi,

I found the code for this for CiviCRM 2.2.9 and earlier: http://svn.civicrm.org/civicrm/tags/tarballs/2.2.9/tools/solr/.

Is there also code that can be used with CiviCRM 3, and is there any more documentation on how to use it?

Thanks