- Let the "name" field represent sort_name only. Avoid the "LEFT JOIN" with civicrm_email table. This skips the two LEFT JOIN's and makes the first two queries significantly faster (CRM_Core_Config::$includeEmailInSearch)
- For the sort_name,use the clause LIKE "value%" instead of LIKE "%value%". This allows mysql to use an index rather than the sequential scan.
- If the alphabetical pagination display is not very important, suppress it completely (CRM_Core_Config::includeAlphabeticalPager)
- If the ordering is not important, suppress the ORDER BY clause (CRM_Core_Config::$includeOrderByClause)
- If you are running only one civicrm install (i.e. one domain id), suppressing the domain where clause also speeds things up (CRM_Core_Config::includeDomainID)
- For large data sets it is significanly faster to compute the list of contacts you want information on and then use that list to restrict the results. As an example our current query looks like:
SELECT sort_name, first_name, street_address, email, phone FROM civicrm_contact LEFT JOIN civicrm_location LEFT JOIN civicrm_address LEFT JOIN civicrm_email LEFT_JOIN civicrm_phone WHERE COMPLEX_WHERE_CLAUSE ORDER BY sort_name LIMIT 0,50;For large data sets the above query is ridiculously inefficient (since the LEFT JOIN's take forever) A more optimized query is:
SELECT contact_id FROM civicrm_contact WHERE COMPLEX_WHERE_CLAUSE ORDER BY sort_name LIMIT 0,50; SELECT sort_name, first_name, street_address, email, phone FROM civicrm_contact LEFT JOIN civicrm_location LEFT JOIN civicrm_address LEFT JOIN civicrm_email LEFT_JOIN civicrm_phone WHERE contact_id IN ( LIST_GENERATED BY ABOVE QUERY ) ORDER BY sort_nameNote the absence of the LEFT JOIN in the first query (there might be a few depending on the search criteria) and the change in the WHERE clause for the second query. mysql has an incredibly crappy sub-select implementation (for independent sub-select clauses) which are best avoided. If your querying on address/email/state values all at the same time the above might not be very optimal
So my understanding is that you create a complex query and then do variations on that query to retrieve the pager etc.
I think the idea of splitting the initial complex query up is a good start, but IN statements are quite inefficient. Have you thought of creating a temp table with the initial results instead of using a WHERE cid IN (list_of_cids). This might gain some ground. Generally it's more efficient to do your filtering in the FROM statement, rather than the WHERE statement.
My initial reaction is actually to try sub-queries. You mention that you've had bad experiences with that route, but I think in the past you've added the sub-queries in the WHERE statement, not in the FROM statement. From my experience with MS DB engines, sub-queries in the FROM statement is the most efficient way to go for virtually all complex queries. I'm assuming the results would be similar with MySQL, though I haven't tested thoroughly.
The following approach is very inefficient:
SELECT c.contact_id, e.email, p.phone
civicrm_contact AS c
LEFT JOIN civicrm_phone AS p ON contact_id
LEFT JOIN civicrm_email AS e ON contact_id
WHERE contact_id IN (
WHERE email LIKE "johndoe%"
The better way to go is something like:
SELECT c.contact_id, sqry.email, p.phone
civicrm_contact AS c INNER JOIN (
SELECT contact_id, email
WHERE email LIKE "johndoe%"
AS sqry ON contact_id
LEFT JOIN civicrm_phone AS p ON contact_id;
MySQL basically creates a temp table of the sub-query and uses the temp table in the master query. This is naturally better than calling the email table twice. This method gets rid of some left joins too.
In this example you might also try to subquery the phone table to return only the primary phone, then you wouldn't need to use any left joins.
You could probably do your pager in php too. This would mean that you're only using one query.
It has the potential to not work out at all, but it might be something to look into.
We've got about 100K contacts in our database.
Searches were taking 20-100 seconds to complete (basic searches with a
query term and default options).
After reading http://civicrm.org/node/99 , I implemented some of the
suggestions and here's what I found.
All of my times were gathered with a pageload timer in FF and are based
on 2-3 tests of each optimization, each using the same query. Our site
implements eaccelerator opcode cache.
Changes to settings were made in CRM/Core/Config.php
Baseline: With all optimizations off
- 2 queries averaged 95 seconds.
Set $includeEmailInSearch = 0;
- 3 queries averaged 5.3 seconds
- search term still matched on email addresses.
- search for firstname.lastname@example.org returned a match on email (this is most likely because the user did not have a first/last name and hence their sort/display name is their email address)
Set $includeAlphabeticalPager = 0;
- 3 queries averaged 3.7 seconds
Set $includeOrderByClause = 0;
- 3 queries averaged 2.6 seconds
Set $includeDomainID = 0;
- 3 queries averaged 2.4 seconds
Would be awesome if you could run some tests of your suggested queries above vs what our optimized queries are and document the time difference between the two approaches. Will help us significantly and allow us to make a more uniformed decision