If you are as guppy-minded as I am then slow page loads can send you into a bit of a spin. While that little icon whirs I start to engage in some checking behaviour (facebook, email, IRC), which leaves me with enough to ponder that I go to grab a cup of tea. Coming back I wonder where I got to, click refresh and the cycle starts again.
I've been looking into performance issues for a few organisations lately and I want to share some of what I have found in the process in what I hope will be a series of blogs. In this first one my goal is to explain one thing that can slow queries down ... to non-technical readers. There are also some things you can look at doing on your own site with no technical help.
The average size of database a CiviCRM core developer works with is ... exactly 203 contacts (which is co-incidentally the sample data set you get when you install a dev environment). Sometimes things are fast on these sites but slow on much bigger sites, which means performance issues aren't always obvious to developers.
However, smaller sites are not always faster than large ones. In particular the quicksearch query (the search box in the top left hand corner) can run faster on a largish site than a much smaller site. Consider looking for a buzz lightyear toy in Walmart or a five-year-old's bedroom. Which is quicker? (If you said a child's bedroom you don't have a small child or you shouldn't be allowed one). So why is Walmart quicker? The things in Walmart are ordered in a more predictable way than 'order they were played with'.
Walmart is still hardly an ideal example of 'organised for finding'. They don't want you getting out too quickly. A better example is the local library where books (fiction at least) are organised by the surname of the author. This is called cataloging or indexing and it's pretty much what databases do too. Databases can index one or many fields. The more indexing you do the slower it is to save data - ie. it's much quicker to open the door to your kids bedroom and shove things in than it is to put them away in an orderly fashion. However, I've never seen evidence of speed issues in CiviCRM being due to write speed - it's usually getting data out that is slow.
Generally CiviCRM does use indexing - but there are various queries that can prevent it from using indexes. One of them is wildcards. A wildcard is a character that means 'any'. If I look for 'Fred' then I will find 'Fred'. If I look for '%Fred%' I will also find 'Wilfred' and 'Frederick'. That can be helpful - but there is a cost. Imaging going into the library and saying 'I am looking for a book by an author with 'Fred' in their name somewhere. All that careful filing is wasted because you still have to start with 'aa' and keep looking until you hit 'zz'.
In my database of 203 contacts it takes almost no time to search for 'fred' with or without the wildcard. However, when I switch to a database with around 300,000 people I see different results. Searching for 'fred' or 'fred%' takes around 1/3 of a second whereas searching for '%fred' takes closer to a second. A second might not be long but if there is more than one field taking a second it starts to add up and the guppy-spin kicks in.
There are all sorts of things that can affect how an individual database performs - but the fact remains - an index only works up until the first wildcard. If I search for 'fred%' then I can still go to a pretty specific part of the library to look. If I look for 'f%red' the I have to look through all records starting with F but it's not as bad as '%fred%'.
By default when you search for name in CiviCRM it will search for '%fred%'. However, you can change that. If you go to Administer >> Customise Data and screens >> Search preferences there are some options you can tweak. The first one is 'Automatic Wildcard' - if you disable this then it will search for 'fred%' - although you can enter extra wildcards in the search fields if you need them. The second setting on that page 'include email' is also important in 4.6. On the 300k site mentioned above the quicksearch was taking 3 seconds to return a result. By changing those first 2 settings we got it down to 0.2 second.
I'm not going to go into the second setting 'include email' in this blog (but hope to cover the issues in a later blog) - but in 4.6 it affects quicksearch speed and in 4.7 it doesn't - or not much (see CRM-12326). Also note that changing the include email setting will leave you with slightly confusing field labels until you are able to put 4.6.11 on your site (CRM-17537).
I have tried to direct this to non-techies but I have a last techie directed instruction before I finish. All developers - please go and take a look at the bedroom of a small child and fix that image in your memory and refer to it whenever adding wildcards to a query. Ask yourself if you really do need to go there.