This is the 3rd in my 'Performance for Guppies' blog series. The metaphor is wearing pretty thin now - but anyone who remembers the history of the API team will know I'm not above flogging a dead metaphor or a bad joke for several years past its sell-by date. Honestly, they get really bad and then they become funny again - think John Travolta, William Shatner, Donald Trump's comb-over.
However, I do want to build on my previous blog - this time targeting devs and query writers more specifically, because I found (drum roll) another bad query which suffers from the same (non-obvious) problem as the one discussed in the previous blog.
Quick recap on principles from the last 2 blogs:
- Queries that don't use indexes will leave you swimming in a fishbowl of interwebby slowness
- There are lots of reasons why queries that you think will use indexes ... don't
- Prepending a wildcard is "LIKE '%one of them, ya know'"
- Using a MySQL function on an indexed field is another
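To see those last two points in action, here's a small demo - using SQLite standing in for MySQL, since the principle is the same for both planners; the table and index names are made up for the demo. Wrap an indexed column in a function and the planner falls back to a full scan; compare the bare column against a constant and the index gets used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE civicrm_group (id INTEGER PRIMARY KEY, title TEXT, cache_date TEXT);
    CREATE INDEX index_cache_date ON civicrm_group (cache_date);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows describe the access path in their last column
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the indexed column in a function forces a full-table scan...
bad = plan("SELECT title FROM civicrm_group WHERE date(cache_date) = '2016-02-19'")
# ...while comparing the bare column against a constant can use the index.
good = plan("SELECT title FROM civicrm_group WHERE cache_date >= '2016-02-19'")

print(bad)   # a SCAN - no index in sight
print(good)  # a SEARCH ... USING INDEX index_cache_date
```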
So, having taken that in you should instantly be able to spot the problem with 'today's query'
UPDATE civicrm_group g SET refresh_date = 20160219031939 WHERE TIMESTAMPDIFF(MINUTE, cache_date, 20160219031439) < 5 AND refresh_date IS NULL
Yep, that's right - it doesn't matter how many indexes you have on those cache_date & refresh_date fields, they will never be used, because the cache_date field is not being checked directly - it's being wrapped in a function: TIMESTAMPDIFF.
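One way to fix it is to do the date arithmetic in the application before building the query: TIMESTAMPDIFF(MINUTE, cache_date, now) < 5 is (modulo boundary rounding) the same test as cache_date > now minus 5 minutes, and the latter leaves the column bare so an index can be used. A sketch in Python rather than PHP, with a hypothetical refresh_update_sql() helper - the table and column names are from the query above:

```python
from datetime import datetime, timedelta

def refresh_update_sql(now, window_minutes=5):
    """Build the UPDATE with the date arithmetic done in the application,
    so the WHERE clause compares the bare (indexable) cache_date column.
    Hypothetical helper - not CiviCRM code."""
    cutoff = now - timedelta(minutes=window_minutes)
    fmt = "%Y%m%d%H%M%S"  # the compact datetime literal MySQL accepts
    sql = ("UPDATE civicrm_group SET refresh_date = %s "
           "WHERE cache_date > %s AND refresh_date IS NULL")  # %s: driver placeholders
    return sql, (now.strftime(fmt), cutoff.strftime(fmt))

sql, params = refresh_update_sql(datetime(2016, 2, 19, 3, 19, 39))
print(sql)
print(params)  # ('20160219031939', '20160219031439')
```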
But it gets worse... if you are selecting data for an UPDATE query, as in the above example, the MySQL documentation has this to say:
"If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily scan many rows."
So an UPDATE or DELETE statement that does not use an index locks every row of the table until it has completed. If another query then contends for those locks, one of the two can be rolled back with a DEADLOCK error. (I think the situation with INSERT may be more nuanced.)
The fix for the first query - which resolves the time in PHP first, so the WHERE clause compares cache_date directly - seems to have worked, but it is not the only place that use of TIMESTAMPDIFF appears. Moreover, the indexes the customer has on the cache_date & refresh_date fields are ones added to their own DB (they are not currently in core). On the bright side, there is now precedent for adding indexes early, in an upgrade-proof way.
A customer recently did some deadlock diagnosis using the pt-deadlock-logger and identified the query above as well as the one below as causing DEADLOCKS (resulting in user form submissions failing with a nasty error).
DELETE FROM civicrm_acl_contact_cache WHERE contact_id IN (SELECT id FROM civicrm_contact WHERE is_deleted = 1)
Can you see why?
The key issue here is that it locks the entire civicrm_acl_contact_cache table: there is no filter on that table, and it is a DELETE query (which takes the same locks as an UPDATE). The query itself doesn't seem too bad, but in versions before 4.7 it gets triggered so often that it trips over itself.
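I won't guess at the details of the 4.7 fix, but one common way to shrink the lock footprint of a query like this is to fetch the doomed ids first, then delete them in small, indexed batches so each statement touches only the rows it needs. A sketch with SQLite standing in for MySQL - the batching loop, batch size, and index name are illustrative, not CiviCRM code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE civicrm_contact (id INTEGER PRIMARY KEY, is_deleted INTEGER);
    CREATE TABLE civicrm_acl_contact_cache (contact_id INTEGER);
    CREATE INDEX index_contact_id ON civicrm_acl_contact_cache (contact_id);
""")
conn.executemany("INSERT INTO civicrm_contact VALUES (?, ?)",
                 [(1, 0), (2, 1), (3, 1), (4, 0)])
conn.executemany("INSERT INTO civicrm_acl_contact_cache VALUES (?)",
                 [(1,), (2,), (3,), (4,)])

# Fetch the trashed contact ids first, then delete in small batches so each
# DELETE hits only indexed rows instead of scanning (and locking) everything.
ids = [r[0] for r in conn.execute(
    "SELECT id FROM civicrm_contact WHERE is_deleted = 1")]
BATCH = 2  # tiny for the demo; a few hundred is more typical
for i in range(0, len(ids), BATCH):
    chunk = ids[i:i + BATCH]
    placeholders = ",".join("?" for _ in chunk)
    conn.execute("DELETE FROM civicrm_acl_contact_cache "
                 f"WHERE contact_id IN ({placeholders})", chunk)

remaining = [r[0] for r in conn.execute(
    "SELECT contact_id FROM civicrm_acl_contact_cache ORDER BY contact_id")]
print(remaining)  # [1, 4]
```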
The issue is fixed for 4.7.
There were a couple more queries identified, but in the spirit of true guppy training I need you to forget these lessons before I move on to the next one.
I'm loving where you're going with the purge of bad queries! Thanks :)
I'm also wondering - with all this optimising of SQL in our codebase, should we be asking: Why's there so much SQL in our codebase?
Contrasting the query builder approaches of other projects, could we be generating better SQL? Alongside this optimisation, and the intermittent "whoops SQLi" fixes ... what if new CiviCRM developers had an interface which discouraged the use of TIMESTAMPDIFF() and friends? No small undertaking, of course.
Do we have a metaphor for times when SQL trickery is really called for? Is Crazy Frog the right era?
I guess if you look at the TIMESTAMPDIFF instances you could say
- a bunch of them happen on reports
- they simplify the php in that context (it becomes part of the report metadata)
- you can't always say anything in particular is bad - because depending on the rest of the query it might come out in the wash
- query builders are inherently hard to write and prone to suboptimal queries
All of which adds up to
"I don't know if ripping through them & replacing them would improve things, or whether it is just a selective chipping away"
The query builder for the API is getting better and better. IMO as much as 50% of our ad-hoc queries could be swapped for API calls now. This is what I did recently for Cases (cleaning up a lot of CiviCase code to use api.get instead of random SELECT queries), which has the added benefit of consistently enforcing ACLs.
The problem for me is that if I write a bad query I'll never know because with a db of 202 contacts it will run just fine.
However, if the keyboard were to administer a small electric shock every time my code runs an unindexed query, that would help me learn! Or maybe there's something else we can do to help devs write better queries?
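Short of wiring up the keyboard, a test-suite guard can deliver a milder shock: fail the build whenever the planner picks a full-table scan. A sketch using SQLite's EXPLAIN QUERY PLAN - the assert_uses_index() helper and schema are hypothetical, and MySQL's EXPLAIN output could be checked along the same lines:

```python
import sqlite3

def assert_uses_index(conn, sql, params=()):
    """Raise if SQLite's planner would satisfy `sql` with a full-table scan.
    Hypothetical helper - a stand-in for checking MySQL's EXPLAIN in CI."""
    plan = " ".join(str(row[-1]) for row in
                    conn.execute("EXPLAIN QUERY PLAN " + sql, params))
    if "SCAN" in plan and "INDEX" not in plan:
        raise AssertionError(f"full-table scan: {plan}")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE civicrm_group (id INTEGER PRIMARY KEY, title TEXT, cache_date TEXT);
    CREATE INDEX index_cache_date ON civicrm_group (cache_date);
""")

# Indexable predicate: passes quietly.
assert_uses_index(conn, "SELECT title FROM civicrm_group WHERE cache_date > ?",
                  ("2016",))
# Function-wrapped column: the guard administers the shock.
try:
    assert_uses_index(conn,
                      "SELECT title FROM civicrm_group WHERE date(cache_date) = ?",
                      ("2016-02-19",))
    shocked = False
except AssertionError:
    shocked = True
print(shocked)  # True
```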
Yeah - that was the thinking behind starting a blog series. Some of the queries that cause problems are not obviously bad - e.g. the TIMESTAMPDIFF one doesn't leap out - and clearly the indexes in the dedupe (per the last blog) are being added in the hope they work - but they don't.
So, starting to push information about them is my current plan - although if anyone discovers a way to pass an electric shock through GitHub I'm all in. I could switch the theme to electric eels.
One caveat on pt-deadlock-logger: it almost filled up the disk to the point of taking the server down, because it logged every single check in full to syslog (not what I was after) as well as logging deadlocks to its output table.