This is the 3rd in my 'Performance for Guppies' blog series. The metaphor is wearing a pretty thin now - but anyone who remembers the history of the api team will know I'm not above flogging a dead metaphor or a bad joke for several years past it's sell by date. Honestly they get really bad and then they become funny again - think John Travolta, William Shatner, Donald Trump's comb-over.
However, I do want to build on my previous blog - and this time more specifically targetting devs and query writers because I found (drum roll) another bad query which suffers from the same (non obvious) problem as the one discussed in the previous blog.
Quick recap on principles from the last 2 blogs
So, having taken that in you should instantly be able to spot the problem with 'today's query'
UPDATE civicrm_group g SET refresh_date = 20160219031939 WHERE TIMESTAMPDIFF(MINUTE, cache_date, 20160219031439) < 5 AND refresh_date IS NULL
Yep, that's right - it doesn't matter how many indexes you have on those cache_date & refresh_date fields they will never be used because the cache_date field is not being checked directly, it's being checked by a function: TIMESTAMPDIFF.
But it gets worse... if you are selecting data for an UPDATE query as in the above example the mysql site has this to say:
"If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily scan many rows."
So an UPDATE or DELETE statement that does not use an index locks the table until it has completed. If another query then runs it will report a DEADLOCK resulting in an error. (I think the situation with INSERT may be more nuanced).
This solution to the first error, which resolved the time in php first, seems to have worked, but it is not the only place where that use of TIMESTAMPDIFF appears. Moreover, the indexes the customer has on cache_date & refresh_date fields are ones added to their DB (not currently in core). On the bright side there is precedent for adding indexes early in an upgrade proof way now.
A customer recently did some deadlock diagnosis using the pt-deadlock-logger and identified the query above as well as the one below as causing DEADLOCKS (resulting in user form submissions failing with a nasty error).
DELETE FROM civicrm_acl_contact_cache WHERE contact_id IN (SELECT id FROM civicrm_contact WHERE is_deleted = 1)
Can you see why?
The key issue here is that is locks the entire table of civicrm_acl_contact_cache as there is no filter on that table & it is an update query. The query itself doesn't seem too bad, but it gets triggered in versions less than 4.7 so often that it trips over itself.
The issue is fixed for 4.7.
There were a couple more queries identified, but in the spirit of true guppy training I need you to forget these lessons before I move on to the next one.