Import code cleanup in 5.51

Published
2022-06-14 14:04
Written by

The upcoming 5.51 release includes a significant cleanup of the import code. Anyone who has ever delved into this code will know that a four-hour deep dive often left you re-surfacing none-the-wiser. There is an ongoing cost to having code like this but - addressing it required a significant amount of change with an associated risk of regression. Hence I am putting out the request for as many people as possible to download the 5.51 rc and specifically test the import code.

What has changed

 Here are some key highlights of what’s changed:

  • Prior to the change only the contact import imported the csv to an import table (a mysql table in CiviCRM). That code is now used for all imports and the import table now persists after the import has completed. As of writing, all import tables are dropped on upgrade or,  if they are more than two days old, they are dropped on cache clearing. The results of the import are now downloaded directly from this table (not csvs on disk) and can be revisited (via the Monitoring page) while the table still exists. This is more secure and can be useful for debugging or audit. Although not currently exposed, this table also holds the created entity’s  ID.
  • All imports are now tracked through a UserJob entity - which contains the submitted details and how to find the import table. This allows us to stop holding all the information in a complicated form session structure and also allows the import to be processed away from the user’s browser.
  • Use of queueing. The import is now processed as a Queue. For normal users this means the appearance will be like the upgrade queue and it will be less susceptible to time outs. For sysadmins this creates the possibility of processing the queue in the back ground via cron jobs. Greenpeace Germany sponsored improvements in this area recently and we are most of the way to making it possible to leverage that for imports
  •  The final ‘Summary’ page can be thought of as a ‘Monitoring page’ - if the queue is processing in the background it will update on refresh. This can be done by the creator of the import job or a different user with the ‘administer queues’ permission (the user_job_id is in the url)
  • Saved mappings are stored using the field names rather than labels in the `civicrm_mapping_field table` - making them easier to handle in code & less likely to stop working (eg. if a field is renamed)

For developers, you can find a more detailed run-down in the developer docs.

Bugs fixed as part of this

Unresolved

Affected extensions

Due to the insanity of the import code not many extensions interacted with it. Notable exceptions are the csvImporter and advImport. These will both require a new 5.51 release which is underway. Going forward there is a need to define the interfaces for interacting. I have some thoughts but this issue is mostly a placeholder for the discussion right now.

Why clean the code up - why not leap as an extension

There is a significant ongoing cost to having old partly-broken code in core. Everytime a developer tries to dig in they waste many hours tracing through it. When a user logs an issue it goes in the too-hard basket - the place where the list of bugs above where before I tackled the cleanup. For example two issues were logged within an hour of forking - on the new code I was able to fix them with a unit test within 40 minutes - whereas it could have been a day’s work before. In addition much of the bad code is deep within core - reaching into the Contact.create function and other shared code - so the cost is not only with the import. What I learnt from the flexmailer experience is that leaping by extension on bad code that we are stuck with in core leaves you with just as much support to do on the original code PLUS support on the new code PLUS support to figure out which is in use. Leaping by extension is great for new functionality but it isn’t a full replacement for getting rid of technical debt.

Why the big cleanup in a single release?

One thing we learnt through the token clean up is that it is a lot easier to get people to rc-test code changes if the changes in a particular area are concentrated in a particular release. With the import code a LOT of tests have been added but that doesn’t fully substitute people getting stuck in and testing the rc. Cramming a lot of change in a single release increases the risk of breakage in that specific release - but it also increases the chance of the breakage being picked up quickly and fixed. In addition it means we are fixing it on code that should not be much changed from 5.51 compared to future code - making porting easier

Thank you thank you thank you

I want to acknowledge that a number of people have helped out here 

  • Darrick Servis (Davis Media) has done a tonne of QA, writing tests (now and in the past), explaining complicated issues and fixing bugs
  • Monish Deb (JMA Consulting) - my review hero
  • Aidan Saunders (Squiffle consulting) - review and bug-fixes
  • Jon G (Megaphone Tech) - for QA and for his past self having left tests to support now-me
  • Dave J (Circle Interactive)- rc testing
  • Dave D (demeritcowboy) - for going there on the date formatting with me and also for the past-life tests, review and that eagle eye.
  • Jamie McLelland (PTP) - for writing tests in the past that made this possible now. Also Seamus Lee  (JMA Consulting) and Justin Freemen (Agileware) Alok Patel. Alok is formerly of Agileware and used his own name in the tests he wrote. The joys of copy & paste means he has since multiplied throughout tests :-)
  • Peter Davis (Fuzion) & @jhungerford - testing
  • Tim Otten - (Core team) yep all that queue stuff & Coleman for review
  • Greenpeace Central & Eastern Europe - sponsoring Queue improvements
  • Whoever I’ve missed - you know who you are but right at this moment in my mind is failing me

Bugs I’m still re-testing/ triaging

541 Custom field import issues around subtypes - fixed?

1626 View only custom fields import bug

1771 Multi field import custom data bug (fixed?)

1832 Activity date import issue

1848 Updates to contact subtypes replaces rather than adds

2222 Deleted contacts matched by external identifier

Filed under

Comments

Great to see this as I've done so many imports. Where do we report bugs?