Published
2009-01-16 13:38
One of the new features in version 2.2 of CiviCRM (in alpha release as of this posting) is a new contact import system. I'll delve into the technical details in a bit, but at a conceptual level, this new design should allow more flexibility in the import system down the road. The first hint of this in the 2.2 release is the new SQL Query data source option. This allows you to query another database that the CiviCRM database user has appropriate permissions on (on the same MySQL server) to get the source data for the import. For many applications where the data start out in another database anyway, this is much easier and faster than dumping to CSV before importing to CiviCRM.
The CSV data source is, of course, still there. But it's now implemented as a pluggable data source alongside the SQL Query option. If you're a PHP programmer, you can write your own data source plugins too (and submit them to be included in the next version of CiviCRM). Just have a look at CRM/Import/DataSource. You'll see the SQL and CSV plugin classes in there. Your class just has to present a form snippet to get any required info from the user, grab the data based on the form input, and put it into a temporary database table (no massaging of data required, you can just dump everything into varchar or text fields if you don't know what data type the fields contain). The import system takes it from there. Feel free to get in touch with me if you need help writing a data source plugin.
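To make the plugin contract described above concrete, here is a minimal sketch in Python rather than PHP, using SQLite in place of MySQL so that it runs standalone. The class names, method names, and the import_job table name are all hypothetical; this is not the actual interface of the classes in CRM/Import/DataSource, only an illustration of the three responsibilities: declare what input you need, fetch the data, and dump it untyped into a temporary table.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical plugin contract (NOT CiviCRM's real PHP interface)."""

    @abstractmethod
    def form_fields(self):
        """Return the names of the inputs this source needs from the user."""

    @abstractmethod
    def load(self, conn, table, params):
        """Copy the raw source rows into the temporary table `table`."""


class CsvDataSource(DataSource):
    def form_fields(self):
        return ["csv_text"]

    def load(self, conn, table, params):
        rows = list(csv.reader(io.StringIO(params["csv_text"])))
        header, body = rows[0], rows[1:]
        # No massaging of data: every column is created as plain TEXT.
        # (Sketch only: header names are assumed not to contain quotes.)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f'CREATE TEMPORARY TABLE "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)


# Usage: stage a tiny CSV, then read it back the way later steps would.
conn = sqlite3.connect(":memory:")
source = CsvDataSource()
source.load(
    conn,
    "import_job",
    {"csv_text": "first_name,email\nAda,ada@example.org\nBob,bob@example.org\n"},
)
staged = conn.execute('SELECT * FROM "import_job" ORDER BY rowid').fetchall()
```

Everything after load() returns (field mapping, de-duping, date formatting) would belong to the import system, not the plugin.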
Down the road, I hope to implement (or see others implement) even more nice features on top of this framework. Some ideas I have are:
1. Scheduled / recurring imports. Create an import job, save it, and schedule it to run once or to repeat every week, month, or whatever. This would get us rudimentary data sync with external systems.
2. Asynchronous imports. Imports could just run in the background, with a progress meter on your dashboard, so you can work on other things while they're happening.
3. Speed improvements. This new import framework has the potential to speed up importing, though we haven't really started work on that yet, so it could end up being not as significant as we think.
Technical Details
The import system in 2.1 and previous versions of CiviCRM relied on the incoming data being in a CSV file. The file was uploaded to the server, and it was read from for each step of the import process (mapping fields, de-duping, etc.). The new import system instead relies on the incoming data being in a database table for the majority of the processing. This means the SQL Query data source plugin is very simple since it just creates the temporary table based on the user-supplied query. The CSV data source plugin reads the CSV into a database table and the same processing code takes over from there. So data source plugins need only to get their data into a database table and then can let the import code deal with the rest. They don't need to worry about formatting dates, de-duping, mapping fields, or anything else for that matter. This means we should be able to offload some of the heavy lifting of getting the data ready to import to the MySQL server (rather than having PHP do it).
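The following sketch shows why a query-based source can be so small, and what "offloading to the database server" might look like. It is an illustration under stated assumptions, not CiviCRM's code: it uses Python with SQLite in place of PHP and MySQL, and the legacy_members table, the stage_query helper, and the import_job table name are invented for the example. MySQL has an equivalent CREATE TABLE ... SELECT statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for "another database on the same server": a legacy table.
conn.execute("CREATE TABLE legacy_members (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO legacy_members VALUES (?, ?)",
    [
        ("Ada", "ada@example.org"),
        ("Ada L.", "ADA@example.org"),
        ("Bob", "bob@example.org"),
    ],
)


def stage_query(conn, table, user_query):
    """Hypothetical helper: wrap the user's SELECT in a CREATE TABLE.

    The query is user-supplied SQL by design, so a feature like this
    should only be exposed to trusted administrators.
    """
    conn.execute(f'CREATE TEMPORARY TABLE "{table}" AS {user_query}')


stage_query(conn, "import_job", "SELECT name, email FROM legacy_members")

# Later steps run as SQL against the staging table instead of looping
# over rows in application code. Here: a naive de-dupe on lower-cased email.
staged_count = conn.execute('SELECT COUNT(*) FROM "import_job"').fetchone()[0]
unique_emails = [
    row[0]
    for row in conn.execute(
        'SELECT DISTINCT lower(email) FROM "import_job" ORDER BY 1'
    )
]
```

The de-dupe rule here is deliberately simplistic; the point is only that once the rows sit in a table, this kind of work can be expressed as a query and left to the database.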
Comments
Hi,
My personal itch: how easy would it be to offer a "more permissive" import as an option? E.g., if the country or any other fixed-list field isn't valid, import the row anyway without that field instead of rejecting the whole line?
X+
I think this is on the list of suggested features for 2.3 (or something similar to it is), but it is a bit tangential to this work. I agree, it would be very useful, but it is a change in the back-end code for import, not the front-end data source code.
If it isn't in the 2.3 list, I'd be happy to help you work on it, as I also will need this.
Hello,
These import changes seem to provide tools we need now to develop our new website and will need later as we maintain it.
I've been working on developing our site for over a year and I am (and my clients are) more eager every day to roll this sucker out. It involves importing a fairly large database (both in terms of number of records and number of fields). I've already been forced to upgrade from CiviCRM 1.9 to CiviCRM 2.0 in order to get past the broken price set feature in 1.9 (price sets being essential to our deployment). I also had to re-import all my contact data because, despite the best attempts of the upgrade script, the database schema of 2.0 is so different from 1.9 that the data did not transfer over correctly. So, I deleted all my contact info and now I am in the middle of the re-import. I completed importing the contact data and moved on to import the membership data, only to run into another bug that was not going to be fixed in 2.0 (http://issues.civicrm.org/jira/browse/CRM-3038). I'm a patient man and luckily, so are my employers, but there are limits.
What I want guidance on is this: I understand CiviCRM 2.2 is an alpha release, but is its core functionality (other than the new import features) as stable as 2.1? It would be very helpful to our organization to use the SQL query aspect to develop ways for our data maintainer to interact with the data in ways independent of the GUI. The site is not a production site at this point and is being developed on an off-line dedicated server.
Of course, what I want to avoid is another irrevocably broken feature that sets us back to square one.
Alternatively, if we go ahead and develop using CiviCRM 2.1, will upgrading to 2.2 when a stable release arrives be less of a leap than it was going from 1.9 to 2.0?
Any thoughts shared would be appreciated.
Chrys
2.2 is an alpha release, so normal warnings and disclaimers apply. We do try to minimize bugs etc., but it's also part of the software cycle. If you are willing to upgrade on a weekly basis and follow the progression of alphas and betas, then go for it. We do need folks willing to try alpha releases and put them through their paces.
2.1 -> 2.2 is much less of a leap than either 1.9 -> 2.0 or 2.0 -> 2.1
lobo