Sharing experiences with ETL tools

Published
2011-06-06 09:30

Are you interested in open source collaboration on how to use ETL (Extract, Transform and Load) software for data migrations into CiviCRM?

There was a good discussion in the LinkedIn CiviCRM group (http://www.linkedin.com/groups?mostPopular=&gid=1418647) last week about importing voter files into CiviCRM. It seems a number of organizations are using specialized ETL tools, including Pentaho Kettle, Talend, and SQLManager.net. Compared with the import tools currently available in CiviCRM, I see advantages for large and complex data migrations in scalability, ease of use, built-in cleanup and transformation functionality, lower cost, and better performance. In my view, the CiviCRM import tools remain the best choice for smaller and simpler migrations.
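For readers new to the term, here is a minimal sketch of the three ETL stages in plain Python, using only the standard library. It is an illustration only, not how Kettle, Talend, or CiviCRM work internally. The column names (FIRST, LAST, EMAIL), the sample rows, and the choice to skip rows without an email address are all assumptions made for the example; a real voter file will look different.

```python
import csv
import io


def extract(text):
    """Extract: read raw rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize case, drop duplicate emails."""
    seen = set()
    cleaned = []
    for row in rows:
        email = (row.get("EMAIL") or "").strip().lower()
        if not email or email in seen:
            continue  # assumption: skip rows with no email or a repeated one
        seen.add(email)
        cleaned.append({
            "first_name": (row.get("FIRST") or "").strip().title(),
            "last_name": (row.get("LAST") or "").strip().title(),
            "email": email,
        })
    return cleaned


def load(rows):
    """Load: write cleaned rows as CSV text for a later import step."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["first_name", "last_name", "email"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data: one messy row, one duplicate, one without email.
SAMPLE = (
    "FIRST,LAST,EMAIL\n"
    " ada ,LOVELACE,Ada@Example.org\n"
    "Ada,Lovelace,ada@example.org \n"
    "bob,smith,\n"
)

if __name__ == "__main__":
    print(load(transform(extract(SAMPLE))))
```

Dedicated ETL tools offer these same stages as configurable steps, along with connectors, logging, and error handling that a short script like this leaves out.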

We're thinking that it might be useful to 'open-source' how we are using these tools in terms of both code and documentation. Initially these might be more like the shared code snippets and recipes in the developer documentation. In some cases there will be opportunities for significant reuse, e.g., migrations from popular platforms like Raiser's Edge, Convio, Salesforce.com and probably some specialized tools like Constant Contact. Perhaps down the road there might be some areas where joint development would make sense, for example, CiviCRM extensions to support importing from a particular platform.

If you are interested in contributing to such an effort, or in making use of it, please comment here; that would help us gauge the demand. It would also be useful to know who is using these tools for what sorts of migrations, and where they see opportunities for collaboration. Finally, it would be good to get feedback on where we should try to facilitate this sharing: would the wiki and GitHub be best, or somewhere else?


Comments

Dina (not verified)
2011-06-10 - 03:09
Roly (not verified)
2011-06-16 - 18:28

Has anyone since 2011 proposed building a CiviCRM ETL as an extension (as opposed to using Kettle/Talend outside of CiviCRM)? There is a need not only to transform data but also for less brittle reports (for example, a tool that doesn't require hacking about with templates). Jasper makes nice reports, but how could it be integrated or used in an extension? It requires a JVM, right?