Published
2010-06-08 00:30
I am writing this post to gather community feedback on porting CiviCRM to PostgreSQL and the best way to do it, and to team up with anyone willing to contribute to this effort. Here is what I have managed so far. I have not been able to make it all work on PostgreSQL, but I have gotten far enough to see light at the end of a long tunnel. I am linking three PostgreSQL-compatible files here:
structure.sql - the table structure: columns, primary keys, indexes, and unique constraints
data.sql - the sample data included with CiviCRM
fk_constraints.sql - the ALTER TABLE statements that add the foreign key constraints to the structure after the data has been loaded
http://staging.techmada.com/civicrm_pgsql/structure.zip
http://staging.techmada.com/civicrm_pgsql/data.zip
http://staging.techmada.com/civicrm_pgsql/fk_constraints.zip
Over the past month, I have spent some time trying to port CiviCRM to PostgreSQL. The primary reason is MySQL's performance. In this era of cloud computing, where hosts offer many slower cores rather than 3.0+ GHz multi-processor machines, MySQL does not perform well. I went through various benchmarks on the Internet before starting and found that PostgreSQL's performance was far better than MySQL's on multi-core or multi-processor systems. InnoDB, the storage engine of choice for CiviCRM, cannot execute a single query in parallel; it relies on the CPU's clock speed to deliver fast results, whereas PostgreSQL can be scaled out to parallel query execution with tools such as GridSQL or pgpool. I mostly rent dedicated servers from The Planet, and finding a machine with a high clock speed there is becoming harder every day.
As for CiviCRM's code, the changes needed to get things working seem endless. Here are a few of the major ones I remember; illustrative sketches for each follow the list:
1. PostgreSQL wants identifiers in double quotes where quoting is needed; MySQL-style back-ticks are incompatible. Column names that contain hyphens or capital letters, or that collide with reserved keywords, must be double-quoted.
2. Dates and times in queries must be enclosed in single quotes and, when given together, separated by a space. The same date format is hard-coded in several places, and each occurrence requires manual repair.
3. Implicit type casting is no longer available in recent versions of PostgreSQL. I have included some functions to automatically cast 1 and 0 to TRUE and FALSE respectively, and integers to character strings. Using TRUE and FALSE directly instead of 1 and 0 is compatible with MySQL, but these clauses occur in thousands of places in the code, and a global find-and-replace did not work because the syntax, usage, and formatting vary.
4. The LIMIT syntax had to be changed to use the OFFSET keyword; MySQL also supports this variant, so the change is portable.
5. Several functions, such as IF(), LEFT(), and TIMESTAMPDIFF(), either do not exist in PostgreSQL or do not behave the same way.
6. SHOW COLUMNS had to be replaced with a SELECT against the information schema; the replacement appears to work on MySQL as well.
7. LIKE is case-sensitive in PostgreSQL. ILIKE solves the problem but is incompatible with MySQL.
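To make these concrete, here are illustrative sketches for each item. The table and column names (civicrm_contact, civicrm_activity, sort_name, and so on) come from the CiviCRM schema, but the statements themselves are reconstructions for illustration, not lines from my actual patches. For the quoting and date-literal issues (items 1 and 2):

    -- MySQL back-ticks (rejected by PostgreSQL):
    SELECT `sort_name` FROM civicrm_contact WHERE `sort_name` LIKE 'A%';
    -- PostgreSQL double quotes (standard SQL):
    SELECT "sort_name" FROM civicrm_contact WHERE "sort_name" LIKE 'A%';

    -- Date and time as a single-quoted literal, separated by a space:
    SELECT id FROM civicrm_activity
     WHERE activity_date_time >= '2010-06-01 00:00:00';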
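For the casting problem (item 3), my shim functions follow the shape of the workaround that circulated after PostgreSQL 8.3 removed most implicit casts; note that the catalog UPDATE is a hack that requires superuser rights:

    -- Make the existing explicit integer -> boolean cast implicit again
    -- (system catalog hack; superuser only, use at your own risk):
    UPDATE pg_cast SET castcontext = 'i'
     WHERE castsource = 'integer'::regtype
       AND casttarget = 'boolean'::regtype;

    -- Re-add an implicit integer -> text cast of the kind removed in 8.3:
    CREATE FUNCTION pg_catalog.text(integer) RETURNS text
      STRICT IMMUTABLE LANGUAGE SQL AS 'SELECT textin(int4out($1));';
    CREATE CAST (integer AS text) WITH FUNCTION pg_catalog.text(integer) AS IMPLICIT;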
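For LIMIT and OFFSET (item 4):

    -- MySQL-only form: LIMIT offset, row_count
    SELECT id FROM civicrm_contact LIMIT 25, 50;
    -- Portable form accepted by both MySQL and PostgreSQL:
    SELECT id FROM civicrm_contact LIMIT 50 OFFSET 25;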
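For the missing or divergent functions (item 5), the rough PostgreSQL equivalents are:

    -- IF(expr, a, b) becomes a CASE expression:
    SELECT CASE WHEN is_deceased THEN 'deceased' ELSE 'living' END
      FROM civicrm_contact;
    -- LEFT(str, n) becomes SUBSTRING:
    SELECT SUBSTRING(sort_name FROM 1 FOR 3) FROM civicrm_contact;
    -- TIMESTAMPDIFF(YEAR, birth_date, NOW()) becomes age()/EXTRACT arithmetic:
    SELECT EXTRACT(YEAR FROM age(NOW(), birth_date)) FROM civicrm_contact;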
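For the SHOW COLUMNS replacement (item 6), a query along these lines runs on both databases:

    -- Instead of: SHOW COLUMNS FROM civicrm_contact
    SELECT column_name, data_type, is_nullable
      FROM information_schema.columns
     WHERE table_name = 'civicrm_contact';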
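For case-insensitive matching (item 7), ILIKE is PostgreSQL-only; wrapping the column in LOWER() is one portable alternative:

    -- PostgreSQL only:
    SELECT id FROM civicrm_contact WHERE sort_name ILIKE '%smith%';
    -- Portable across both databases:
    SELECT id FROM civicrm_contact WHERE LOWER(sort_name) LIKE '%smith%';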
Again, these are only some of the things that required change; I keep running into new kinds of issues every time I continue testing CiviCRM on PostgreSQL.
Comments
Great start, dhruvahuja, and thanks for laying out your exploratory findings.
What are your goals?
What are your next steps?
Do you have the capacity to help drive and maintain the project or are you looking to hand it off?
My ultimate goal is to migrate CiviCRM completely to PostgreSQL for production use. My organization supports several clients using this fantastic CRM. We have long-term relationships with them, and in order to address their performance issues, make better use of dedicated (but shared among clients) hardware, and switch to cloud computing in a year or two, we are seriously looking at making PostgreSQL work.
In spite of the best possible tuning, our MySQL servers' InnoDB engine fails to perform adequately because of its architecture. Our clients with 100,000+ records face never-ending deduping runs, and a heavily tuned MySQL server consumes a huge amount of memory per connection, limiting the number of concurrent users even on systems with more RAM than usual.
My first step is to fix the import process and run some benchmarks; the second is to fix deduping and benchmark again. Imports are typically very slow on already-populated databases, and the manual effort costs us by the hour. The benchmark results and the comparison against MySQL will decide the future of this venture and may attract outside help, although, from what I have seen of the MySQL server's activity during these processes, I think the results will be impressive.
Unfortunately, I am the only one around me who can take on this task. I came forward on the forum and on this blog to seek out people who might be willing to form a team. I would love to learn from and share with others on this task, and I have the resources and infrastructure to host, maintain, and share my code.
Finally, to answer your last question: I have the capacity, the will, and the need to maintain this project.
I agree with the goals (I'm using PostgreSQL on multiple projects), but given the huge amount of work involved I'm hoping you plan to replace the existing MySQL-specific code with an abstraction layer that supports either MySQL or PostgreSQL underneath. That is more work, but it ensures that new versions support both PostgreSQL and MySQL out of the box, rather than forcing a repeated port for every new release.
Of course, this requires that the core project accept the small overhead of that abstraction layer even for those who are quite happy with MySQL, which can be an interesting discussion.