A view-based approach to multi-language CiviCRM

Published 2008-06-30 08:25
Written by shot, a member of the CiviCRM community

Long time no blog – mostly because my initial concept for bringing multi-language features to CiviCRM has been replaced with a brand-new approach, which should be much more developer-friendly.

Having the contents of a CiviCRM site in multiple languages means that certain database columns (the user-visible ones) must be localisable – but how to implement this at the database level is far from obvious.

My initial approach was to create a single civicrm_l10n table with the columns entity_table, entity_column, entity_id, locale and translation. This approach has the advantage of being space-efficient: if only a small part of the database’s contents is localised into a given language, the table holds just a couple of rows. It is also the least disruptive approach from the database’s point of view, since only one new table is introduced.

Unfortunately, this approach has the drawback of being much more disruptive to the code: any query that retrieves database values for display must be changed to check whether a localised version exists in civicrm_l10n, and any save operation has to write to civicrm_l10n if the current language is not the default one. This kind of disruption affects all the other CiviCRM developers, including the core team; from the moment this code lands in the main repository, developers would have to cater for the multilingual bits when maintaining the codebase, and consider internationalisation issues when writing new code.
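To make that disruption concrete, here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for MySQL (an assumption for illustration; CiviCRM itself is PHP on MySQL). It creates civicrm_l10n as described above with invented sample data, and shows how a plain label lookup has to grow a LEFT JOIN with a COALESCE fallback, and how a save has to branch on the locale. The helpers get_label and save_label and the en_US default locale are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE civicrm_option_value (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE civicrm_l10n (
    entity_table  TEXT,
    entity_column TEXT,
    entity_id     INTEGER,
    locale        TEXT,
    translation   TEXT
);
-- Invented sample data: row 69 plus its Polish translation.
INSERT INTO civicrm_option_value (id, label) VALUES (69, 'Translation');
INSERT INTO civicrm_l10n
    VALUES ('civicrm_option_value', 'label', 69, 'pl_PL', 'Tłumaczenie');
""")

DEFAULT_LOCALE = "en_US"  # hypothetical default language

# What display code does today: no notion of locale at all.
PLAIN_QUERY = "SELECT label FROM civicrm_option_value WHERE id = ?"

# What every display query would have to become: look for a translation
# and fall back to the original value when there is none.
L10N_QUERY = """
SELECT COALESCE(t.translation, v.label)
FROM civicrm_option_value AS v
LEFT JOIN civicrm_l10n AS t
       ON t.entity_table  = 'civicrm_option_value'
      AND t.entity_column = 'label'
      AND t.entity_id     = v.id
      AND t.locale        = ?
WHERE v.id = ?
"""

def get_label(option_id, locale):
    # Parameters bind in textual order: the locale (ON clause), then the id.
    return conn.execute(L10N_QUERY, (locale, option_id)).fetchone()[0]

def save_label(option_id, locale, text):
    # Saves must branch on the locale as well.
    if locale == DEFAULT_LOCALE:
        conn.execute("UPDATE civicrm_option_value SET label = ? WHERE id = ?",
                     (text, option_id))
        return
    # Replace any existing translation for this (row, column, locale).
    conn.execute("""DELETE FROM civicrm_l10n
                    WHERE entity_table = 'civicrm_option_value'
                      AND entity_column = 'label'
                      AND entity_id = ? AND locale = ?""", (option_id, locale))
    conn.execute("""INSERT INTO civicrm_l10n
                    VALUES ('civicrm_option_value', 'label', ?, ?, ?)""",
                 (option_id, locale, text))

print(conn.execute(PLAIN_QUERY, (69,)).fetchone()[0])  # Translation
print(get_label(69, "pl_PL"))  # Tłumaczenie
print(get_label(69, "de_DE"))  # Translation (no German row, so it falls back)
```

Every query touching a localisable column would need this kind of rewrite, which is exactly the burden on other developers described above.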

The wish to ease future development of CiviCRM sent my train of thought down new tracks. Take a civicrm_l10n entry like ('civicrm_option_value', 'label', 69, 'pl_PL', 'Tłumaczenie'), which means that if you’re using Polish and are displaying column ‘label’ of row 69 of the ‘civicrm_option_value’ table, you should show ‘Tłumaczenie’ (Polish for ‘translation’) instead of the original. What if, instead of such entries, we could keep the current queries, which simply display column ‘label’ for row 69 of X (where X is currently civicrm_option_value), and only change what X is?

Then it hit me – what if I used MySQL views for this? It seems like a sane idea. Instead of a separate civicrm_l10n table, every localisable column in tablename is multiplied into one columnname_locale copy per locale, and for each locale a new view, tablename_locale, is created that exposes that copy under the original name, columnname. For the above example, instead of civicrm_l10n entries with an entity_table of ‘civicrm_option_value’ and an entity_column of ‘label’, the civicrm_option_value table would simply gain a label_pl_PL column, and a civicrm_option_value_pl_PL view would be created that works just like the original civicrm_option_value table, but with the label_pl_PL column visible as ‘label’.
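A minimal sketch of this layout, again using Python’s sqlite3 module in place of MySQL, with a simplified civicrm_option_value schema and invented sample data. The helper label_for is hypothetical; the point is that the SQL text is identical for both lookups and only the table or view name changes.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified schema: the localisable column 'label' gains a Polish copy.
CREATE TABLE civicrm_option_value (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    label       TEXT,
    label_pl_PL TEXT
);
INSERT INTO civicrm_option_value
    VALUES (69, 'translation', 'Translation', 'Tłumaczenie');

-- The per-locale view looks like the original table, but its 'label'
-- column is really label_pl_PL.
CREATE VIEW civicrm_option_value_pl_PL AS
    SELECT id, name, label_pl_PL AS label
    FROM civicrm_option_value;
""")

def label_for(table, option_id):
    # Same query for every locale; only the (trusted, hard-coded)
    # table or view name differs.
    sql = f"SELECT label FROM {table} WHERE id = ?"
    return conn.execute(sql, (option_id,)).fetchone()[0]

print(label_for("civicrm_option_value", 69))        # Translation
print(label_for("civicrm_option_value_pl_PL", 69))  # Tłumaczenie
```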

This way, any code that currently operates on the civicrm_option_value table (reading from or writing to column label) would keep working if the only change made to it was to operate on the civicrm_option_value_pl_PL view instead.
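Writing through the view needs a caveat in a sketch. MySQL treats a simple single-table view like this one as updatable, so an UPDATE on civicrm_option_value_pl_PL changes label_pl_PL directly. SQLite views are read-only, so the sketch below (sqlite3 standing in for MySQL) emulates that behaviour with an INSTEAD OF trigger; the trigger is an SQLite-specific workaround, not part of the proposed MySQL design. The helper save_label is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE civicrm_option_value (
    id INTEGER PRIMARY KEY, label TEXT, label_pl_PL TEXT
);
INSERT INTO civicrm_option_value VALUES (69, 'Translation', NULL);

CREATE VIEW civicrm_option_value_pl_PL AS
    SELECT id, label_pl_PL AS label FROM civicrm_option_value;

-- SQLite-only workaround: route writes on the view's 'label' to the
-- underlying label_pl_PL column. MySQL handles simple single-table
-- views like this natively, so no trigger would be needed there.
CREATE TRIGGER civicrm_option_value_pl_PL_update
INSTEAD OF UPDATE ON civicrm_option_value_pl_PL
BEGIN
    UPDATE civicrm_option_value SET label_pl_PL = NEW.label WHERE id = OLD.id;
END;
""")

def save_label(table, option_id, text):
    # Unchanged logic: it writes column 'label' of whatever table it is given.
    conn.execute(f"UPDATE {table} SET label = ? WHERE id = ?", (text, option_id))

save_label("civicrm_option_value_pl_PL", 69, "Tłumaczenie")
row = conn.execute(
    "SELECT label, label_pl_PL FROM civicrm_option_value WHERE id = 69"
).fetchone()
print(row)  # ('Translation', 'Tłumaczenie')
```

The original label stays untouched; only the Polish copy is written.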

I believe this approach is viable: when using our DAO classes, the code should refer to the _tableName property (which can be built dynamically, depending on the current locale), and when building SQL by hand, it should simply refer to civicrm_table_$locale instead of civicrm_table (where $locale holds the current locale).
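The DAO side might look roughly like the following Python sketch. CiviCRM’s DAO classes are PHP; OptionValueDAO, localised_table and _table_name (mirroring _tableName) are hypothetical names, and treating “no locale” as “use the base table” is an assumption. Because table names cannot be bound as query parameters, the sketch validates the locale string before splicing it into SQL.

```python
import re

# Accept only locale codes shaped like pl_PL (assumed format).
_LOCALE_RE = re.compile(r"[a-z]{2}_[A-Z]{2}")

def localised_table(base_table, locale=None):
    """Return the table or view name to use for the given locale."""
    if locale is None:
        # Assumption: no locale set means the plain, unlocalised table.
        return base_table
    if not _LOCALE_RE.fullmatch(locale):
        raise ValueError(f"unexpected locale: {locale!r}")
    return f"{base_table}_{locale}"

class OptionValueDAO:
    """Hypothetical DAO whose table name is computed per locale."""

    BASE_TABLE = "civicrm_option_value"

    def __init__(self, locale=None):
        self._table_name = localised_table(self.BASE_TABLE, locale)

    def select_label_sql(self):
        return f"SELECT label FROM {self._table_name} WHERE id = ?"

print(OptionValueDAO().select_label_sql())
# SELECT label FROM civicrm_option_value WHERE id = ?
print(OptionValueDAO("pl_PL").select_label_sql())
# SELECT label FROM civicrm_option_value_pl_PL WHERE id = ?
```

Hand-written SQL would call the same localised_table helper, so the locale-to-name rule lives in one place.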

The coming days should see the implementation of this on the gsoc-i18n branch. Stay tuned for further blog posts on how it turned out.

Comments

1. It would be good to get an idea of how many tables and columns this affects. I suspect not a whole lot.

2. If an install is multilingual and supports X languages, it will need X-1 additional columns for each of the columns in point 1. Any idea how you plan to "inject" this into the code base? (It's a change to both the SQL and the DAOs.)

3. Assuming there are not too many columns in point 1 above, would it make sense to define a virtual function called "getColumnName" that determines the column name based on table, column and locale? This insulates the code from the underlying implementation. Also, since most of our queries go through a few core functions (ideally just a few), the change would be quite localised (and transparent to most developers). I think this might be more efficient and saner than creating and destroying views for every page load (as needed).

4. I do think that this plan is better than the one global civicrm_l10n table :)
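The getColumnName alternative from comment 3 can be sketched the same way: instead of switching the table to a view, a single helper maps a logical column to its per-locale physical column, and a query builder aliases it back so callers see the same result keys in every locale. get_column_name, select_sql and the LOCALISABLE registry are hypothetical names with illustrative contents, not existing CiviCRM APIs.

```python
import re

# Hypothetical registry of localisable columns per table; comment 1
# expects the real list to be short. Contents are illustrative only.
LOCALISABLE = {
    "civicrm_option_value": {"label", "description"},
}

_LOCALE_RE = re.compile(r"[a-z]{2}_[A-Z]{2}")

def get_column_name(table, column, locale=None):
    """Map a logical column to its physical, locale-specific column."""
    if locale is None or column not in LOCALISABLE.get(table, set()):
        return column
    if not _LOCALE_RE.fullmatch(locale):
        raise ValueError(f"unexpected locale: {locale!r}")
    return f"{column}_{locale}"

def select_sql(table, columns, locale=None):
    # Alias each physical column back to its logical name.
    parts = []
    for col in columns:
        physical = get_column_name(table, col, locale)
        parts.append(col if physical == col else f"{physical} AS {col}")
    return f"SELECT {', '.join(parts)} FROM {table} WHERE id = ?"

print(select_sql("civicrm_option_value", ["id", "label"], "pl_PL"))
# SELECT id, label_pl_PL AS label FROM civicrm_option_value WHERE id = ?
```

The trade-off is the one the comments discuss: no views are needed, but every query builder must route column names through the helper.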