Walking in a wi18nter wonderland

Published
2019-01-03 23:16
Written by
Happy New Year, CiviCRM community! I bring exciting news for all those who deal with localisation (l10n) and internationalisation (i18n), or, put simply, for everyone who is occasionally unhappy with the texts in the user interface or on public forms. And yes, that may include English forms as well.
 
The origin of this post goes back to some inspiring discussions I had with Tim Otten and Aidan Saunders at the Bamford sprint in autumn 2018. We were aiming for nothing short of a revolution in CiviCRM's translation system. Soon you will be able to adjust translations on the fly, or inject a whole cascade of custom translation (.mo) files into the system to make the UI say exactly what you want it to say.
 
Over the holidays, I finally had some time to follow up on those ideas, and implemented a working prototype of stage one of this revolution. In short, I created an extension (org.civicrm.l10nx) that provides a new interface into the internals of the old ts() system. This opens up a huge number of opportunities, some of which I will showcase at the end of this post.
 
Now, if you're thinking something like "what do we need this for?", please take a minute and talk to someone who works on a non-English CiviCRM.
 
Let me outline the three stages of development for you:

Stage 1: Patched to work right now

In the first stage, we use CiviCRM's existing feature for providing a custom ts() function as an entry point that feeds into the new hooks/events. Unfortunately, this mechanism currently doesn't work with functions provided by extensions, so for the moment you also have to apply the patch shipped with the extension.
 
This stage is already working.

Stage 2: Refactored Core I18n

Tim Otten has already started working on a complete refactoring of the I18n core component, which is responsible for translation, among other things. This refactoring will not only make the subsystem faster and more flexible, but will also allow seamless integration of this extension. No more patching of core!
 
As soon as this refactoring is part of the core, I will adjust this extension to use the new integration.
 

Stage 3: User Data Translation

This last stage is going to be a game changer. We will extend the l10n system to user data, i.e. labels, names, option values, etc. Everything that you entered into CiviCRM will be exposed to this new translation approach.
 
So far, multilingual user data has only been possible with "Multiple Languages Support", which extends the DB schema for each additional language. The concept behind the current implementation is, no doubt, ingenious. However, it doesn't scale well, and translating the individual strings is tedious.
 
Granted, you probably don't need your backend to cater for 50 different languages, but why shouldn't your public forms offer that? This third stage can get you there, with minimal effort.
 
To translate user data, we will introduce a dts() function, an exact replica of ts() but intended for user data. Naturally, this new function will also use the new infrastructure.
 

How will it work?

Installing the org.civicrm.l10nx extension adds extra hooks to CiviCRM's translation (l10n/i18n) system, so other extensions can influence how, when, and where translation takes place, either for each individual translation or for a whole set.
 
There are three new "hooks" (implemented exclusively as Symfony events for performance reasons):
  • custom_mo(&$mo_file_paths, $locale, $context, $domain) allows you to inject custom .mo files into the translation, based on the translation context.
  • ts_post($locale, $original_text, &$translated_text, $params) allows you to detect, profile and change any prepared translation just before delivery.
  • ds_post($locale, $original_text, &$translated_text, $params) allows you to do the same for user data. This will only become available with stage 3.
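To make the roles of the three hooks more concrete, here is a minimal Python sketch of the event flow. It is not the extension's code (l10nx is PHP and dispatches Symfony events); the class name `TranslationEvents`, the dictionary catalog, and the example path are hypothetical. It also shows how a dts() for user data could share the ts() pipeline while notifying ds_post listeners instead.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical listener signatures modelled on the hook list above.
# PHP passes $mo_file_paths and $translated_text by reference; this sketch
# mutates the path list in place and has post-listeners return the new text.
MoListener = Callable[[List[str], str, str, str], None]
PostListener = Callable[[str, str, str, Dict], str]


class TranslationEvents:
    """Illustrative Python stand-in for the l10nx events, not its real API."""

    def __init__(self) -> None:
        self.custom_mo: List[MoListener] = []
        self.ts_post: List[PostListener] = []
        self.ds_post: List[PostListener] = []

    def mo_files(self, locale: str, context: str, domain: str) -> List[str]:
        """Let every custom_mo listener add (or reorder) .mo file paths."""
        paths: List[str] = []
        for listener in self.custom_mo:
            listener(paths, locale, context, domain)
        return paths

    @staticmethod
    def _run(listeners: List[PostListener], locale: str, original: str,
             translated: str, params: Dict) -> str:
        # Each listener receives the output of the previous one.
        for listener in listeners:
            translated = listener(locale, original, translated, params)
        return translated

    def ts(self, text: str, locale: str, catalog: Dict[str, str],
           params: Optional[Dict] = None) -> str:
        """UI strings: catalog lookup, then the ts_post listeners."""
        return self._run(self.ts_post, locale, text,
                         catalog.get(text, text), params or {})

    def dts(self, text: str, locale: str, catalog: Dict[str, str],
            params: Optional[Dict] = None) -> str:
        """User data (stage 3): the same pipeline, routed through ds_post."""
        return self._run(self.ds_post, locale, text,
                         catalog.get(text, text), params or {})


# Usage: a site-specific .mo path (hypothetical layout) and a ts_post tweak.
events = TranslationEvents()


def prefer_site_mo(paths: List[str], locale: str, context: str, domain: str) -> None:
    if context == "public":
        paths.insert(0, f"/var/civicrm/l10n-custom/{locale}/{domain}.mo")


events.custom_mo.append(prefer_site_mo)
events.ts_post.append(
    lambda locale, original, text, params: text.replace("Contribution", "Donation"))
```

Running the post-listeners in sequence, each seeing the previous result, is one plausible design choice: it lets several extensions stack their adjustments on the same string.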
 
If you want to know how these new hooks could be implemented, have a look at the following extensions:
  • SYSTOPIA's Custom MO extension allows you to define custom .mo files that are evaluated before the regular translation kicks in.
  • SYSTOPIA's Profiler extension allows you to live-capture ongoing translations and export them as .po and .pot files, so you can easily create a new translation or amend an existing one.
Both of those extensions are still prototypes, but they showcase the great potential of the new hooks.
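Evaluating custom .mo files "before the regular translation kicks in" maps naturally onto gettext fallbacks: the first catalog that knows a string wins, and unknown strings fall through to the next one. The Python sketch below assumes that model. To avoid needing files on disk, it builds tiny in-memory .mo files with a hypothetical `build_mo` helper (ASCII only, no header entry) and loads them with the standard library's `gettext.GNUTranslations` and `add_fallback`. The German strings are illustrative only.

```python
import gettext
import io
import struct
from typing import Dict, List

MO_MAGIC = 0x950412DE  # little-endian GNU .mo magic number


def build_mo(catalog: Dict[str, str]) -> bytes:
    """Serialise {msgid: msgstr} into a minimal .mo file (demo only).

    ASCII strings only and no header entry; real files would come from
    msgfmt or a PO editor.
    """
    msgids = sorted(catalog)
    n = len(msgids)
    originals_offset = 7 * 4  # header is seven 32-bit words
    translations_offset = originals_offset + 8 * n
    data_offset = translations_offset + 8 * n

    table = b""
    data = b""
    # First n table entries describe msgids, the next n their translations.
    for text in msgids + [catalog[m] for m in msgids]:
        encoded = text.encode("ascii")
        table += struct.pack("<2I", len(encoded), data_offset + len(data))
        data += encoded + b"\0"  # strings are NUL-terminated

    header = struct.pack("<7I", MO_MAGIC, 0, n, originals_offset,
                         translations_offset, 0, 0)  # no hash table
    return header + table + data


def load_cascade(mo_files: List[bytes]) -> gettext.NullTranslations:
    """Chain catalogs in priority order: the first one that knows a msgid wins."""
    chain = None
    for blob in mo_files:
        catalog = gettext.GNUTranslations(io.BytesIO(blob))
        if chain is None:
            chain = catalog
        else:
            chain.add_fallback(catalog)  # appended to the end of the chain
    return chain if chain is not None else gettext.NullTranslations()


# Illustrative strings: a site override placed ahead of the regular catalog.
site_mo = build_mo({"Contribution": "Spende"})
core_mo = build_mo({"Contribution": "Zuwendung", "Contact": "Kontakt"})
translator = load_cascade([site_mo, core_mo])
print(translator.gettext("Contribution"))  # Spende (site override wins)
print(translator.gettext("Contact"))       # Kontakt (falls through to core)
print(translator.gettext("Event"))         # Event (no catalog translates it)
```

The order of the list passed to `load_cascade` is the whole policy here, which is roughly what a custom_mo listener would control by inserting its paths ahead of the default ones.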

What can I do with it?

Let's say I'm unhappy with the texts on a public-facing form. With just the two extensions outlined above, I can now do the following:
  1. Start the translation profiler (de.systopia.l10nprofiler)
  2. Fill out the form, go through every step
  3. Stop the profiler
  4. Download the PO file
  5. Use a translation editor (like POEdit) to change the translations wherever you want
  6. Upload the MO file the editor produces to the second extension (de.systopia.l10nmo) and use its configuration page to activate it
  7. Done!
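Steps 1 to 4 essentially describe a ts_post listener that observes translations while switched on and can export what it saw. The hypothetical `TranslationProfiler` below sketches that idea in Python; it is not the profiler extension's actual code. With `template=True` it writes the same entries with empty msgstr lines, i.e. a POT-style template.

```python
from typing import Dict


def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO entry."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


class TranslationProfiler:
    """Hypothetical ts_post-style listener that records delivered strings."""

    def __init__(self) -> None:
        self.active = False
        self.seen: Dict[str, str] = {}  # msgid -> msgstr, in first-seen order

    def start(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False

    def __call__(self, locale: str, original: str, translated: str, params: dict) -> str:
        if self.active:
            self.seen.setdefault(original, translated)
        return translated  # observe only; never alter the text

    def export(self, template: bool = False) -> str:
        """Return PO text; with template=True, every msgstr is left empty."""
        lines = ['msgid ""', 'msgstr ""',
                 '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
        for msgid, msgstr in self.seen.items():
            value = "" if template else po_escape(msgstr)
            lines.append(f'msgid "{po_escape(msgid)}"')
            lines.append(f'msgstr "{value}"')
            lines.append("")
        return "\n".join(lines)


# Usage: only strings delivered between start() and stop() are captured.
profiler = TranslationProfiler()
profiler("de_DE", "Email", "E-Mail", {})  # ignored, profiler not started
profiler.start()
profiler("de_DE", "First Name", "Vorname", {})
profiler("de_DE", 'Click "Submit"', 'Klicken Sie auf "Absenden"', {})
profiler.stop()
print(profiler.export())
```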
 
You can also use this to produce translations for any number of languages: create a template (POT) file, have it translated by various translators, and upload the results as a bundle. This option will become even more exciting with stage 3, when user data (e.g. custom fields and options) will be translated as well.
 
Another application of the new hooks could be an extension for programmatic word replacement, which makes sure that replacing, for example, 'Member' with 'Supporter' does the right thing everywhere.
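Doing "the right thing in all places" is mostly about matching whole words, so that 'Member' becomes 'Supporter' while 'Membership' is left for a separate rule. The sketch below is a hypothetical ts_post-style listener in Python: it uses regular-expression word boundaries, carries over capitalisation and a simple plural 's', and only acts on English locales. The factory name `make_word_replacer` is invented for illustration, and the replacement keys are assumed to be lowercase.

```python
import re
from typing import Callable, Dict


def _match_case(template: str, word: str) -> str:
    """Copy the capitalisation style of `template` onto `word`."""
    if template.isupper():
        return word.upper()
    if template[:1].isupper():
        return word[:1].upper() + word[1:]
    return word


def make_word_replacer(replacements: Dict[str, str]) -> Callable[[str, str, str, dict], str]:
    """Build a ts_post-style listener that swaps whole words (keys in lowercase)."""
    alternatives = "|".join(re.escape(word) for word in replacements)
    # \b keeps 'Membership' intact; (s?) carries a simple English plural over.
    pattern = re.compile(r"\b(" + alternatives + r")(s?)\b", re.IGNORECASE)

    def ts_post(locale: str, original: str, translated: str, params: dict) -> str:
        if not locale.startswith("en"):
            return translated  # this rule only makes sense for English text

        def substitute(match):
            word = replacements[match.group(1).lower()]
            return _match_case(match.group(1), word) + match.group(2)

        return pattern.sub(substitute, translated)

    return ts_post


replace_members = make_word_replacer({"member": "supporter"})
print(replace_members("en_US", "", "New Members", {}))  # New Supporters
```

A blind string replacement would also turn 'Membership' into 'Supportership'; whether such compounds deserve their own rule is exactly the kind of decision this approach leaves to the site.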
 
Yet another application could be an extension that finally gets rid of the eternal problem with strings like 'to', which must be translated into different words in other languages depending on the context. Until now, it was not possible to tell these occurrences apart during translation, which has always made the user experience slightly bumpy.
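gettext already has a standard answer to the 'to' problem: message contexts (msgctxt), which GNU catalogs store under the key context + "\x04" + msgid. The Python sketch below assumes a plain dictionary catalog and hypothetical context names such as "date range"; Python's own `gettext.pgettext` (3.8+) performs the same keyed lookup on real .mo files. One deliberate difference: unlike gettext, this version falls back to the context-free entry before returning the original text, so existing catalogs without contexts keep working.

```python
from typing import Dict

# GNU gettext stores a msgctxt-qualified entry under "<msgctxt>\x04<msgid>".
CONTEXT_SEPARATOR = "\x04"

# Illustrative German catalog: a generic "to" plus two context-specific variants.
CATALOG_DE: Dict[str, str] = {
    "to": "zu",
    "date range" + CONTEXT_SEPARATOR + "to": "bis",
    "recipient" + CONTEXT_SEPARATOR + "to": "an",
}


def pgettext(catalog: Dict[str, str], context: str, message: str) -> str:
    """Translate `message` in `context`, then try without context, then give up."""
    key = context + CONTEXT_SEPARATOR + message
    if key in catalog:
        return catalog[key]
    return catalog.get(message, message)


print(pgettext(CATALOG_DE, "date range", "to"))  # bis
print(pgettext(CATALOG_DE, "recipient", "to"))   # an
```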
 
As you can see, this project opens up a lot of opportunities to improve CiviCRM's user experience, especially in an international context. I'm still excited about the possibilities.
 
Stay tuned, play with the prototypes, and feel free to give feedback!

Comments

It's pretty exciting to see you moving forward with this!

For folks who weren't in the conversation at Bamford and who don't do a whole lot with internationalization, I just want to add a little more context. CiviCRM's translation generally works fine if the site starts with one language and sticks with it, and it also supports multiple languages. Each additional language comes with a cost (it requires a set of additional MySQL columns on many tables), and MySQL imposes some hard limits on the number of columns. The extra technical cost is fine for 2 or 3 languages but becomes problematic with 10 languages.

The basic idea here is to shift some of the multilingual work out of MySQL and into gettext. MySQL is a general-purpose data-store... and gettext is a specialized data-store optimized for translation of strings across many languages. The great thing is that gettext performs well at runtime; it's a de-facto standard for software translation projects; and it's already used for large parts of Civi's translation. What's the catch? The workflows are usually geared toward the needs of experts (translators/developers), but Civi's ecosystem is diverse (including experts and novices), and the current MySQL approach has better options for reaching novices. Making something comparable with a pure-gettext runtime definitely calls for some R&D (design/tooling/development/documentation/etc).

I really like how l10nx is oriented around extensions. That means Bjoern (and other experts, hopefully!) can collaborate on this publicly and deploy the iterations as needed -- even while less sophisticated users can continue with the built-in translation mechanisms (and hopefully lend some moral support!).

Great stuff. I continue to be amazed by how many projects you are involved in enabling and collaborating on, Tim.

Bjoern - great to see you picking this up & running with it.