Lost in translation

Published
2015-09-30 09:32
Written by

If you (and your colleagues/users) are native English speakers, DON'T READ THIS. You don't have to bother. Lucky you.

The CiviCRM localisation has come a long way, and by now it's pretty comprehensive and surprisingly versatile. Sometimes the quality of the translated strings is a little questionable, ranging from "strange" to "funny", and sometimes far into the absurd. But that's mostly due to inexperienced translators, and nothing that a little bit of quality control by the Transifex coordinators can't fix.

There is, however, one problem that's been particularly elusive. What to do if the same English word has two or more different translations in your language, depending on the context? I'll give you an example in my native language, German:

  1. "Write an email to Mark" would translate as "Schreibe eine E-Mail an Mark".
  2. "CodeSprint will be from the 10th to the 16th" would translate as "CodeSprint wird vom 10. bis zum 16. stattfinden"

You see the difference? The same word "to" becomes "an" in the first sentence and "bis zum" in the second. Granted, German is an unnecessarily complicated language, but I'm sure there are similar examples in your language as well. Here's another one:

  1. "next page" would translate as "nächste Seite"
  2. "next entry" would translate as "nächster Eintrag"

The problem arises when the string (like "to" or "next") is isolated in the code, e.g. for the different labels in a form or on a button. However you decide to translate the string, it's going to be wrong in the other contexts. But how can we solve this?

The best approach, I think, is to add the context the string is used in, each time it appears during the page generation process. And luckily the underlying translation system, GNU gettext, offers a way to do exactly this: tagging the string with a context.

This system is already being used in CiviCRM, e.g. for countries, states, or in order to separate menu entries from page content: the translation for "Home" in the menu is (in many languages) very different from the address type "Home". We just need to roll this system out to the not-that-simple cases, where the context is more than just "menu" or "not menu".
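To make the idea concrete, here is a minimal sketch, written in Python for brevity rather than in CiviCRM's own PHP. The catalog is a hypothetical stand-in for a compiled gettext catalog, `translate` is an illustrative helper and not a CiviCRM or gettext function, the context name "address" is an assumption, and the German strings are example translations only. The one detail taken from gettext itself is that a context-tagged entry is stored under the context and the source string joined by an EOT character.

```python
# Hypothetical in-memory catalog; real translations live in compiled gettext files.
CONTEXT_SEPARATOR = "\x04"  # EOT, which gettext uses to join context and source string

GERMAN_CATALOG = {
    "menu" + CONTEXT_SEPARATOR + "Home": "Startseite",  # example translation
    "address" + CONTEXT_SEPARATOR + "Home": "Privat",   # example translation
}


def translate(catalog, text, context):
    """Return the translation of `text` in `context`, or `text` itself if none exists."""
    return catalog.get(context + CONTEXT_SEPARATOR + text, text)


menu_label = translate(GERMAN_CATALOG, "Home", "menu")
address_label = translate(GERMAN_CATALOG, "Home", "address")
print(menu_label, address_label)
```

The same English string now yields two different German strings, and which one is returned depends only on the context the caller passes in.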

I have started a collection of these cases HERE, and I welcome you to add all those cases that you just couldn't get the right translation for, because it would always be wrong in the other places.

The actual implementation of the changes will be a little more complex than it sounds. We have to make sure that the previous translations still work if no context-tagged translation is provided. That it works on every installation. That all the localisation tools pick it up. That it doesn't break anything. You get the picture. I will tackle this subject during the code sprint taking place right after CiviCon London. So please submit your cases by October 11th, and I will include them in the initial roll-out.
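The first requirement, that existing translations keep working when no context-tagged translation exists, could look roughly like the following. This is only a sketch in Python under stated assumptions, not the planned CiviCRM implementation: the catalog, the helper name `translate_with_fallback` and the context names "entry" and "page" are all hypothetical.

```python
CONTEXT_SEPARATOR = "\x04"  # EOT, which gettext uses to join context and source string

# Hypothetical catalog: "next" has an older context-free translation,
# plus one newer translation tagged with the (assumed) context "entry".
CATALOG = {
    "next": "nächste",
    "entry" + CONTEXT_SEPARATOR + "next": "nächster",
}


def translate_with_fallback(catalog, text, context=None):
    """Prefer the context-tagged entry, then the context-free one, then the source."""
    if context is not None:
        tagged = catalog.get(context + CONTEXT_SEPARATOR + text)
        if tagged is not None:
            return tagged
    # No tagged entry (or no context given): behave exactly as before.
    return catalog.get(text, text)


print(translate_with_fallback(CATALOG, "next", "entry"))  # tagged entry is used
print(translate_with_fallback(CATALOG, "next", "page"))   # falls back to the old entry
```

With this lookup order, a language that has not yet provided any context-tagged strings sees no change at all, while translators can add tagged entries one case at a time.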

 

Comments

Awesome, thanks for taking the initiative with this!

Have a look at how translation is done in Symfony. They have even thought of how to translate plurals, because some languages have more than two plural forms. In Russian, for example, the plural form depends on the number. In English you have 1 window, 2 windows, 5 windows. In Russian, one window is odno okno, two windows is dva okna and five windows is pyat' okon.

+1, that's another important aspect that is often overlooked. CiviCRM supports variable plural forms using Gettext (for Russian/Polish/etc), although there may be some places where the code needs improvement because a programmer could have written "if one item, print this, else print that". Gettext simplifies that with the "count" feature, so that there is no need for the if/else.

https://wiki.civicrm.org/confluence/display/CRMDOC/Internationalisation+for+Developers (see "plural issues")
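As an illustration of why a count-based lookup beats an if/else, here is a small Python sketch of the Russian case from the comment above. The rule follows the plural formula commonly used in gettext catalogs for Russian; the function names and the transliterated word forms are illustrative assumptions, not CiviCRM code.

```python
def russian_plural_index(n):
    """Index of the Russian plural form for the count n (0, 1 or 2)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... (but not 11)
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1  # 2-4, 22-24, ... (but not 12-14)
    return 2      # everything else: 0, 5-20, 25-30, ...


# Transliterated forms of "window", in plural-index order.
WINDOW_FORMS = ("okno", "okna", "okon")


def count_windows(n):
    """Format a count of windows with the matching Russian plural form."""
    return f"{n} {WINDOW_FORMS[russian_plural_index(n)]}"


for n in (1, 2, 5, 11, 21):
    print(count_windows(n))
```

A hard-coded "if one item, print this, else print that" can only ever pick between two forms, which is why it cannot express the third form needed here.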

I often look to Symfony for inspiration (we will have to tackle book translation at some point), but for this in particular, I was surprised to see that they do not use Gettext, but instead re-implement a custom translation system. They also do not seem to use contexts. (http://symfony.com/doc/current/book/translation.html) Drupal has some docs on contexts, and the associated issue linked in the doc has some interesting comments (https://www.drupal.org/node/1369936), but it's no silver bullet.

For what it's worth, there are 3 contexts at the moment in CiviCRM: menu, country and state/province. For example, if a state/province had a name that was the same as a common noun already in CiviCRM, the two may not translate to the same thing. It's also pretty useful to translators to know that "this string is a province name", "this string is a menu item". As Björn points out, we need to go further in some situations (mostly with 1 or 2 word strings) and be clearer that "this is a date-related string", "this is a location-related string", so that we can translate words such as "to" or "home" correctly.

There are places where we might want to re-think strings such as "New %1", "Edit %1". Such strings require translators to be very creative, and translate as, for example "New entity of type: %1" (so that it is gender-neutral).

There was also an interesting thread on the forum on this topic: https://forum.civicrm.org/index.php?topic=36344.0

Thanks Björn for tackling this issue!

Good point, Jaap! Mathieu had already pointed me towards the way gettext solves the problem with plurals, but to my understanding there is no similar mechanism for declension in other cases - like the gender-based problem outlined in example #2... Am I missing something here?

Awesome initiative

Would gettext contexts also solve one of my pet issues in French?

- home (as in homepage) -> Accueil

- home (as the location type) -> Domicile