Tamil language notes

Rajesh Kumar is in the usergroup ‘Community saint’
All non-administrative language strings have been fully translated into Tamil via Transifex, and the language pack is now available as an addon.
We've also done extensive testing.

This topic contains notes on the translation.

Progress

We have not translated the administrative strings (e.g. the installer, content management and site structure administration), but we hope users will contribute translations for these too.

Unicode

Tamil has a complex, elegant script, and the standardised Unicode encoding of Tamil is rather clunky.

In Unicode, Tamil text is built by combining characters (for example, a consonant followed by a vowel sign) and then joining or reordering them for display. Drawing this correctly requires what is called "Complex Text Layout" (CTL) support in the software that renders the text. Unicode probably works this way because it inherited the approach of the earlier 8-bit character set for Tamil, which had to do the same because of its limited number of code points.
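To see the combining in practice, here is a small Python sketch (Python is used purely for illustration) that lists the code points behind the syllable கொ (ko). It is stored as the consonant KA plus a single vowel sign O, and that vowel sign canonically decomposes into two parts, E and AA, which are drawn on opposite sides of the consonant. Placing E on the left, even though it is stored after KA, is the reordering a renderer has to do.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


ko = "\u0B95\u0BCA"  # கொ (ko): TAMIL LETTER KA + TAMIL VOWEL SIGN O

# As stored: two code points
print(describe(ko))
# ['U+0B95 TAMIL LETTER KA', 'U+0BCA TAMIL VOWEL SIGN O']

# Canonical decomposition (NFD): the O sign splits into E + AA.
# E is displayed to the LEFT of KA even though it comes after it in memory.
print(describe(unicodedata.normalize("NFD", ko)))
# ['U+0B95 TAMIL LETTER KA', 'U+0BC6 TAMIL VOWEL SIGN E', 'U+0BBE TAMIL VOWEL SIGN AA']
```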

There is in fact a whole alternative standard, TACE16, that tries to improve on Unicode, but we're not using it (and can't). Many native Tamil speakers, including the Tamil Nadu government, think TACE16 is the better approach. This is because TACE16 encodes the wider set of characters that are understood conceptually (each letter of the alphabet gets its own code), while Unicode's smaller set reflects purely how the script is written down. That's a fundamental linguistic disagreement.
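The difference in size between the two models can be illustrated by building the traditional 247-letter Tamil alphabet (12 vowels, 18 pure consonants, 216 consonant-vowel combinations and the aytham) out of Unicode code points. This Python sketch is for illustration only: it does not implement TACE16, and it leaves out the Grantha letters (ஜ, ஷ, ஸ, ஹ and so on). It shows that 216 of the 247 letters need two code points in Unicode, and that the whole alphabet is assembled from just 43 distinct code points.

```python
# Tamil vowels (uyir), in traditional order
VOWELS = [chr(c) for c in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                           0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]
# Tamil consonants (mei), in traditional order
CONSONANTS = [chr(c) for c in (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                               0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                               0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)]
# Dependent vowel signs matching VOWELS; the inherent 'a' has no sign
VOWEL_SIGNS = ["", "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",
               "\u0BC6", "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC"]
PULLI = "\u0BCD"   # virama (pulli): marks a pure consonant
AYTHAM = "\u0B83"  # ஃ


def tamil_letters() -> list[str]:
    """Build the 247 traditional letters as Unicode strings."""
    letters = list(VOWELS)
    letters += [c + PULLI for c in CONSONANTS]                   # pure consonants
    letters += [c + s for c in CONSONANTS for s in VOWEL_SIGNS]  # consonant-vowel letters
    letters.append(AYTHAM)
    return letters


letters = tamil_letters()
print(len(letters))                                          # 247 letters
print(sum(1 for letter in letters if len(letter) == 2))      # 216 need two code points
print(len(set("".join(letters))))                            # 43 distinct code points
```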

The Unicode implementation of Tamil creates a couple of practical problems for us:

Logo wizard

Using the Logo Wizard to generate a title in Tamil won't always work correctly: the characters may not be joined or reordered properly.

Sample screenshot:
LogoWizard.png

Unfortunately, PHP's image functions do not support Complex Text Layout, and FreeType (the font rendering library PHP uses) doesn't handle it itself either. Fixing that would be a very big job, and the result likely wouldn't work on standard web hosting anyway.
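For a feel of what is missing, here is a toy Python version of just one CTL step: moving the "pre-base" vowel signs (E, EE and AI, which are drawn to the left of their consonant) in front of it. A renderer without CTL draws code points in stored order, so these signs end up on the wrong side. This is a hypothetical sketch, not how PHP, FreeType or any shaping engine is implemented; real shaping works on the font's glyphs and also handles substitution, conjuncts and positioning.

```python
import unicodedata

# Tamil vowel signs drawn to the LEFT of the consonant they follow in memory
PRE_BASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}  # E, EE, AI


def visual_order(text: str) -> list[str]:
    """Return the characters of text in left-to-right drawing order.

    Simplification: assumes each pre-base sign directly follows a single
    consonant; conjuncts and all other shaping rules are ignored.
    """
    drawn: list[str] = []
    # NFD splits the two-part signs (O, OO, AU) into pre-base + post-base parts
    for ch in unicodedata.normalize("NFD", text):
        if ch in PRE_BASE_SIGNS and drawn:
            drawn.insert(len(drawn) - 1, ch)  # move in front of the consonant
        else:
            drawn.append(ch)
    return drawn


print(visual_order("\u0B95\u0BCA"))  # கொ: [E sign, KA, AA sign]
```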

As a workaround, the Logo Wizard has been tweaked so that you can leave the Site Name field blank. You can then generate an image of the right size, with the nice imagery, and write the text onto it manually.

Truncations

Sometimes we need to truncate text to create an automatic summary of it. However, because Tamil characters in Unicode are not really "atomic", cuts can happen in unnatural positions, even in the middle of a combined character. That's an imperfection that currently exists and would take a lot of effort to resolve (a lot of custom programming by someone who knows Tamil well). PHP libraries such as mbstring work on individual code points, so they do not handle the situation.
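One way to at least avoid cutting through a combined character is to truncate by user-perceived characters rather than by code points. The Python sketch below approximates Unicode's grapheme clusters (UAX #29) with a much simpler, hypothetical rule: a combining mark (general category Mn, Mc or Me) always stays attached to the character before it. It does not make the cut land somewhere linguistically natural, and it ignores zero-width joiners and other cases a full UAX #29 implementation handles.

```python
import unicodedata

COMBINING = {"Mn", "Mc", "Me"}  # nonspacing, spacing and enclosing marks


def clusters(text: str) -> list[str]:
    """Split text into rough user-perceived characters (simplified UAX #29)."""
    parts: list[str] = []
    for ch in text:
        if parts and unicodedata.category(ch) in COMBINING:
            parts[-1] += ch  # keep the mark with its base character
        else:
            parts.append(ch)
    return parts


def truncate(text: str, limit: int, ellipsis: str = "…") -> str:
    """Cut text to at most `limit` clusters, adding an ellipsis if cut."""
    parts = clusters(text)
    if len(parts) <= limit:
        return text
    return "".join(parts[:limit]) + ellipsis


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # தமிழ் ("Tamil"): 5 code points, 3 clusters
print(word[:2])           # code-point cut: drops the vowel sign of மி
print(truncate(word, 2))  # cluster cut: keeps மி whole
```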

Fonts

Most fonts do not contain Tamil characters. When a font lacks a particular character, the operating system falls back to an alternative font. We find Latha a nice font, so we favour it.

The Tamil language pack automatically substitutes the Latha font into the CSS via a post-processing filter.
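The filter itself isn't shown here, but the general idea can be sketched in Python. The hypothetical add_tamil_font() below rewrites each CSS font-family declaration to include Latha just before the generic fallback family (such as sans-serif). Browsers pick fonts per character from that list, so the theme's own fonts still handle Latin text while Latha is used for Tamil characters the earlier fonts lack. It uses a simple regular expression rather than a CSS parser, so it won't cope with every stylesheet (for example `!important` or the `font` shorthand).

```python
import re

GENERIC = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}
# Captures "font-family:" and the value up to the next ';' or '}'
FONT_FAMILY = re.compile(r"(font-family\s*:\s*)([^;}]+)", re.IGNORECASE)


def add_tamil_font(css: str, font: str = "Latha") -> str:
    """Insert `font` into every font-family list (hypothetical filter)."""
    def fix(match: re.Match) -> str:
        families = [f.strip() for f in match.group(2).split(",")]
        if any(f.strip("'\"").lower() == font.lower() for f in families):
            return match.group(0)  # already present: leave unchanged
        # Place it before the first generic family, or at the end if none
        pos = next((i for i, f in enumerate(families) if f.lower() in GENERIC),
                   len(families))
        families.insert(pos, font)
        return match.group(1) + ", ".join(families)

    return FONT_FAMILY.sub(fix, css)


print(add_tamil_font("body { font-family: Arial, sans-serif; }"))
# body { font-family: Arial, Latha, sans-serif; }
```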

Text size

Tamil script tends to use both more characters and larger (more elegant) characters than Latin-script languages such as English, so we tweak the layout slightly to make Tamil fit. This is controlled by the 'takes_lots_of_space' flag in the language pack.

Locales

We couldn't get a Tamil locale working on our Mac and Windows test machines (the locale is basically what makes day and month names come out correctly in dates). We therefore configured the language pack's 'locale_subst' feature to substitute in the correct day and month names after the locale has run. We didn't test on Linux, where the locale may work properly; if it does, 'locale_subst' will not interfere.
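Here is a minimal Python sketch of that kind of post-locale substitution. apply_locale_subst() and its mapping are hypothetical (the language pack's actual 'locale_subst' format isn't shown here); the Tamil entries are the standard Gregorian day and month names. The date is first formatted under a locale that works, here Python's default "C" locale for dates, which produces English names, and those names are then swapped out.

```python
import re
from datetime import date

# English names (as produced by the "C" locale) -> Tamil names
TAMIL_NAMES = {
    "Sunday": "ஞாயிறு", "Monday": "திங்கள்", "Tuesday": "செவ்வாய்",
    "Wednesday": "புதன்", "Thursday": "வியாழன்", "Friday": "வெள்ளி",
    "Saturday": "சனி",
    "January": "ஜனவரி", "February": "பிப்ரவரி", "March": "மார்ச்",
    "April": "ஏப்ரல்", "May": "மே", "June": "ஜூன்", "July": "ஜூலை",
    "August": "ஆகஸ்ட்", "September": "செப்டம்பர்", "October": "அக்டோபர்",
    "November": "நவம்பர்", "December": "டிசம்பர்",
}
# Whole words only, so e.g. "May" is not replaced inside "Mayor"
_PATTERN = re.compile(r"\b(" + "|".join(TAMIL_NAMES) + r")\b")


def apply_locale_subst(formatted: str) -> str:
    """Replace English day/month names in an already-formatted date."""
    return _PATTERN.sub(lambda m: TAMIL_NAMES[m.group(1)], formatted)


# strftime uses the "C" locale for dates unless setlocale() has been called
print(apply_locale_subst(date(2024, 1, 15).strftime("%A %d %B %Y")))
```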
 

Last edit: by Chris Graham
