Rendering Tibetan Properly In Mixed Text Environments

Contributor(s): David Germano, Andres Montano.

When Tibetan script is mixed with Roman script, two problems arise. First, Roman text is usually set at 12 points, but Tibetan is hard to read at that size and needs to be displayed at a larger font size. Second, the font used for Tibetan is usually not the font you want for Roman script, even if it has Roman glyphs built into it.

Andres Montano has therefore written a library that addresses this issue. It converts the Tibetan into hard-coded numeric character references (NCRs) instead of raw UTF-8, so that the browser knows how to display them.

It was initially written as a plugin called "globalize_complex_scripts", which extends the popular Globalize plugin (http://www.globalize-rails.org/). Globalize provides simple ways to give an application multilingual views and models; globalize_complex_scripts extends that multilingual support for easier handling of complex scripts. It was later detached from Globalize to be free-standing, so that Ruby on Rails applications can use globalize_complex_scripts without the added complexity of setting up Globalize, or the overhead of running it when there is no other need for it.

The main functions are extensions of the String class:

1. span (or "s" for short): takes a string and wraps characters in predefined Unicode ranges in a span with xml:lang and class attributes for easy rendering. Also converts characters outside the ASCII range into NCRs.

2. translate_and_span (or "ts" for short): used instead of Globalize's translate method (which translates strings in views) so that the translation includes the added language metadata.

3. translate_and_encode (or "te" for short): converts characters outside the ASCII range into NCRs, but does not span. To be used in forms (buttons, drop-down lists, etc.).
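The NCR conversion that these methods share can be sketched in plain Ruby. This is only an illustration of the idea, not the plugin's code: the method name encode_ncr is hypothetical, and decimal references are assumed because that is the form the page's own example uses.

```ruby
# Hypothetical helper (not the plugin's actual method): replace every
# character outside the ASCII range with a decimal numeric character
# reference of the form &#NNNN;.
def encode_ncr(text)
  text.gsub(/[^[:ascii:]]/) { |char| format("&#%d;", char.ord) }
end

puts encode_ncr("See ཀ་བ་ pillar.")
# prints: See &#3904;&#3851;&#3926;&#3851; pillar.
```

ASCII text passes through unchanged, so the result is safe to use in form controls regardless of the page encoding.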

Besides these, it provides some helper methods to facilitate the translation of views. These are outside the scope of the current interest, which is the integration of globalize_complex_scripts into the dict app.

The spanning relies on a languages model that has the unicode ranges and ISO codes of the languages to be "spanned".

The code is at: http://ndlb.svn.sourceforge.net/svnroot/ndlb/portal/ror/plugins/globalize_complex_scripts/trunk/

The lib directory has the main libraries:

  • globalize_complex_scripts.rb : the String methods described above
  • helpers/ : helpers for easier use of Globalize
  • models/ : the languages model described above
  • patches/ : fixes to make Globalize compatible with Rails 2.1.0

An example would be: "See ཀ་བ་ pillar." becomes "See <span lang="bo" xml:lang="bo" class="bo">&#3904;&#3851;&#3926;&#3851;</span> pillar." Thus interspersed Tibetan is turned into NCRs, and a span tag with its lang, xml:lang and class attributes set to "bo" is added around it. The stylesheet then assigns the appropriate font and size to class "bo".
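The spanning step can be approximated in plain Ruby as follows. This is a minimal sketch under stated assumptions, not the plugin's implementation: the Language struct, the LANGUAGES table and the method names language_for and span_scripts are all hypothetical stand-ins for the languages model and the String extension, and only the Tibetan Unicode block (U+0F00 to U+0FFF) is registered.

```ruby
# Hypothetical stand-in for the languages model: each entry pairs an ISO
# code with the Unicode range of the script to be spanned.
Language = Struct.new(:code, :range)
LANGUAGES = [
  Language.new("bo", 0x0F00..0x0FFF) # Tibetan block
].freeze

# Return the language whose range contains the character, or nil.
def language_for(char)
  LANGUAGES.find { |lang| lang.range.cover?(char.ord) }
end

# Split the text into runs of characters that share a language, wrap each
# run that has a language in a span, and encode non-ASCII characters as NCRs.
def span_scripts(text)
  runs = text.each_char.chunk_while { |a, b| language_for(a) == language_for(b) }
  runs.map do |run|
    ncr = run.join.gsub(/[^[:ascii:]]/) { |char| format("&#%d;", char.ord) }
    lang = language_for(run.first)
    if lang
      %(<span lang="#{lang.code}" xml:lang="#{lang.code}" class="#{lang.code}">#{ncr}</span>)
    else
      ncr
    end
  end.join
end

puts span_scripts("See ཀ་བ་ pillar.")
```

Keeping the ranges in a table rather than in the method means further scripts could be supported by adding rows, which appears to be the purpose of the languages model.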

You can see it in action at: http://staging.mms.thdl.org/dictionary_searches/98?language_id=7.

There are also helper methods that make form text boxes bigger so that Tibetan Unicode fits. Typing Tibetan still relies on a Tibetan Unicode keyboard being installed on the system, such as the Wylie keyboard in Leopard (Mac) or TISE in Windows.

So in Himalayan dicts, the code reads:

  • <%= f.text_field :title, fixed_language_options %>
  • and generates html: <input class="dz" id="dictionary_search_title" lang="dz" name="dictionary_search[title]" size="30" type="text" value="" xml:lang="dz" />
  • the key there being class "dz", which the stylesheet handles
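Judging only from the generated HTML, the helper seems to return a hash of HTML attributes that text_field merges into the input tag. The sketch below is a guess at that shape, not the plugin's code: the method name language_options_for and its argument are hypothetical, and the real helper in helpers/ may work differently.

```ruby
# Hypothetical reconstruction: build the HTML attributes that mark a form
# field as belonging to one language, so the stylesheet can size it by class.
def language_options_for(code)
  { :class => code, :lang => code, "xml:lang" => code }
end

p language_options_for("dz")
```

In a view, such a hash would be passed as the options argument, for example f.text_field :title, language_options_for("dz").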

Provided for unrestricted use by the Tibetan and Himalayan Library.