Tibetan Translation Tool


Tibetan Translation Tool Technical Documentation

Contributor(s): Andres Montano, Zach Rowinski.

Quick Links to Further Information on this Tool

Introduction

The Tibetan to English Dictionary and Translation Tool takes Tibetan language passages, which can be cut and pasted in, typed in Wylie transliteration, or typed in Tibetan script, breaks them down into their component phrases and words, and displays the corresponding dictionary definitions. The Tibetan Translation Tool can be used online on THL without any installation. However, with some simple installation, you can also run it online with more advanced functionality, such as typing in Tibetan script instead of roman script transliteration (Wylie). It is also possible to install the software on your own computer and run the Tibetan Translation Tool offline, without an Internet connection. Please see the Quick Links above for further details on installation, the dictionary content, user instructions, and technical notes.

The tool was developed and implemented in its current state by Andrés Montano Pellegrini at the University of Virginia, while the Tibetan script input facility was built by Edward Garrett while working at the University of Virginia.

This tool partially automates the process of translation by breaking up a sentence or paragraph entered in Extended Wylie or Tibetan script into the biggest component parts it can find in multiple dictionary databases. For each component part found, it then displays the stored definitions and relevant information. This will often yield only the definition of a long phrase, rather than of its component words, but one can also search for the syllables of that phrase one by one.

In the Tibetan language, the boundaries of individual words are not marked in the way that spaces separate and mark words in English. Instead, a punctuation mark called a "tsheg" separates each syllable. Thus, while syllabic boundaries are entirely explicit, word boundaries are often unclear. One of the main difficulties beginning students have with translating Tibetan texts is figuring out where each word ends and the next one starts, and determining which series of syllables to look up in the dictionary as a single word or as a larger compound phrase. This entails a very time-consuming process of looking up multiple combinations of syllables to determine which are found within a given dictionary.
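The idea of finding the "biggest component parts" can be illustrated with a greedy longest-match lookup over syllables. The following is only a minimal sketch under stated assumptions, not the tool's actual implementation: the function names, the `max_len` limit, and the toy dictionary with its rough glosses are all hypothetical, and headwords are assumed to be stored in Wylie with a single space between syllables.

```python
import re

TSHEG = "\u0f0b"  # Unicode TIBETAN MARK INTERSYLLABIC TSHEG

# Hypothetical toy dictionary (Wylie headword -> rough gloss), for illustration only.
TOY_DICTIONARY = {
    "sangs rgyas": "buddha",
    "sangs": "purified",
    "rgyas": "expanded",
    "kyi": "(genitive particle)",
    "chos": "dharma",
}


def split_syllables(text):
    """Split a passage into syllables on tsheg marks or whitespace."""
    return [s for s in re.split("[" + TSHEG + r"\s]+", text) if s]


def segment(syllables, dictionary, max_len=6):
    """Greedy longest match: at each position, take the longest run of
    syllables that is a dictionary headword. A syllable with no match is
    emitted on its own with a definition of None."""
    result = []
    i = 0
    while i < len(syllables):
        match_len = 1  # fall back to a single syllable
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            if " ".join(syllables[i:i + n]) in dictionary:
                match_len = n
                break
        entry = " ".join(syllables[i:i + match_len])
        result.append((entry, dictionary.get(entry)))
        i += match_len
    return result


if __name__ == "__main__":
    for entry, definition in segment(split_syllables("sangs rgyas kyi chos"),
                                     TOY_DICTIONARY):
        print(entry, "->", definition)
```

In this sketch "sangs rgyas" is returned as one entry even though "sangs" and "rgyas" are also headwords, which mirrors the behavior described above: the longest phrase wins, and the individual syllables have to be looked up separately.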

Future Plans

Currently the utility works as a scanner: it takes its input and identifies the individual elements that constitute it. Eventually the scanner will be only the first phase of a bigger Tibetan to English Translation Utility. The second phase is to build a parser on top of the scanner. The parser would determine how those individual elements are related to each other according to predetermined grammatical rules. Ideally, it would work out the Tibetan grammar and produce an English sentence.

Suggestions are Welcome!

Please submit to THL your suggestions (as comments to this entry) regarding:

  • Bugs that need to be fixed.
  • Other Tibetan to English dictionaries that could be made available online through this utility.
  • Installation tips on platforms not documented here.

Make sure that the issues you are raising or inquiring about are not already addressed in the documentation.

Known Bugs

Since multiple dictionaries compiled by different people are being accessed simultaneously, slight differences in the implementation of the Wylie system of transliteration between the THDL Tibetan entry system and each dictionary could cause some words with uncommon stacks to not be found.

For Personal Java Runtime Environment for Windows CE Version 1.1 Beta 1 (the only JRE made available by Sun for the StrongARM processor), classes must be compiled with a JDK version from 1.2.2 up to 1.3.1_02. Classes compiled with JDK 1.4.0 are incompatible with it.

The internal applet version cannot be accessed from outside the UVa network through the ITC Proxy Server. Applet-server communication through a proxy server is not supported.

Troubleshooting

I get garbage when I type Tibetan.

Make sure the TibetanMachineWeb fonts are installed. A free download is available.

The vowels don't look right.

Due to Java Bug Id 4498203, Tibetan vowels do not display properly in some versions of Java. Problematic versions include 1.3.0-02, 1.3.1, and the 1.4 beta 2. Uninstall Java, then download a newer version (1.4 or higher) and install it. The vowels should look better.

I can't convert from Wylie to Tibetan.

Make sure the Wylie you are using is valid Extended Wylie.

I can't paste Tibetan script copied in Microsoft Word XP.

Text in Tibetan script using the Tibetan Machine Web font or the Tibetan Machine font can be copied from most applications that support the Windows clipboard system and pasted directly into the Tibetan Translation Tool.

Unfortunately, when Tibetan script is used, the clipboard format of Microsoft Word XP (and higher) is currently too sophisticated for the Tibetan Translation Tool to recognize. A workaround is to add an intermediate step: copy the Tibetan text from Word and paste it into a less sophisticated text editor such as Microsoft WordPad. The text can then be copied from WordPad and pasted into the Tibetan Translation Tool. If the text is in Roman script (using Extended Wylie or ACIP's transliteration system), there is no problem.

I have bought The Rangjung Yeshe Tibetan-English Dictionary of Buddhist Culture and have downloaded Jeffrey Hopkins' Tibetan-Sanskrit-English Dictionary. How can I create a database to access offline both dictionaries simultaneously?

Importing a text dictionary into an existing dictionary database is currently not possible. Nevertheless the new Rangjung Yeshe Tibetan-English Dharma Dictionary 3.0 CD includes the Tibetan Translation Tool with a database for it to access both dictionaries simultaneously.

Can I install the Translation Tool on my Palm OS handheld or on my cell phone?

Theoretically, the Tibetan Translation Tool for handheld devices runs on any platform that supports a JVM. Nevertheless, Sun's JVMs for Palm OS handhelds (implemented as Connected Limited Device Configuration (CLDC) / Mobile Information Device Profile (MIDP)) and for mobile information devices such as mobile phones (Java 2, Micro Edition) are too limited in functionality for it to work. An alternative would be to seek JVMs from third parties that are "compatible enough" to allow the Tibetan Translation Tool for handheld devices to work. You are welcome to try it out and let us know!

Including the Translation Tool in Your Webpage

The code to include the Tibetan Translation Tool in any page is:

<script type="text/javascript" src="http://www.thlib.org/global/js/thl-ttt-include.js"></script>

This should enable the Translation Tool for the page. To use it, highlight a section of Unicode Tibetan text and press Ctrl /.

AMP Open Community License

The AMP Open Community License is a free software license. The source code for our software is completely open and public. It is available at http://sourceforge.net/projects/thdltools/. We hope that others will contribute and build upon what we've done, resulting in better, more useful products. The core of our license is the Open Public License (OPL), which is also used by THL and Enhydra. The OPL is a slightly modified version of the popular Mozilla Public License.

Provided for unrestricted use by the Tibetan and Himalayan Library