Technical Notes For Developers On Tibetan Translation Tool


Contributor(s): Andres Montano

Here we describe the main classes in the org.thdl.tib.scanner package that are designed to be run from the command line:

org.thdl.tib.scanner.BinaryFileGenerator

Note: Included only in DictionarySearchStandalone.jar

Stores one or more dictionaries in a binary tree file. This is the only file format that the Tibetan Translation Tool can use directly.

Syntax

Note: Dictionary files are assumed to have the .txt extension. Do not include the extension in the command!

For one dictionary, to read the definitions stored in dict-name.txt and organize them into dict-name.wrd and dict-name.def:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator [-delimiter] dict-name

For multiple dictionaries, to read the definitions stored in dict-name1.txt, dict-name2.txt, etc., and organize them into dest-file-name.wrd and dest-file-name.def:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator dest-file-name [-delimiter1] dict-name1 [[-delimiter2] dict-name2 …]
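The two forms above can be told apart with one rule: if the first argument is an option or the only argument, the single-dictionary form applies and the output files take the dictionary's own name; otherwise the first argument names the output files. The Java sketch below illustrates that reading of the syntax. It is not taken from BinaryFileGenerator. The classes GeneratorArgs and Source are hypothetical, and so is the assumption that a dictionary with no option of its own is always read as dash-separated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one way to read the documented BinaryFileGenerator syntax.
// Not the tool's own code: GeneratorArgs and Source are illustrative names.
final class Source {
    final String format; // "dash" (no option), "tab", "acip", or a custom separator such as "**"
    final String name;   // dictionary base name, without the .txt extension

    Source(String format, String name) {
        this.format = format;
        this.name = name;
    }
}

final class GeneratorArgs {
    final String destination;                       // base name of the .wrd and .def files
    final List<Source> sources = new ArrayList<>();

    private GeneratorArgs(String destination) {
        this.destination = destination;
    }

    static GeneratorArgs parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("no dictionary given");
        }
        GeneratorArgs result;
        int first;
        if (args.length == 1 || args[0].startsWith("-")) {
            // Single-dictionary form "[-delimiter] dict-name": output named after the dictionary.
            result = new GeneratorArgs(args[args.length - 1]);
            first = 0;
        } else {
            // Multiple-dictionary form: the first argument names the output files.
            result = new GeneratorArgs(args[0]);
            first = 1;
        }
        String format = "dash";
        for (int i = first; i < args.length; i++) {
            if (args[i].startsWith("-")) {
                format = args[i].substring(1);      // the option applies to the next dictionary
            } else {
                result.sources.add(new Source(format, args[i]));
                format = "dash";                    // assumption: options do not carry over
            }
        }
        return result;
    }
}
```

For example, parsing alldicts ry-dic99 -acip myglossary_uma gives the destination alldicts, with ry-dic99 read as dash-separated and myglossary_uma read in the acip format.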

-delimiter

  • If this option is omitted, it is assumed that each line is an entry (no multiple-line entries) and that the definiendum and the definition are separated by '-' (a dash). Although it is not required, it is highly recommended to put a space before and after the dash, to avoid any possible ambiguity with the transliteration of reversed vowels in Extended Wylie. A sample entry for the dictionary is:
bkra shis - 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs - 1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.

If this were the content of a file called "my-glossary.txt", the binary tree file would be generated with the command:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator my-glossary
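To see why the spaces around the dash matter, here is a minimal Java sketch of splitting a dash-separated line. It assumes, for illustration only, that a splitter looks for " - " first and falls back to the first bare '-', and that '-i' is the Extended Wylie notation for a reversed vowel. The class DashEntry and the term "k-i" are hypothetical; this is not the tool's parser.

```java
// Hypothetical sketch (not the tool's code): split a dash-separated entry,
// preferring " - " and falling back to the first bare '-'.
final class DashEntry {
    static String[] split(String line) {
        int at = line.indexOf(" - ");
        int width = 3;
        if (at < 0) {
            at = line.indexOf('-');
            width = 1;
        }
        if (at < 0) {
            return null; // no separator: not a dictionary entry
        }
        return new String[] {
            line.substring(0, at).trim(),        // definiendum (Tibetan term)
            line.substring(at + width).trim()    // definition
        };
    }
}
```

With the spaces, "k-i - hypothetical definition" splits into "k-i" and "hypothetical definition". Without them, "k-i-hypothetical definition" is cut right after "k", which is the ambiguity the recommendation avoids.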

-tab:

It is assumed that each line is an entry (no multiple-line entries) and that the definiendum and the definition are separated by '\t' (a horizontal tab). One tab is enough; there is no need to "align" the definitions in your word processor. A sample entry for the dictionary is (the gap after each Tibetan term is a single tab):

bkra shis	1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs	1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.

Here, the binary tree file would be generated with the command:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -tab my-glossary

-string:

It is assumed that each line is an entry (no multiple-line entries) and that the definiendum and the definition are separated by a character or string of characters specified by the user, written right after the dash on the command line (for example, -** for the separator **). A sample entry for the dictionary is:

bkra shis ** 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs ** 1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.

Here, the binary tree file would be generated with the command:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -** my-glossary
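A sketch of splitting an entry on the tab or on a user-chosen string follows. It is a hypothetical helper, not the tool's code. One detail worth noting in Java: String.split treats its argument as a regular expression, so a separator such as ** would not be matched literally (it is not even a valid pattern). The helper quotes the separator with Pattern.quote and passes a limit of 2 so that later occurrences stay inside the definition.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not the tool's code): split an entry on a literal separator,
// such as "\t" for -tab or "**" for -**.
final class DelimitedEntry {
    static String[] split(String line, String separator) {
        // String.split expects a regular expression, and "**" is not a valid one,
        // so the separator is quoted to be matched literally. The limit of 2 keeps
        // any later occurrences of the separator inside the definition.
        String[] parts = line.split(Pattern.quote(separator), 2);
        if (parts.length < 2) {
            return null; // separator not found: not a dictionary entry
        }
        return new String[] {parts[0].trim(), parts[1].trim()};
    }
}
```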

-acip:

It is assumed that the electronic file is a transliteration of a Tibetan dictionary. The option is called "acip" because it accepts Acip's comment codes ('@' to mark page numbers, brackets to mark comments, etc.). Nevertheless, it still requires the files to be in Extended Wylie, so if your file uses Acip's transliteration scheme, make sure to run org.thdl.tib.scanner.AcipToWylie first.

Definitions can span multiple lines, but with no blank lines in between. The definiendum is assumed to start after a blank line (except at the beginning of a new page, where the text may instead begin with the last part of the previous definition) and to run up to the shad (except where grammar rules omit the shad, for instance after a "ga" suffix without a secondary suffix). The start of each new letter should be clearly marked in brackets ('[', ']'), parentheses ('(', ')') or braces ('{', '}'). A sample entry for the dictionary is:

@1

(ka)

ka ba/ gdung 'degs don byed nus pa/

rkyen/ grogs byed

@2

(kha)

khyod dngos po dang de byung 'brel/ khyod dngos po las byung
zhing/ dngos po ldog stobs kyis khyod ldog pa/

khyod dngos po dang bdag gcig 'brel/ khyod dngos po dang bdag
nyid gcig pa'i sgo nas tha dad gang zhig/ dngos po ldog
stobs kyis khyod ldog pa/

khyod dngos po dang 'brel pa/ khyod dngos po dang tha dad gang

@3

zhig/ dngos po ldog stobs kyis khyod ldog pa/

kha dog mdog du rung ba'am/ sngo ser dkar dmar sogs mdog tu
rung ba'i gzugs/

Here the binary tree file would be generated with the command:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -acip my-glossary

Comments: Notice in the sample text that at the beginning of page 3, "zhig" is not a new definiendum but is still part of the definition of "khyod dngos po dang 'brel pa". Also, the definiendum of the last entry is "kha dog" (the shad is omitted after the "ga" suffix) and not "kha dog mdog du rung ba'am". However, the definiendum of the second entry under (kha) is not "khyod dngos po dang bdag", since no shad is omitted after that "ga" suffix; the definiendum is "khyod dngos po dang bdag gcig 'brel". As the sample text shows, the tool has to make a series of "smart guesses" to figure out where each definiendum ends and its definition starts. This process is not foolproof, so expect some mistakes.
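The segmentation rules described above can be approximated as follows. This Java sketch is a simplified, hypothetical reading of those rules, not the tool's implementation: '@' lines are page markers, bracketed lines are letter markers, blank lines end an entry, text after a page break that is not preceded by a letter marker continues the previous definition, and the definiendum runs up to the first shad. It deliberately omits the "ga" suffix heuristic, so for the last sample entry it would return "kha dog mdog du rung ba'am" rather than "kha dog".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch of segmenting -acip input into entries.
// Not the tool's implementation; the "ga" suffix heuristic is omitted.
final class AcipSegmenter {
    static List<String[]> segment(List<String> lines) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean afterPageBreak = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                flush(blocks, current);            // blank line: the entry is complete
            } else if (line.startsWith("@")) {
                flush(blocks, current);
                afterPageBreak = true;             // next text may continue the last entry
            } else if (isLetterMarker(line)) {
                flush(blocks, current);
                afterPageBreak = false;            // a new letter always starts a new entry
            } else {
                if (current.length() == 0 && afterPageBreak && !blocks.isEmpty()) {
                    // Text right after a page break: reopen the previous entry.
                    current.append(blocks.remove(blocks.size() - 1));
                }
                afterPageBreak = false;
                if (current.length() > 0) {
                    current.append(' ');           // join multi-line definitions
                }
                current.append(line);
            }
        }
        flush(blocks, current);

        // Definiendum = text before the first shad; blocks without a shad are skipped.
        List<String[]> entries = new ArrayList<>();
        for (String block : blocks) {
            int shad = block.indexOf('/');
            if (shad >= 0) {
                entries.add(new String[] {
                    block.substring(0, shad).trim(), block.substring(shad + 1).trim()});
            }
        }
        return entries;
    }

    private static boolean isLetterMarker(String line) {
        if (line.length() < 2) {
            return false;
        }
        char open = line.charAt(0);
        char close = line.charAt(line.length() - 1);
        return (open == '(' && close == ')') || (open == '[' && close == ']')
            || (open == '{' && close == '}');
    }

    private static void flush(List<String> blocks, StringBuilder current) {
        if (current.length() > 0) {
            blocks.add(current.toString());
            current.setLength(0);
        }
    }
}
```

Fed the sample text, this sketch rejoins "... tha dad gang" with "zhig/ ..." across the @3 page marker, which matches the first comment above; the "kha dog" entry shows where the real tool's guessing goes further.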

Dictionaries in different formats can be processed together. For instance, the command:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator alldicts ry-dic99 -acip myglossary_uma -tab myglossary_rdzogs-chen

would generate alldicts.def and alldicts.wrd, processing ry-dic99.txt as dash-separated, myglossary_rdzogs-chen.txt as tab-separated, and myglossary_uma.txt in the transliteration format explained above.
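Conceptually, processing several dictionaries into one file means collecting every definition under its Tibetan term in a sorted structure. The Java sketch below uses java.util.TreeMap (a red-black tree) as a stand-in for the binary tree file; the actual .wrd/.def layout is not described on this page. The class CombinedDictionary and the choice to prefix each definition with its dictionary name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: merge entries from several dictionaries into one sorted index.
// TreeMap (a red-black tree) stands in for the binary tree file; the real
// .wrd/.def layout is not reproduced here.
final class CombinedDictionary {
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    // Keeps every definition, tagged with the dictionary it came from (an assumption).
    void add(String dictName, String term, String definition) {
        index.computeIfAbsent(term, k -> new ArrayList<>()).add(dictName + ": " + definition);
    }

    // Returns all definitions for a term, or an empty list if the term is unknown.
    List<String> lookup(String term) {
        return index.getOrDefault(term, Collections.emptyList());
    }

    // Terms in sorted order, as an in-order walk of the tree yields them.
    Iterable<String> terms() {
        return index.keySet();
    }
}
```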

org.thdl.tib.scanner.AcipToWylie

Note: Included only in DictionarySearchStandalone.jar

Converts Tibetan text transliterated in the Acip scheme to THDL's Extended Wylie scheme.

If no arguments are given, it reads the Acip text from standard input and writes the Wylie text to standard output. If one argument is given, it is taken as the name of the input file. If two arguments are given, the first is taken as the name of the input file and the second as the name of the output file. For example, the following command converts lam-rim-chen-mo.act and stores the results in lam-rim-chen-mo.txt:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie lam-rim-chen-mo.act lam-rim-chen-mo.txt

Alternatively, you can perform the same job by redirecting standard input and output:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie < lam-rim-chen-mo.act > lam-rim-chen-mo.txt

If you only want to display the results on the screen, you can run:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie lam-rim-chen-mo.act | more
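The argument convention of AcipToWylie (no arguments, one input file, or an input and an output file) is a common pattern for command-line filters. The Java sketch below shows only that pattern. FilterIo and its convert parameter are hypothetical, the actual Acip-to-Wylie rules are not reproduced, and UTF-8 is an assumed encoding. The sketch closes only the files it opened and leaves System.in and System.out open.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the argument convention described above:
//   no arguments  -> standard input to standard output
//   one argument  -> input file to standard output
//   two arguments -> input file to output file
// "convert" is a placeholder for the real Acip-to-Wylie conversion.
final class FilterIo {
    static void run(String[] args, UnaryOperator<String> convert) throws IOException {
        if (args.length > 2) {
            throw new IllegalArgumentException("usage: [input-file [output-file]]");
        }
        BufferedReader in = new BufferedReader(args.length >= 1
            ? new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8)
            : new InputStreamReader(System.in, StandardCharsets.UTF_8));
        try {
            Writer out = args.length == 2
                ? new OutputStreamWriter(new FileOutputStream(args[1]), StandardCharsets.UTF_8)
                : new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(convert.apply(line));
                    out.write(System.lineSeparator());
                }
            } finally {
                if (args.length == 2) {
                    out.close();   // closing also flushes the output file
                } else {
                    out.flush();   // leave System.out open
                }
            }
        } finally {
            if (args.length >= 1) {
                in.close();        // leave System.in open
            }
        }
    }
}
```

Passing a real converter as convert would make run(args, converter) behave like the three command forms shown above.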

org.thdl.tib.scanner.SwingWindowScannerFilter

Note: Included in both DictionarySearchStandalone.jar and DictionarySearchHandheld.jar

This is the tool's main class. It loads a dictionary stored in the binary tree file format (use org.thdl.tib.scanner.BinaryFileGenerator to create it), provides a graphical interface for entering Tibetan text (in Roman or Tibetan script), and displays the words (in Roman or Tibetan script) with their definitions. On platforms that do not support Swing, it works without Tibetan script. It can access dictionaries stored locally or remotely. For example, to access the public dictionary database, run the command:

If the JRE you installed does not support Swing classes but supports AWT (as is the case with JREs for handhelds), use org.thdl.tib.scanner.PocketWindowScannerFilter, found in DictionarySearchHandheld.jar. Its syntax is the same.
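If you want a single launcher that picks the right class for the JRE it runs on, the choice can be made at runtime by probing for a Swing class. This is a hypothetical Java sketch: only the two class names come from this page, ScannerChooser is illustrative, and the matching jar (DictionarySearchStandalone.jar or DictionarySearchHandheld.jar) must still be on the classpath.

```java
// Hypothetical launcher sketch: choose the scanner class according to whether
// this JRE provides Swing. Only the two class names come from this page.
final class ScannerChooser {
    static final String SWING_SCANNER = "org.thdl.tib.scanner.SwingWindowScannerFilter";
    static final String AWT_SCANNER = "org.thdl.tib.scanner.PocketWindowScannerFilter";

    // Checks that a class can be located, without running its static initializers.
    static boolean hasClass(String name) {
        try {
            Class.forName(name, false, ScannerChooser.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    static String chooseScannerClass() {
        return hasClass("javax.swing.JFrame") ? SWING_SCANNER : AWT_SCANNER;
    }
}
```

A launcher could then load the returned class by name and call its main method with the same arguments, since both classes share the same syntax; that step is omitted here.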

org.thdl.tib.scanner.ConsoleScannerFilter

Note: Included in both DictionarySearchStandalone.jar and DictionarySearchHandheld.jar

Reads Tibetan text and displays the words with their definitions on the console. Use it when no graphical interface is supported or for batch processing. For instance:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.ConsoleScannerFilter ry-dic99

It reads from standard input and prints the results to standard output. For example, if you want to parse a text stored in puja.txt and save the results in puja_words.txt, you can run the command:

java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.ConsoleScannerFilter ry-dic99 < puja.txt > puja_words.txt
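To illustrate what "displays the words with their definitions" involves, here is a greedy longest-match lookup over Wylie syllables, a common way to segment Tibetan text into dictionary words. It is an assumption for illustration; ConsoleScannerFilter may use a different algorithm. LongestMatch and its maxSyllables limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: greedy longest-match segmentation of Wylie text into
// dictionary words. Not necessarily the algorithm ConsoleScannerFilter uses.
final class LongestMatch {
    static List<String> segment(String text, Map<String, String> dict, int maxSyllables) {
        // Treat the shad '/' as punctuation and split the rest into syllables on spaces.
        List<String> syllables = new ArrayList<>();
        for (String s : text.replace('/', ' ').trim().split("\\s+")) {
            if (!s.isEmpty()) {
                syllables.add(s);
            }
        }
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < syllables.size()) {
            int length = 1; // unknown syllables are emitted on their own
            for (int n = Math.min(maxSyllables, syllables.size() - i); n >= 1; n--) {
                if (dict.containsKey(String.join(" ", syllables.subList(i, i + n)))) {
                    length = n; // longest dictionary word starting at syllable i
                    break;
                }
            }
            words.add(String.join(" ", syllables.subList(i, i + length)));
            i += length;
        }
        return words;
    }
}
```

For instance, with "bkra shis" and "bde legs" in the dictionary, segmenting "bkra shis bde legs/" returns the two words "bkra shis" and "bde legs", each of which can then be printed with its definition.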

Provided for unrestricted use by the Tibetan and Himalayan Library.