Contributor(s): Andres Montano
Here we describe the main classes included in org.thdl.tib.scanner. The classes designed to be run from the command line are:
Note: Included only in DictionarySearchStandalone.jar
org.thdl.tib.scanner.BinaryFileGenerator stores multiple dictionaries in a binary tree file. This file format is the only one that the Tibetan Translation Tool can use directly.
Note: Dictionary files are assumed to have the .txt extension. Don't include the extension in the names you pass on the command line!
For a single dictionary, to read the definitions stored in dict-name.txt and organize them into dict-name.wrd and dict-name.def:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator [-delimiter] dict-name
For multiple dictionaries, to read the definitions stored in dict-name1.txt, dict-name2.txt, etc., and organize them into dest-file-name.wrd and dest-file-name.def:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator dest-file-name [-delimiter1] dict-name1 [[-delimiter2] dict-name2 …]
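The argument rules above can be pictured with a small Java sketch (Java 16 or later, for records). It is not the actual BinaryFileGenerator code. It assumes that each delimiter option applies only to the dictionary name right after it, that a dictionary without an option is dash-separated, and that when more than one plain name is given the first one is the destination. The names GeneratorArgsSketch, Input, Plan and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of how the BinaryFileGenerator arguments could be read.
 * Assumptions (not confirmed by the documentation): an option applies only to
 * the dictionary name that follows it, a dictionary without an option is
 * dash-separated, and with several names the first one is the destination.
 */
public class GeneratorArgsSketch {

    /** One input dictionary and its format ("-" = dash, "tab", "acip", or a custom delimiter). */
    record Input(String format, String dictName) {}

    /** Base name of the .wrd/.def output plus the ordered input dictionaries. */
    record Plan(String destName, List<Input> inputs) {}

    static Plan parse(String[] args) {
        List<Input> named = new ArrayList<>();
        String pending = "-"; // assumed default format: dash-separated
        for (String arg : args) {
            if (arg.length() > 1 && arg.startsWith("-")) {
                pending = arg.substring(1); // "-tab" -> "tab", "-**" -> "**"
            } else {
                named.add(new Input(pending, arg));
                pending = "-"; // assumption: the option covered only this dictionary
            }
        }
        if (named.isEmpty()) {
            throw new IllegalArgumentException("no dictionary name given");
        }
        if (named.size() == 1) {
            // Single dictionary: dict-name.txt -> dict-name.wrd and dict-name.def
            return new Plan(named.get(0).dictName(), List.copyOf(named));
        }
        // Several names: the first is dest-file-name, the rest are dictionaries.
        return new Plan(named.get(0).dictName(), List.copyOf(named.subList(1, named.size())));
    }

    public static void main(String[] args) {
        Plan plan = parse(args);
        System.out.println("output: " + plan.destName() + ".wrd, " + plan.destName() + ".def");
        for (Input input : plan.inputs()) {
            System.out.println("  " + input.dictName() + ".txt (format: " + input.format() + ")");
        }
    }
}
```

Given the arguments of the combined example further down (alldicts ry-dic99 -acip myglossary_uma -tab myglossary_rdzogs-chen), the sketch reports alldicts as the destination and reads ry-dic99 as dash-separated, myglossary_uma as acip and myglossary_rdzogs-chen as tab-separated.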
In the default format, the definiendum and its definition are separated by a dash ('-'), as in these sample entries:
bkra shis - 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs - 1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.
If these were the contents of a file called "my-glossary.txt", the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator my-glossary
In the tab-separated format, each line is an entry (no multiple-line entries) and the definiendum and definition are separated by '\t' (a horizontal tab). One tab is enough; there is no need to "align" the definitions in your word processor. Sample entries for the dictionary, with a single tab after each definiendum, are:
bkra shis	1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs	1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.
Here, the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -tab my-glossary
With a user-defined delimiter, each line is an entry (no multiple-line entries) and the definiendum and definition are separated by the character or string of characters that the user specifies after the hyphen on the command line. Sample entries for the dictionary are:
bkra shis ** 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs ** 1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.
Here, the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -** my-glossary
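For the three line-based formats (dash, tab and a user-defined delimiter), reading an entry amounts to cutting each line at the delimiter. The Java sketch below assumes the cut happens at the first occurrence of the delimiter and that whitespace around both parts is trimmed; the documentation does not say how BinaryFileGenerator handles these details, and EntrySplitSketch, split and delimiterFor are hypothetical names.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: split one glossary line into definiendum and definition.
 * Assumption: the line is cut at the FIRST occurrence of the delimiter and
 * whitespace around both parts is trimmed.
 */
public class EntrySplitSketch {

    record Entry(String definiendum, String definition) {}

    /** Maps the command-line option to the delimiter text: "tab" gives a tab, anything else is used as-is. */
    static String delimiterFor(String option) {
        return option.equals("tab") ? "\t" : option;
    }

    static Optional<Entry> split(String line, String delimiter) {
        int at = line.indexOf(delimiter);
        if (at < 0) {
            return Optional.empty(); // no delimiter on this line: not an entry
        }
        // trim() also drops extra tabs, so "aligned" columns are harmless
        String term = line.substring(0, at).trim();
        String definition = line.substring(at + delimiter.length()).trim();
        if (term.isEmpty() || definition.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Entry(term, definition));
    }

    public static void main(String[] args) {
        // Shortened versions of the "bde legs" sample entries above.
        String[][] samples = {
            {"-", "bde legs - 1) goodness, happiness, well-being. 2) well, fine."},
            {"tab", "bde legs\t1) goodness, happiness, well-being. 2) well, fine."},
            {"**", "bde legs ** 1) goodness, happiness, well-being. 2) well, fine."},
        };
        for (String[] sample : samples) {
            split(sample[1], delimiterFor(sample[0]))
                .ifPresent(e -> System.out.println(e.definiendum() + " => " + e.definition()));
        }
    }
}
```

Cutting at the first occurrence keeps hyphens inside definitions such as "well-being" intact in the dash format; it would misfire only if a definiendum itself contained the delimiter.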
In the "acip" format, the electronic file is assumed to be a transliteration of a Tibetan dictionary. The format is called "acip" because it accepts Acip's comment codes ('@' to mark page numbers, brackets to mark comments, etc.). Nevertheless, it still requires the files to be in Extended Wylie, so if your file uses Acip's transliteration scheme, run org.thdl.tib.scanner.AcipToWylie on it first.
Definitions can span multiple lines, but with no blank lines in between. The definiendum is assumed to start after a blank line (except at the beginning of a new page, where the text may instead be the last part of the previous definition) and to run up to the shad (except where the shad is omitted because of grammar rules, for instance after a "ga" suffix without a secondary suffix). Each time a new letter starts, it should be clearly marked in brackets ('[', ']'), parentheses ('(', ')') or braces ('{', '}'). Sample entries for the dictionary are:
@1
(ka)
ka ba/ gdung 'degs don byed nus pa/ rkyen/ grogs byed

@2
(kha)
khyod dngos po dang de byung 'brel/ khyod dngos po las byung zhing/ dngos po ldog stops kyis khyod ldog pa/

khyod dngos po dang bdag gcig 'brel/ khyod ngos po dang bdag nyid gcig pa'i sgo nas tha dad gang zhig/ dngos po ldog stops kyis khyod ldog pa/

khyod dngos po dang 'brel pa/ khyod dngos po dang tha dad gang

@3
zhig/ ngos po ldog stobs kyis khyod ldog pa/

kha dog mdog du rung ba'am/ sngo ser dkar dmar sogs mdog tu rung ba'i gzugs/
Here the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -acip my-glossary
Comments: Notice in the sample text that at the beginning of the page marked @3, "zhig" is not a new definiendum but still part of the definition of "khyod dngos po dang 'brel pa". Also, the definiendum of the last entry is "kha dog" (the shad was omitted after the "ga" suffix), not "kha dog mdog du rung ba'am". On the other hand, the definiendum "khyod dngos po dang bdag gcig 'brel" is not cut short at "khyod dngos po dang bdag", since no shad was omitted after that "ga" suffix. As the sample shows, the tool has to make a series of "smart guesses" to figure out where each definiendum ends and its definition starts. This process is not foolproof, so expect some mistakes.
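The segmentation of the "acip" format can be approximated with the following Java sketch. It is a simplified illustration, not the tool's algorithm: it assumes page markers ("@3") and letter markers ("(kha)") sit on lines of their own, splits entries at blank lines, and takes the definiendum to be the text before the first shad. Its only "smart guess" is a hypothetical one for page breaks: text opening a new page continues the previous definition if that definition does not end with a shad and no letter marker intervenes. It does not implement the omitted-shad rule after a "ga" suffix, so for the last sample entry it would return "kha dog mdog du rung ba'am" rather than "kha dog". AcipSegmentSketch, segment and flush are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical sketch of segmenting an "acip"-format glossary.
 * Assumes page markers ("@3") and letter markers ("(kha)", "[kha]", "{kha}")
 * sit on lines of their own. Does NOT implement the omitted-shad ("ga" suffix) rule.
 */
public class AcipSegmentSketch {

    record Entry(String definiendum, String definition) {}

    private static final String PAGE_MARKER = "\\s*@\\S*\\s*";
    private static final String LETTER_MARKER = "\\s*[\\[({][^\\])}]*[\\])}]\\s*";

    static List<Entry> segment(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        StringBuilder block = new StringBuilder();
        boolean afterPageMarker = false; // page marker seen, no text or letter marker since
        boolean continuation = false;    // current block extends the previous definition
        for (String line : lines) {
            if (line.isBlank()) {            // blank line: end of the current block
                flush(block, continuation, entries);
                continuation = false;
            } else if (line.matches(PAGE_MARKER)) {
                afterPageMarker = true;
            } else if (line.matches(LETTER_MARKER)) {
                afterPageMarker = false;     // a new letter always starts a new definiendum
            } else {
                if (block.length() == 0) {
                    // Hypothetical guess: text opening a new page continues the previous
                    // definition when that definition did not end with a shad.
                    continuation = afterPageMarker && !entries.isEmpty()
                            && !entries.get(entries.size() - 1).definition().endsWith("/");
                } else {
                    block.append(' ');
                }
                afterPageMarker = false;
                block.append(line.trim());
            }
        }
        flush(block, continuation, entries);
        return entries;
    }

    private static void flush(StringBuilder block, boolean continuation, List<Entry> entries) {
        if (block.length() == 0) {
            return;
        }
        String text = block.toString();
        block.setLength(0);
        if (continuation) {
            Entry previous = entries.remove(entries.size() - 1);
            entries.add(new Entry(previous.definiendum(), previous.definition() + " " + text));
            return;
        }
        int shad = text.indexOf('/');
        if (shad < 0) {
            entries.add(new Entry(text, "")); // no shad: keep the whole block as the term
        } else {
            entries.add(new Entry(text.substring(0, shad).trim(), text.substring(shad + 1).trim()));
        }
    }

    public static void main(String[] args) throws IOException {
        for (Entry e : segment(Files.readAllLines(Path.of(args[0])))) {
            System.out.println(e.definiendum() + "  =>  " + e.definition());
        }
    }
}
```

On the "@3" page break in the sample, this guess appends "zhig/ ngos po ldog stobs kyis khyod ldog pa/" to the definition of "khyod dngos po dang 'brel pa", while the "(kha)" letter marker keeps the first entry of page @2 separate even though the "ka ba" definition does not end with a shad.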
Dictionaries in different formats can be processed together. For instance, the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator alldicts ry-dic99 -acip myglossary_uma -tab myglossary_rdzogs-chen
would generate alldicts.wrd and alldicts.def, processing ry-dic99.txt as dash-separated, myglossary_uma.txt in the "acip" transliteration format explained above, and myglossary_rdzogs-chen.txt as tab-separated.
Note: Included only in DictionarySearchStandalone.jar
org.thdl.tib.scanner.AcipToWylie converts Tibetan text transliterated in the Acip scheme to THDL's Extended Wylie scheme.
If no arguments are given, it reads the Acip text from standard input and writes the Wylie text to standard output. If one argument is given, it is interpreted as the name of the input file (the output still goes to standard output). If two arguments are given, the first is interpreted as the name of the input file and the second as the name of the output file. For example, the following command converts lam-rim-chen-mo.act and stores the results in lam-rim-chen-mo.txt:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie lam-rim-chen-mo.act lam-rim-chen-mo.txt
Alternatively, you can do the same job by redirecting standard input and output:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie < lam-rim-chen-mo.act > lam-rim-chen-mo.txt
If you only want to display the results on the screen, you can run:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie lam-rim-chen-mo.act | more
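The three argument forms above follow the usual filter convention. The Java sketch below shows only that convention, not AcipToWylie itself: convertLine is a hypothetical placeholder that copies each line unchanged, because the Acip-to-Wylie rules are not described here, and UTF-8 encoding is also an assumption. StdioFilterSketch, run and convert are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the argument convention: 0 args = stdin -> stdout,
 * 1 arg = file -> stdout, 2 args = file -> file.
 * convertLine is a hypothetical placeholder for the real conversion.
 */
public class StdioFilterSketch {

    /** Hypothetical placeholder: returns the line unchanged. */
    static String convertLine(String acipLine) {
        return acipLine;
    }

    static void run(String[] args, InputStream stdin, OutputStream stdout) throws IOException {
        if (args.length == 0) {
            convert(stdin, stdout);
        } else if (args.length == 1) {
            try (InputStream in = new FileInputStream(args[0])) {
                convert(in, stdout);
            }
        } else if (args.length == 2) {
            try (InputStream in = new FileInputStream(args[0]);
                 OutputStream out = new FileOutputStream(args[1])) {
                convert(in, out);
            }
        } else {
            throw new IllegalArgumentException("usage: [input-file [output-file]]");
        }
    }

    /** Converts line by line; flushes but does not close the streams it was given. */
    static void convert(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(convertLine(line));
            writer.write(System.lineSeparator());
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        run(args, System.in, System.out);
    }
}
```

Keeping stdin and stdout as defaults is what makes the redirection and pipe forms above (< file > file, | more) behave the same as passing file names.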
Note: Included in both DictionarySearchStandalone.jar and DictionarySearchHandheld.jar
This is the tool's main class. It loads a dictionary stored in the binary tree file format (use org.thdl.tib.scanner.BinaryFileGenerator to create it), provides a graphical interface for entering Tibetan text (in Roman or Tibetan script), and displays the words (in Roman or Tibetan script) with their definitions. On platforms that don't support Swing, it works without Tibetan script. It can access dictionaries stored locally or remotely. For example, to access the public dictionary database, run the command:
java -jar DictionarySearchStandalone.jar http://www.thdl.org/tibetan/servlet/org.thdl.tib.scanner.RemoteScannerFilter
If the JRE you installed does not support Swing classes but supports AWT (such as the JREs for handhelds), use org.thdl.tib.scanner.PocketWindowScannerFilter, found in DictionarySearchHandheld.jar. Its syntax is the same.
Note: Included in both DictionarySearchStandalone.jar and DictionarySearchHandheld.jar
org.thdl.tib.scanner.ConsoleScannerFilter reads Tibetan text and displays the words with their definitions on the console. Use it when no graphical interface is supported or for batch processing. For instance:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.ConsoleScannerFilter ry-dic99
It reads from standard input and prints the results to standard output. For example, if you want to parse a text stored in puja.txt and save the results in puja_words.txt, you can run the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.ConsoleScannerFilter ry-dic99 < puja.txt > puja_words.txt
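For batch runs over many files from Java rather than from a shell, ProcessBuilder can reproduce the redirection shown above. In this sketch the jar path, the dictionary name passed by the caller and the "_words" output naming (borrowed from the puja example) are assumptions; BatchScanSketch, command and outputFor are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/**
 * Sketch: run ConsoleScannerFilter over several text files, the Java
 * equivalent of "java -cp ... ConsoleScannerFilter dict < in.txt > in_words.txt".
 * Jar path and output naming are assumptions for illustration.
 */
public class BatchScanSketch {

    static List<String> command(String jar, String dictionary) {
        return List.of("java", "-cp", jar, "org.thdl.tib.scanner.ConsoleScannerFilter", dictionary);
    }

    /** puja.txt -> puja_words.txt, in the same directory as the input. */
    static File outputFor(File input) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return new File(input.getParentFile(), base + "_words.txt");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: BatchScanSketch dictionary text-file...");
            return;
        }
        String dictionary = args[0];
        for (int i = 1; i < args.length; i++) {
            File input = new File(args[i]);
            File output = outputFor(input);
            Process p = new ProcessBuilder(command("DictionarySearchStandalone.jar", dictionary))
                    .redirectInput(input)                          // like "< puja.txt"
                    .redirectOutput(output)                        // like "> puja_words.txt"
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            int status = p.waitFor();
            System.err.println(input + " -> " + output + " (exit " + status + ")");
        }
    }
}
```

For example, running it as BatchScanSketch ry-dic99 puja.txt would write puja_words.txt, matching the shell command above; each file is processed in its own JVM, one after another.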