Contributor(s): Than Grove
Input Tibetan texts can be converted from Word into XML using the latest WordToXML converter. The conversion runs in batch mode, converting several documents (usually a volume's worth) at once. The conversion itself is relatively simple, but converting a whole volume can take a considerable amount of time. The process assumes that one has a volume of texts that have been input and at least partially marked up. The markup can be minimal. The requirements for the input texts are:
Other, more specific markup is not required, but it is helpful and therefore recommended. This is covered elsewhere in this documentation.
The process for converting Tibetan texts is as follows:
If a page number milestone falls within a character style (this does not apply to paragraph styles), such as title or name person, the element representing the surrounding character style will be repeated after the milestone. This results in invalid markup that looks something like this:
བོད་སྐད་དུ། <title lang="tib" level="m">འཇམ་དཔལ་ཡེ་ཤེས་སེམས་དཔའི་དོན་དམ་པའི་
མཚན་ཡང་དག་པར་བརྗོད་<milestone unit="line" n="1a.2"/><title level="m">པ</title>༑
Here, the open <title> tag is erroneously repeated after the <milestone> tag. This makes the document invalid, and in oXygen you will receive the following error message:
The element type "title" must be terminated by the matching end-tag "</title>".
Double-clicking on the error message at the bottom of the oXygen window will take you to the vicinity of the fault. The second open <title> tag needs to be deleted for the document to validate. The correct markup would look as follows:
བོད་སྐད་དུ། <title lang="tib" level="m">འཇམ་དཔལ་ཡེ་ཤེས་སེམས་དཔའི་དོན་དམ་པའི་
མཚན་ཡང་དག་པར་བརྗོད་<milestone unit="line" n="1a.2"/>པ</title>༑
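For a whole volume, fixing each repeated tag by hand can be tedious. The following is a minimal sketch of how the repair described above might be automated. It is not part of the WordToXML converter; the function name drop_repeated_open_tags is hypothetical. It assumes that the repeated tag immediately follows the milestone, that element names are simple (no namespace prefixes or hyphens), and that attribute values contain no "/" or ">" characters. It is a plain regular-expression pass, not a full XML parse, so the result should still be validated in oXygen afterwards.

```python
import re

# A self-closing <milestone .../> immediately followed by an opening tag.
# Group 1 is the milestone, group 2 is the name of the element that follows.
MILESTONE_THEN_OPEN = re.compile(r"(<milestone\b[^>]*/>)<(\w+)\b[^>/]*>")


def drop_repeated_open_tags(xml_text):
    """Delete an opening tag that directly follows a milestone when an
    element of the same name is still open at that point.

    Hypothetical helper: a sketch under the assumptions stated above.
    """
    fixed = ""
    pos = 0
    for m in MILESTONE_THEN_OPEN.finditer(xml_text):
        # Copy everything up to the milestone unchanged.
        fixed += xml_text[pos:m.start()]
        name = re.escape(m.group(2))
        # Count open and close tags of this element in the repaired text so far.
        opens = len(re.findall(r"<%s\b[^>/]*>" % name, fixed))
        closes = len(re.findall(r"</%s\s*>" % name, fixed))
        if opens > closes:
            # The element is already open: keep only the milestone.
            fixed += m.group(1)
        else:
            # The element is not open: the tag after the milestone is legitimate.
            fixed += m.group(0)
        pos = m.end()
    return fixed + xml_text[pos:]


broken = (
    'བོད་སྐད་དུ། <title lang="tib" level="m">འཇམ་དཔལ་ཡེ་ཤེས་སེམས་དཔའི་དོན་དམ་པའི་\n'
    'མཚན་ཡང་དག་པར་བརྗོད་<milestone unit="line" n="1a.2"/><title level="m">པ</title>༑'
)

if __name__ == "__main__":
    print(drop_repeated_open_tags(broken))
```

Counting tags in the repaired text, rather than in the original, keeps an earlier repeated tag from being mistaken for a still-open element later in the file. Run the function on a copy of the converted file and compare the result with the original before replacing it.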