Although a new XML document can be created from a template within the editor itself, Tibetan texts often need to go through several steps to convert an .rtf or plain-text document into an XML file. The following instructions explain how to convert a Tibetan text in a Word document set in the Tibetan Machine Web font into a valid XML file that uses the THDL mark-up scheme. They are written for our primary, recommended editor, XMLmind, but the process is similar in most XML editors. It has three parts: 1) converting the Tibetan script to Wylie, 2) adding lineation mark-up and removing paragraph breaks, and 3) creating the XML file.
Converting the script to Wylie requires the Jskad Java program, which can either be installed on one's own machine or run on the web.
Accurate referencing of a digital Tibetan text requires lineation. However, in a digital context where pages can scroll almost indefinitely and line length (i.e., screen width) is variable, it is difficult to fix a standard measure for a line. We have therefore defined a line as a shad-delimited line, with 100 such lines per digital page. A macro found in Lineator.doc automatically inserts this mark-up before the Wylie is pasted into an XML document.
To mark up and number the shad-delimited lines in a fixed manner, the <seg> element has been chosen. It must have a type attribute set to "shad". The TEI guidelines say, “The <seg> element may be used at the encoder's discretion to mark any segments of the text of interest for processing.” These elements provide the most flexibility, because they can have as their children any of the following: sentence-level elements (<s>), phrase-level elements (<cl>, <phr>), quotations (<q>, <quote>), and other <seg> elements (of a different type). The <seg> elements are also numbered using their n attribute. The value of the n attribute is "page.line", a page being 100 lines long (i.e., containing lines 1 to 100). Thus, the 253rd shad-delimited line would have <seg n="2.53" type="shad"> … </seg> for its mark-up.
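The numbering rule above can be sketched in a few lines of Python. This is only an illustration, not the Lineator macro itself: the function names and the first_page parameter are hypothetical. The default first_page=0 is an assumption chosen so that the sketch reproduces the example given above (the 253rd line numbered 2.53, with lines on each page running from 1 to 100); pass first_page=1 if pages should be counted from 1 instead.

```python
def seg_number(index, lines_per_page=100, first_page=0):
    """Return the "page.line" value for the index-th shad-delimited line.

    index is 1-based (the first line of the text is 1).
    first_page=0 is an assumption that matches the example in the text,
    where the 253rd line is numbered "2.53".
    """
    if index < 1:
        raise ValueError("index must be 1 or greater")
    page = (index - 1) // lines_per_page + first_page
    line = (index - 1) % lines_per_page + 1  # lines run from 1 to lines_per_page
    return f"{page}.{line}"


def seg_open_tag(index, lines_per_page=100, first_page=0):
    """Build the opening <seg> tag for the index-th shad-delimited line."""
    n = seg_number(index, lines_per_page, first_page)
    return f'<seg n="{n}" type="shad">'


if __name__ == "__main__":
    print(seg_open_tag(253))
```

Under these assumptions, seg_open_tag(253) gives <seg n="2.53" type="shad">, and the 100th line is numbered 0.100 rather than 1.0, because line numbers on a page never reach zero.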
The Lineator.doc Word document contains a macro for automatically marking up straight Wylie input with the <seg type="shad"></seg> element. It does so based on punctuation including whitespace. Each shad-delimited line begins with the first letter of that line and ends with the whole string of punctuation that follows it until the first letter of the next line.
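For readers who want to see the segmentation rule spelled out, here is a minimal Python sketch of the same idea. It is not the macro's actual code, and it rests on several assumptions: the input is plain Wylie in which the shad is written "/" and syllables are separated by spaces; paragraph breaks are removed by collapsing all whitespace to single spaces; pages are counted from 0 by default (the hypothetical first_page parameter), which matches the 2.53 example for the 253rd line; and any punctuation that precedes the first letter of the text is simply skipped. The names split_shad_lines and lineate are illustrative only.

```python
import re
from xml.sax.saxutils import escape

# One shad-delimited line: a run of non-shad characters, followed by a shad
# and any further shads or whitespace up to the first letter of the next line.
# Text after the last shad (if any) is kept as a final line.
SHAD_LINE = re.compile(r"[^/]+(?:/[/\s]*|$)")


def split_shad_lines(wylie):
    """Split plain Wylie into shad-delimited lines, removing paragraph breaks."""
    flat = " ".join(wylie.split())  # collapse newlines and repeated spaces
    return [m.group(0) for m in SHAD_LINE.finditer(flat)]


def lineate(wylie, lines_per_page=100, first_page=0):
    """Wrap each shad-delimited line in <seg n="page.line" type="shad">."""
    out = []
    for i, line in enumerate(split_shad_lines(wylie)):
        page = i // lines_per_page + first_page
        num = i % lines_per_page + 1  # lines on a page run from 1 to 100
        out.append(f'<seg n="{page}.{num}" type="shad">{escape(line)}</seg>')
    return "\n".join(out)


if __name__ == "__main__":
    sample = "bla ma la phyag 'tshal lo//\nsangs rgyas la/ chos la/"
    print(lineate(sample))
```

Note that the trailing punctuation, including the space after a shad, stays inside the <seg> element, following the rule that a line ends with the whole string of punctuation that follows it. The escape call only protects characters such as & and < that would otherwise break the XML.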
To use the Lineator macro, follow these steps:
This file can be saved as a separate file or one can proceed directly to step #3.
Once the lineation is complete, creating the XML document requires a few steps, each of which is relatively straightforward. The steps are described here in terms of XMLmind.