Contributor(s): David Germano, Nathaniel Grove, Steven Weinberger.
This document describes the process for creating an XML essay for display in the Tibetan & Himalayan Library. Such an essay can be either a Scholar's essay on a subject in the field or a technical essay such as the present document. In either case, the basic process is the same and, at least for the time being, the styles used for displaying the two are the same. In the latest version of the converter, the process has been significantly simplified.
The process of creating an XML document for THL has been simplified by using Microsoft Word and glossary tables. First, extract all glossary items (personal names, place names, text titles, terms, etc.) from the essay and place them, with their associated information (phonetics, translations into other languages, dates, item type, etc.), in the glossary table. The converter uses the glossary table to search the essay and apply the appropriate Word style to each occurrence of an item. It then converts the Word document to XML based on the Word styles applied. Finally, it applies the glossary information to the essay, linking each occurrence of an item to its glossary entry and adding supplementary information (Wylie for terms, dates for people, etc.) at the first occurrence of that item. The result is two XML files: one for the essay and one for the glossary.
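The final glossary-application step can be pictured with a small Python sketch. It assumes the styling pass has already marked each glossary occurrence, represented here as a list of (text, entry id) segments. The `<term>` and `<note>` element names, the `ref`/`type` attributes and the sample glossary are hypothetical and do not reproduce THL's actual markup or XSLT; the sketch only shows the idea of linking every occurrence to its entry while attaching supplementary information once.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical glossary: entry id -> (item type, supplementary info such as Wylie).
GLOSSARY = {
    "g1": ("term", "sangs rgyas"),
    "g2": ("place", "lha sa"),
}


def link_occurrences(segments, glossary):
    """Build XML text from (text, entry_id) segments.

    entry_id is None for ordinary essay text. Every glossary occurrence is
    wrapped in a hypothetical <term> element pointing at its entry; only the
    first occurrence of each entry also carries the supplementary <note>.
    """
    seen = set()
    out = []
    for text, entry_id in segments:
        if entry_id is None:
            out.append(escape(text))
            continue
        item_type, extra = glossary[entry_id]
        attrs = f"ref={quoteattr(entry_id)} type={quoteattr(item_type)}"
        if entry_id not in seen:
            seen.add(entry_id)
            # First occurrence: include the supplementary information.
            out.append(f"<term {attrs}>{escape(text)}<note>{escape(extra)}</note></term>")
        else:
            out.append(f"<term {attrs}>{escape(text)}</term>")
    return "".join(out)
```

For example, `link_occurrences([("The ", None), ("Buddha", "g1"), (" and the ", None), ("Buddha", "g1"), (".", None)], GLOSSARY)` returns `The <term ref="g1" type="term">Buddha<note>sangs rgyas</note></term> and the <term ref="g1" type="term">Buddha</term>.`, with the note only on the first occurrence.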
The converter will also convert an essay without a glossary by simply converting Word styles applied in the essay itself into XML. It will also convert a Tibetan text (as long as it is entered in Unicode) into XML with or without a glossary.
The whole process consists of the following steps:
For a description of how the converter works, see the section below.
The conversion routine is written in Visual Basic for Applications and is included within a Word document called “thl-word2xml-conv-v1.3.doc”. It will only work on Windows machines. You need to obtain this file and a few other supplementary ones for the conversion to run successfully. To set up your machine properly for conversion, do the following (these steps are described for version 1.3; the version number may change if the converter is updated):
At this point, you should be ready to convert a document created with THL Word styles, as follows. These instructions are for the new converter (v1.3 and above). For previous versions of the converter, see Old Instructions on Converting Word To XML.
If the resulting XML document is not valid, the transformation that applies the glossary entry information will not work. In such cases, an error message will appear in the MS-DOS window that monitors the XSLT transformation. You will then have to find the XML document output by the converter, which is located in the /outdocs/ folder within the converter's folder under the name originally specified. The last part of the conversion process, which applies the glossary information to the essay using XSLT, then has to be redone. The whole process is:
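Before redoing the XSLT step, it can help to pinpoint where the file in /outdocs/ breaks. The Python sketch below only checks well-formedness using the standard library's `xml.etree.ElementTree`; it does not validate against any THL DTD or schema, which would require a validating parser. The function names and the example file name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def find_xml_error(data):
    """Return None if `data` (bytes or str) is well-formed XML,
    otherwise a (line, column, message) tuple reported by the parser."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def check_file(path):
    # Read raw bytes so the parser honours the file's own encoding declaration.
    with open(path, "rb") as f:
        return find_xml_error(f.read())


if __name__ == "__main__":
    # Hypothetical usage: python check_xml.py outdocs/my-essay.xml
    for path in sys.argv[1:]:
        problem = check_file(path)
        if problem is None:
            print(f"{path}: well-formed")
        else:
            line, column, message = problem
            print(f"{path}: line {line}, column {column}: {message}")
```

The reported line and column give a starting point for fixing the Word source or the output file before rerunning the transformation.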
Each of these steps is described in further detail in the linked sections.
Note: These instructions are written only for PCs running Windows (preferably XP, though they may work on earlier versions).
First, create the Word document from a THL template and apply the appropriate header and structural styles to it. Next, create a separate document containing the glossary table. The converter itself is a third, stand-alone Word document that contains a Visual Basic macro. The macro also calls a Java virtual machine for tasks such as inserting the supplementary information after the first occurrence of a word and sorting the glossary in Tibetan sort order; although everything runs from a Word window, you must have Java installed on your machine. Press Alt + C, and the converter asks you to specify the essay document, which you select from your hard drive using a “choose file” dialog box. It then asks for your initials and whether there is a glossary file. If there is one, it asks you to choose the glossary file from your hard drive using a “choose file” dialog box.
The converter then works through the glossary, starting with the shortest term (so it starts with sangs rather than sangs rgyas); terms of the same length are processed in the order in which they appear in the glossary file. For each term, it searches the essay for every occurrence of that text string and uses the term's “type” to determine which Word character style to apply to it. Once it has done this for every term in the glossary, it converts the Word styles into XML.
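The ordering and search rules can be sketched in Python. This assumes “shortest” means fewest characters, which the text does not specify, and it only plans which Word character style would go on which span rather than driving Word itself. The type keys and the style names in `STYLE_FOR_TYPE` are hypothetical placeholders for THL's actual styles. Because Python's `sorted` is stable, terms of equal length keep their order from the glossary file, matching the tie-breaking rule above.

```python
# Hypothetical mapping from glossary item type to Word character style name.
STYLE_FOR_TYPE = {
    "term": "Term",
    "person": "Name Personal",
    "place": "Name Place",
    "text": "Text Title",
}


def processing_order(glossary):
    """glossary: list of (term, item_type) pairs in file order.

    Shortest terms first (assumed: by character count); sorted() is stable,
    so terms of equal length keep their order from the glossary file.
    """
    return sorted(glossary, key=lambda entry: len(entry[0]))


def find_occurrences(essay, term):
    """Return the start offset of every plain-substring occurrence of term."""
    starts = []
    pos = essay.find(term)
    while pos != -1:
        starts.append(pos)
        pos = essay.find(term, pos + len(term))
    return starts


def plan_styling(essay, glossary):
    """List (start, end, style) spans in the order they would be applied."""
    plan = []
    for term, item_type in processing_order(glossary):
        style = STYLE_FOR_TYPE[item_type]
        for start in find_occurrences(essay, term):
            plan.append((start, start + len(term), style))
    return plan
```

For example, `plan_styling("sangs rgyas", [("sangs rgyas", "term"), ("sangs", "term")])` yields `[(0, 5, "Term"), (0, 11, "Term")]`: the shorter sangs is styled first, and the later span for sangs rgyas covers it.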