Old Instructions on Converting Word to XML

Preliminary Installation of Conversion Routine

The conversion routine is written in Visual Basic for Applications (VBA) and is embedded in a Word document called “Text to XML Converter.doc”. It works only on Windows machines. The conversion needs this file and a few supplementary ones to run successfully. To set up your machine for conversion, do the following:

  1. Download the text2XML.zip package from the THL website.
  2. Unzip it.
  3. Place the file called teiHeader.dat in the C:\xml folder that is already on your hard drive from having set up Morphon.
  4. Save the other documents in a convenient folder anywhere on your hard drive. These other files are:
     a. TextToXMLConverter.doc – the actual document that does the converting.
     b. metatable.doc – contains the metadata table that must be placed at the very top of any document before conversion.
     c. THL Word to XML Manual.doc – the instructions for how to apply the styles to a particular document.
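
For readers who prefer to script the setup, the unpacking and file placement above can be sketched in Python. This is only an illustration, not part of the THL package: the function name install_converter and the folder arguments are hypothetical, and the sketch assumes the archive contains a file named exactly teiHeader.dat.

```python
import shutil
import zipfile
from pathlib import Path


def install_converter(zip_path, xml_dir, docs_dir):
    """Unpack the package into docs_dir, then move teiHeader.dat into xml_dir.

    All three arguments are chosen by the caller; nothing here is prescribed
    by THL. Returns the final location of teiHeader.dat.
    """
    xml_dir = Path(xml_dir)
    docs_dir = Path(docs_dir)
    xml_dir.mkdir(parents=True, exist_ok=True)
    docs_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: unzip everything into the "convenient folder".
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(docs_dir)

    # Step 3: locate teiHeader.dat wherever the archive placed it.
    headers = [p for p in docs_dir.rglob("teiHeader.dat") if p.is_file()]
    if not headers:
        raise FileNotFoundError("teiHeader.dat was not found in the package")

    target = xml_dir / "teiHeader.dat"
    shutil.move(str(headers[0]), str(target))
    return target


# Hypothetical usage (paths are assumptions; adjust to your machine):
# install_converter("text2XML.zip", Path("C:/xml"), Path.home() / "THL")
```

The remaining files (the converter document, the metadata table, and the manual) stay in the folder passed as docs_dir, which matches step 4.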

With these documents in place, you can proceed to convert a Word document into an XML document.

Conversion of Word Document to XML

Once the above files are in their proper places, you can convert a Microsoft Word document into an XML file. This assumes that the text to be marked up has already been entered into Word and proofread. From that point, the conversion process is as follows:

  1. Using the THL “Word to XML” style template, apply styles to the appropriate parts of the document, as detailed in our Word to XML Conversion page.
  2. Copy the whole essay and paste it into the THL TextToXMLConverter.doc.
  3. Make sure macros are enabled: open the Tools menu, choose Macro, and then Security. The security level should be set to medium or low. If it is set to high, change the setting, then close and reopen the document.
  4. If the document does not already have the metadata table at the top, add it at the very top of the document and fill out the appropriate fields.
  5. Press Alt-Shift-i. This starts the macro that reviews all italics outside of headers. The resulting window shows an italicized word and gives a drop-down menu of types. Choose the appropriate type and press Enter. The next italicized phrase will then appear. In this way, the macro proceeds through the document, assigning specific character styles in place of the generic italics used for titles, foreign words, and so forth.
  6. Press Ctrl-Alt-c. This starts the conversion routine. When it is finished, an alert window will pop up telling you to cut and paste the document into your XML editor.
  7. In Morphon, choose File, New. In the file-chooser that pops up, press the “New” button.
  8. An input box will appear with the label “Enter a root element”. To create a blank document, enter anything, such as “Blank”, “temp”, or “bogus”.
  9. In the resulting document, choose View, Source (Ctrl-u).
  10. Delete everything in the source view.
  11. Cut and paste the converted XML document from Word into Morphon.
  12. Click on the green check-mark icon in the taskbar below the menus to validate the document.
  13. Because the conversion routine is imperfect, validation errors will inevitably occur. Only one error appears at a time. Check the line number. Go to that place in the document and fix the error. Check validation again (green check-mark). Repeat as necessary.
  14. When you receive the message “Your document is valid!”, choose View, Inline Tags (Ctrl-j).
  15. You can then edit the XML document in the editor as necessary. It is advisable to check the markup of the conversion program, even after it validates, to make sure it conforms with THL standards.
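
The check-and-fix loop in steps 12 and 13 can also be rehearsed outside Morphon. The sketch below is an assumption-laden stand-in, not the Morphon validator: it uses Python's standard xml.etree.ElementTree parser, which only tests whether the text is well-formed XML and does not validate against the THL DTD. The helper name first_xml_error is hypothetical. Like Morphon, it reports one error at a time, with a line number.

```python
import xml.etree.ElementTree as ET


def first_xml_error(xml_text):
    """Return (line, column, message) for the first parse error, or None.

    This checks well-formedness only (balanced tags, legal characters);
    a document can pass here and still fail DTD validation in Morphon.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position of the first problem found
        return line, column, str(err)
    return None


# Hypothetical usage: fix the reported line, then run the check again
# until it returns None, mirroring the repeat-as-necessary loop above.
# problem = first_xml_error(open("essay.xml", encoding="utf-8").read())
```

Catching unbalanced or stray tags this way can shorten the number of validation rounds needed in Morphon, but the final validity check still has to be done there.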

Provided for unrestricted use by the Tibetan and Himalayan Library