Using Word Styles for THL Markup

Contributor(s): David Germano, Nathaniel Grove, Steve Weinberger.

Introduction

When texts have been input into the computer, it is possible to further process them in order to develop enhanced facilities for searching, display and navigation. Doing so allows for powerful functions such as the following:

  • Ability to search only certain structural parts of the texts, such as colophons, or only certain types of content, such as place names
  • Ability to show a hotlinked outline of texts for quick navigation
  • Ability to show the text in attractive formatting that renders structure and content far more clearly than traditional formatting

Our markup of e-texts focuses on the following general categories:

  • Structural divisions of the text: this marks structural divisions of texts, such as the colophon section, chapter divisions, outline divisions and so forth.
  • Text format: this marks types of textual strings, such as verse, prose, citations and so forth.
  • Thematic content: this marks types of content, such as place names, personal names, and so forth.
  • Coordination of separate components: this is used to coordinate various objects, such as multiple editions of a given text, a digital text to its original printed sources, a text to a commentary on it, and so forth.
  • Critical edition apparatus: this is used to mark variant readings for input manuscripts and so forth when creating a critical edition from multiple editions of the text.

In developing a process that would facilitate scholarly collaboration, we have discovered that the best method for initially marking up a text is to first enter it into a word processor using specific styles and then convert the document into XML. We have devised a scheme using Microsoft's Word program and its Visual Basic for Applications macro language, though a similar process could be created for other word processors that have macro capabilities. To ensure accurate conversion of the Word document into proper, valid XML, THL's director, David Germano, has devised a set of Word styles for the conversion process.

About XML Markup

Full markup of a text is extremely time consuming. Thus it's important to choose the right level of markup to perform in specific contexts in terms of one's goals and resources. “Structural markup” outlines basic markup that can be done quickly by someone with limited expertise to make a text available in XML with simple navigation. “Format markup” then takes considerably more time, but creates a nicely formatted text with verse, prose and so forth all rendered differently. The work is slower since it requires careful reading to distinguish verse from prose, and so forth. “Thematic/full markup” then is the most time consuming, and requires the most expertise, since the editor must be able to understand the meaning of the text to recognize place names, personal names and so forth. The markup of a text need not be done all at once, but rather can be done in stages. We generally recommend considering three distinct phases of markup that involve increasing amounts of time conjoined with increasing extents of functionality:

  1. Structural markup of basic text divisions including chapters but not including topical outlines (sa bcad); if page numbers from the original manuscript are already there, then mark them up and otherwise ignore them; digital pagination/lineation
  2. Format markup of outlines along with verse/prose, citations, lists, text titles, annotations
  3. Thematic/full markup including personal names, place names, page references from print manuscripts and so forth.

XML is the necessary language to use for such markup, but most scholars are unfamiliar with it and editing tools remain rather cumbersome. We have thus created a system for using Microsoft Word to easily mark up a text, which can then be converted automatically into XML. For the more adventurous, we have also modified a free XML editor named XML Mind so that scholars can work directly with XML without having to buy an expensive editor and/or suffer through cumbersome and tedious markup procedures.

To understand XML, it is useful to consider the “Styles” feature of Microsoft Word, which can be understood as a very weak intermediary move in the direction of XML. Many unsophisticated users of word processors use the format options to manually shape each part of their texts. Thus if they want a citation indented, they manually tab over, or change the settings in the Format menu options. Then they manually switch the formatting back again to return to a standard paragraph style. In the same way they manually reformat text if they want to show verse organized in lines and so forth. By working in this manual fashion, essentially all they have done is make the text appear as they desire. However, that presentation then is completely fixed – to change how, for example, citations or bibliographies appear, they have to go through and manually change every single occurrence (or at the very least do global search and replaces). Because they have not done anything towards identifying the typical types of text they use, there is no way to manipulate those types of text within their documents.

Word Styles

Styles is a feature of Microsoft Word and other powerful word processors that addresses these limitations. Word offers a range of default styles, but also allows the user to define and utilize their own styles. Each style has the following three components:

  1. Name indicating its intent (such as “Verse” or “Paragraph” or “Citation”)
  2. 1-3 letter abbreviation for easy application from the keyboard (such as “P” for Paragraph, and so on)
  3. Formatting specifications (such as spacing, indents, font, bold/italic, color, etc.)

Paragraph styles are intended for use in blocks of text that are typically marked by a carriage return at the end, while character styles format shorter strings of text within paragraph styles. Thus a block of text might be marked with a paragraph style “citation,” but within that a text title might be marked with a character style “text title.” By using such styles, users can quickly format parts of texts as they like by invoking the relevant style, always a faster process than manually making formatting changes. In addition, because all parts of the text are marked up with relevant styles, the appearance of parts of the text can be quickly changed by simply changing the formatting definition of the relevant style. Thus if one decides one wants the standard “paragraph” to now appear as double spaced instead of single spaced, one simply redefines the paragraph style as double spaced.

To start using styles, one should first look at the type of text one works on within Word – whether one's own essays, or input literature, or whatever – and take stock of the types of formatted text one typically uses, both as whole blocks of text and as individual text strings. Write these up with names and definitions of the corresponding visual format. That list essentially constitutes your Word Styles. To create a style, go to the Format menu and choose Styles. Then choose “new,” give your first style a name, and use the Format button to specify what it should look like. By following the name with a comma and a 1-3 letter abbreviation, you can then easily apply the style from the keyboard.

Styles can be specified manually by having the formatting toolbar open and clicking on the arrow next to the style names to get a drop down list of all available style names. One scrolls down to the desired style and clicks to apply it. It will apply to whatever text was selected, or wherever the cursor was located, before the style was chosen. One can also manually select a style by going to the Format menu, choosing the desired style from the list to the upper left of the Format dialog box, and clicking on apply.

However, by far the most rapid way to apply styles is from the keyboard, without using the menu. To assign your own keyboard shortcut to the choice of styles, in Word choose Tools: Customize: Keyboard. Then under Categories choose Format; then under Commands select Style. Then put the mouse on the “press new shortcut key” box and click, so that the cursor appears there. Then hit the keys you want to use as a shortcut, such as Ctrl-Alt-S, or whatever. Then choose “assign” and hit OK, OK to close the dialog boxes. Then make sure you have your formatting toolbar visible at the top of Word: go to View: Toolbars, and make sure Formatting is checked. You will then see a box which gives the name of the current style, followed by a box with the font name, followed by a box with the font size, and then buttons for bold (B), italics (I) and so forth. When you use your Styles shortcut key, the box with the style name will turn blue and whatever you next type will be interpreted as the selection of a new style, to be applied when you hit return. To make this easy, we have given each style a 2-3 letter abbreviation.

Thus if your keyboard shortcut is Ctrl-Alt-S, you hit Ctrl-Alt-S, H1, and then RETURN, and the text where the cursor was located will be changed to the H1 (heading 1) style. By following this practice, you can change styles and mark up a text quickly and efficiently. If instead you go up and manually select your styles, you will waste considerable time and find this to be a cumbersome system. For more information on THL-specific styles, see Getting Started Using Microsoft Word.

Styles and XML

This simple use of styles means that you have taken the first step to a more formal way of marking up your word processing texts that reflects your own characteristic use of texts and your intellectual assessment of their components. You have identified certain features of your texts, given those names, defined their format, and then marked up your text in accordance with it. The transition on the Web from HTML to XML in terms of marking up pages involves a somewhat analogous transition. HTML essentially involves only the specification of visual formatting – it indicates what each part of a Web page should look like, such as specifying this text string is indented, this string is bold, this one is green, this one flashes and so forth. However, HTML is essentially no better than the manual formatting of a document in Word – no intellectual inventory of types of textual elements has been done, and the individual parts of the texts are “hard” formatted with no flexibility to use them in complex ways. XML, in contrast, is based upon a prior intellectual assessment of one's needs and goals in dealing with a given body of texts. Based upon this assessment, one defines a series of elements, or tags, which allow one to identify the relevant components of these texts which one wants to work with. Thus one might say “paragraphs,” “citations,” “verse,” “text titles,” “author names,” and so forth are all elements that are necessary. One then uses these elements to mark up the text – marking author names with the “author names” tag, and so forth. By doing this, one can then manipulate these individual elements in powerful ways:

  • The text can be displayed in various ways that can be easily changed, by mapping the list of elements to various visual formats (these maps are called style sheets), such that a single text can be displayed in diverse ways to highlight different elements for different audiences
  • Powerful analytical searching can be done in relationship to the elements, so that one could search only on citations, or only for people's names, or for a given person's names within 10 words of a given place name, and so forth.
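
As a rough illustration of the proximity search mentioned in the last bullet, the following Python sketch looks for a marked personal name within a set number of words of a marked place name. The token list, the labels "person" and "place", the placeholder names, and the function name are all hypothetical, invented for this example; a real search would run against the XML itself.

```python
def names_near_places(tokens, person, place, window=10):
    """Return (person_index, place_index) pairs at most `window` words apart.

    `tokens` is a list of (word, kind) pairs, where kind is "person",
    "place", or None for words that carry no thematic markup.
    """
    people = [i for i, (word, kind) in enumerate(tokens)
              if kind == "person" and word == person]
    places = [i for i, (word, kind) in enumerate(tokens)
              if kind == "place" and word == place]
    return [(p, q) for p in people for q in places if abs(p - q) <= window]


# Invented sample data: the final "PlaceX" is unmarked, so it is ignored.
tokens = [("NameY", "person"), ("went", None), ("to", None),
          ("PlaceX", "place"), ("and", None), ("PlaceX", None)]
print(names_near_places(tokens, "NameY", "PlaceX"))
```

Only the occurrence that was marked up as a place name is matched, which is the point of searching on elements rather than on raw strings.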

Thus XML can be seen, again in a very simplified fashion, as analogous to the use of styles in Word. However, in fact XML is far more powerful since it is a robust markup language that allows for creative and flexible systems of markup far surpassing such a simple system as Word styles. One of the most important aspects of this is that elements are not just mapped to visual formatting, but also have rules specified for their use as well as additional nuances specified by “attributes.” An example of such rules might be that a Date can only be entered in the format of YYYY-MM-DD, or that one type of element can only be located within another type of element. Attributes involve, for example, a LANGUAGE element, which can be further specified as type=Tibetan, or type=English, or type=Nepali, such that a given term has not only been marked as a foreign language, but also the type of language is indicated. Or for a PLACE NAME element, one might create attributes that specify it is a government name, a colloquial name, a foreign name, and so forth. In these ways one can build powerful intellectual systems and models in XML by defining the elements, rules and attributes that are needed for one's task, and then using them to mark up text as well as build complex data structures referred to as XML databases. Thus XML not only serves as a text markup language, but can also be used to construct data repositories for all sorts of data – texts, images, videos, tables, etc. Hence it is useful to differentiate between XML text markup and XML databases. The collections of defined elements, rules and attributes that one constructs for one's work are called DTDs (document type definitions) or schemas. This impressive flexibility and power of XML, however, comes at a cost. Firstly, one needs to expend intellectual energy in defining the usages one wants to make of XML, and secondly, the tools to work in XML, apart from still being in early phases of development, by necessity require more user investment in specifying the application of the elements and attributes. One can mark up a given text without limit – such as labeling every grammatical part of speech, every thematic type of word – but that means human labor is necessary to do so. In practice, one has to strike a reasonable balance between what is theoretically possible, and what is actually possible given one's resources and needs.
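
To make rules and attributes a little more concrete, here is a minimal Python sketch. It builds a language element carrying a type attribute and checks a date against the YYYY-MM-DD pattern. The tag name "lang" and the function names are assumptions made for this example only; they are not taken from THL's DTD, and a real DTD or schema would enforce such rules declaratively rather than in code.

```python
import re
import xml.etree.ElementTree as ET

# Rule from the text: a date may only be entered as YYYY-MM-DD.
# This checks the shape only, not whether the date exists in a calendar.
DATE_RULE = re.compile(r"\d{4}-\d{2}-\d{2}")


def make_language_element(text, language):
    """Build a hypothetical <lang type="..."> element around a foreign term."""
    element = ET.Element("lang", {"type": language})
    element.text = text
    return element


def is_valid_date(value):
    """Return True when the value matches the YYYY-MM-DD pattern."""
    return DATE_RULE.fullmatch(value) is not None


term = make_language_element("sa bcad", "Tibetan")
print(ET.tostring(term, encoding="unicode"))
print(is_valid_date("1642-04-13"), is_valid_date("13/04/1642"))
```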

Our THL Word-to-XML system is a strategy we offer to users who want to access the power of XML, but don't have the time or energy to learn how to do full-featured XML editing. Taking advantage of Word's Styles feature, the system allows users to manipulate paragraph styles and character styles, along with other formatting, to mark up their Word documents in ways that can then be automatically exported to XML. While it will clearly not allow one to tap into much of the power of XML, such as rules, attributes and so forth, it can enable fairly powerful XML markup with a minimum of training. In essence, one uses the THL Word Styles to mark up one's text. Once finished, the THL Word to XML Conversion program (under development) will map each style/formatting feature in Word to a corresponding unique XML tag. As long as you don't change the names of the THL styles, you can make the styles appear however you want while editing – the conversion program will pay no attention to anything but style names, and how the text appears in the final presentation on the Web will depend on XML style sheets, not on how you view it in Word. Thus if you prefer to see Place Names as red, you can do so; the conversion program will simply see the “place name” style, not whether you specified that it appear in a red or green font.
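
The idea that only style names matter can be sketched in a few lines of Python. This is an illustration of the principle, not the THL conversion program: the mapping table, the tag names, and the function name are all hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from Word style names to XML tag names.
STYLE_TO_TAG = {
    "paragraph": "p",
    "place name": "placeName",
    "text title": "title",
}


def convert_run(style_name, text, formatting=None):
    """Wrap text in the tag for its style; formatting is deliberately ignored."""
    tag = STYLE_TO_TAG[style_name.lower()]
    return f"<{tag}>{escape(text)}</{tag}>"


red = convert_run("place name", "lha sa", formatting={"color": "red"})
green = convert_run("place name", "lha sa", formatting={"color": "green"})
print(red)
print(red == green)
```

Because the lookup is keyed on the style name alone, renaming a style would break the conversion while recoloring it would not.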

At present THL has created a single set of styles, which can be downloaded from the section of Getting Started Using Microsoft Word. In the future we will create three sets of styles for different purposes:

  1. Full THL Styles: this contains all of the Essay Styles and Tibetan Styles integrated together. For those using this system in both contexts this may be more convenient; for others using only one or the other, they may prefer to not have the clutter of the other. In reality, most of the two sets of styles overlap with each other, though the long list of “divisions” (see below) used in Tibetan Styles are not found in Essay Styles which is the chief gain in clutter reduction.
  2. Essay Styles: these styles are intended to serve the needs of those writing essays of any type within THL.
  3. Tibetan Styles: these styles are intended to serve the needs of those marking up e-versions of classical Tibetan literature.

We anticipate that these styles will need to be expanded as we accumulate experience from diverse user needs. Thus we encourage everyone prior to using them to first take stock of their needs and contact us if they would like to see additional styles added. In order to do so, one should do the following three assessments of the compositional or markup text at hand:

  1. Consider the basic structural divisions of the text: is there a front? a body? a back? are there chapters? an internal nested outline? Make a list of them, and verify that the THL System has these elements covered
  2. Consider the basic types of text formats contained within the document – verse vs. prose? citations? lists? etc. Make a list of them, and verify that the THL System has these elements covered
  3. Consider the thematic types of words in the text that you may want to mark up so as to later display these terms distinctively (such as showing place names in green, etc.), and/or be able to selectively search (find all place name=X within 20 words of personal name=Y), etc.

One can easily add more styles to suit one's needs, but we would appreciate you contacting us if you decide you need more styles. That way we can continue to refine our standard Styles, and it will also assure you that we will factor your needs into the conversion program. However, we do anticipate that for thematic types of words in particular, i.e. character styles, people will have needs unique to their own projects. Thus there will be some need to customize the conversion program in accordance with such projects. In this case it is important to contact us first to make sure we are able to do that, or to give you instructions for your own technical people to make the adaptations. In addition, in some cases it might be that one's needs are simply too complex for the Word system, and that it's necessary to work directly in an XML editor, or to do a first pass in Word and a second pass in an XML editor.

Simplified Markup Instructions for Word

The following provides a quick manual for reference in following THL's Word-XML Markup procedure. It largely relies upon the “Styles” feature of Microsoft Word, which also helps introduce people to XML since it can be understood as a very weak intermediary move in the direction of XML. Each type of component of a text is explained, and the corresponding Word style/formatting feature is specified. If you follow these instructions strictly, we will be able to automatically transform your Word document into a valid XML document for publication within THL, allowing powerful flexibility and searching.

Header Data

Each document should begin with the following table, filled out. In curly braces in each field are descriptions of the information that should go there. In these descriptions, “essay” refers to the body of the Word document that the table precedes:

  • Metadata Table: metadata table to be included at the beginning of each Word document for conversion.

The information for these fields is entered in the blank cell immediately to the right of the label. Most of the fields are self-evident. The “Title Original” and “Original Language” fields are for translations. “Title Original” is for the Romanized transliteration of the original title. All languages are referred to by their first three letters, all in lower case. “Language of text” refers to the language of the document being marked up. In most cases, this will be “eng” for English. “Original Language” refers to the original language of the printed document, if it is a translation or a digitization of a print document.

Structural Divisions

Generally we divide Tibetan texts into three overarching sections: Front, Body and Back. Each is marked with a Heading 1 style:

  • Front includes all the prefatory materials (homage, title, etc.)
  • Body is the main part of the text which typically consists of chapters
  • Back is the concluding material such as colophons, etc.

These three sections then in turn are divided into subdivisions which we term chapter-level elements after the most pervasive type of such subdivisions, namely the subdivision of the Body into chapters. See below for a full list of such chapter-level elements.

The one difficult issue is that many Tibetan texts also use internal topical outlines (sa bcad) to create a nested series of divisions and subdivisions that can have 10, 20 or more layers. These outlines typically are limited to the Body, but we are unsure if there may be exceptions that might span the Front and Back as well. Such exceptions would constitute a problem, and we ask that people alert us to such texts. The problem with outlines is that at times a single text has a single outline which is enumerated in such a way that it spans individual chapters; in other texts, the outline is self-contained within each chapter, such that a new outline begins at the beginning of each chapter and completes at that chapter's end. When the outline spans multiple chapters, it means the text has two competing sets of structural divisions – one into chapters, and one into a topical outline-driven structure.

Our practice is to use header styles in Word to represent the structural divisions of a Tibetan text. At present we have 25 levels of headers, which we can expand if necessary. The top layer, Header 1, is used to mark Front, Body and Back. In the Front, Body and Back, chapter-level elements are Header 2. If the text has a topical outline that is fully contained within individual chapters, then it starts with Header 3 onwards. However, if the text has a single topical outline that spans multiple chapters, such that parts of the outline begin in one chapter but continue in another chapter, then while the Body is still Header 1, its nested Header 2 is instead used for the top level of the topical outline. In this case, then, the chapters are marked instead with the Chapter (“c”) style and not with heading styles. If the text at hand has a topical dictionary-like structure where a term is named, and then it is discussed in a paragraph (such as in materia medica), this list should be treated as part of a topical outline with each term a separate header.
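
The choice between the two header schemes for the Body can be summarized in a short Python sketch. The function name and dictionary keys are hypothetical, and the sketch covers only the basic rule stated in this paragraph, not sub-chapter components or section divisions.

```python
def structural_styles(outline_spans_chapters):
    """Return the Word style for each kind of division in the Body."""
    if outline_spans_chapters:
        # One outline runs across chapters: the outline takes the headers
        # and chapters fall back to the Chapter style.
        return {
            "body": "Header 1",
            "chapter": "Chapter (c)",
            "outline_top": "Header 2",
        }
    # The outline restarts inside each chapter: chapters are Header 2 and
    # the outline nests beneath them.
    return {
        "body": "Header 1",
        "chapter": "Header 2",
        "outline_top": "Header 3",
    }


print(structural_styles(outline_spans_chapters=False))
print(structural_styles(outline_spans_chapters=True))
```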

All Headers are inserted text. The actual text specifying the name of a chapter or topical outline is NOT marked with these heading styles; rather, one inserts a blank line at the relevant structural division and types out the name of the division. The names of the divisions should be given using the following conventions in Tibetan. There is also a traditional way of talking of texts as consisting of a number of “units” (bam po), which essentially is just a kind of length metric that counts stanzas, with a set number of stanzas constituting one bam po. However, it's not clear how much this is actually marked within a text, as opposed to simply being used within catalogs as a metric of length. We have thus not at present accounted for this and are watching out for occurrences.

Front (klad): header 1

The front typically consists of a series of elements that are all at the same level. Markup should simply type out the Tibetan name of the element:

Title page (mtshan byang):

Title line (mtshan):

Homage (mchod brjod):

Invocation (gsol 'debs pa):

Praise (bstod pa):

Statement of intent (rtsom par dam bca' ba):

Untitled introduction (spyi'i gleng gzhi):

Ordinary introductory scene (thun mong gleng gzhi): if the gleng gzhi is chapter one, then it should be classified instead as chapter 1 of the body.

Extraordinary introductory scene (thun mong ma yin pa'i gleng gzhi):

Outline (sa bcad):

Possible additions to consider:

  • dgos pa'i dgos pa, etc.

Body (gzhung): header 1

The Body typically consists of Chapters (le'u), which should be labeled in markup with the chapter title given at the end of the chapter; secondly the title given at the beginning of the chapter if there is none at the end; and thirdly with the simple number of the chapter (le'u dang po) if no title is specified. Chapters themselves can have sub-components aside from topical outlines:

Chapter title (le'u mtshan):

Chapter homage (le'u re re'i mgo'i mchod brjod):

Chapter colophon (le'u re re'i mjug gi smon lam):

These should all be Header 3 styles, and then any sa bcad would start with Header 4. If the sa bcad spans multiple chapters and thus chapters are marked up with the “Ch” style, these sub-chapter divisions should be marked up with the Chapter Element style (“che”).

In addition, there may be section divisions (chings) which include multiple chapters. These are marked as Header 2 if there is no overarching topical outline that spans them; if there is, then the Section Division (“sd”) style is used.

There may also be interstitial sections (bar skabs kyi tshig su bcad pa) which are unnumbered sections located between sequentially numbered chapters. If the chapters are marked up as heading styles, then these are in addition at the same level (most typically “header 3”). If the chapters are not marked with heading styles, then this is marked as Interstitial sections (“sd”).

Back (mjug): header 1

The back typically consists of a series of elements that are all at the same level. Markup should simply type out the Tibetan name of the element:

Closing section (mjug gi don):

Author’s colophon (mdzad pa po'i byang):

Redactor’s colophon (sdud pa po'i byang):

Translator’s colophon ('gyur byang): typically this includes the whole section on who translated and revised.

Lineage transmission (lung gi brgyud pa):

Treasure colophon (gter byang):

Reviser’s colophon ('gyur bcos):

Editor’s colophon (sgrig pa po'i mchan):

Scribal colophon (bri ba po'i byang):

Printing colophon (par byang): this is the carver’s colophon. It records who gave the money to carve the block print, who carved it, and the dedication of merit (the next one).

Concluding prayer (par byang smon lam):

Closing invocation (shes brjod=bkra shis pa'i tshig brjod pa): like sarva mangalam/ (can come in many places, usually after the section it is connected to). These also include final dharanis at the end of the text, which function to remind the Buddha of a former “deal” that by reciting these words the concordant result would ensue.

Annotator's colophon:

Instructional colophon (gdams gtad):

Undetermined colophon (mdzad byang ma nges pa):

Proofreader final notation: this is a short notation “zhus gcig” (“checked once”) which usually appears after everything else and is written by a final proofreader.

Text Formats

The following are mostly marked by paragraph styles, and all involve blocks of texts.

Prose paragraphs: paragraph (“p”: indented first line and open line preceding). This is the standard style for prose. In Tibetan, paragraphs are not so clearly marked, but we suggest conservatively inserting paragraphs to make the text more readable. Make paragraph breaks based upon general thematic shifts that typically take place with a terminative marker (repeating the last letter of the final word of a shad-delimited line, and adding a naro to it – rdzogs so, bshad do, etc.). A double shad can also indicate such breaks but is too infrequent to be of help. Introducing paragraphs makes the text far easier to read and does very little violence to the text.

Verse lines: verse1 (“v1”; block indented with open line preceding) and verse2 (“v2”; same but without open line preceding). v1 is used for the opening line of a stanza since it introduces an extra space above it, while all other lines should be marked as v2.
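
The first-line rule can be expressed as a small Python sketch. The function name, the `prefix` parameter, and the sample lines are hypothetical; the prefix is a convenience assumed here so that the same rule could cover other style pairs built on the same first-line/other-lines pattern.

```python
def assign_verse_styles(stanzas, prefix="v"):
    """Pair each verse line with style 1 (first line of a stanza) or 2 (all others)."""
    styled = []
    for stanza in stanzas:
        for index, line in enumerate(stanza):
            style = f"{prefix}1" if index == 0 else f"{prefix}2"
            styled.append((style, line))
    return styled


# Placeholder lines standing in for two stanzas of verse.
stanzas = [["line a", "line b", "line c", "line d"], ["line e", "line f"]]
for style, line in assign_verse_styles(stanzas):
    print(style, line)
```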

Citation prose: citation prose (“cp”: block indented paragraph formatting).

Prose cited within a citation:

Citation verse: citation verse (“cv1”: block indented verse formatting with open line preceding) and citation verse2 (“cv2”: same but with no line preceding). cv1 is used for the opening line of a stanza since it introduces an extra space above it, while all other lines should be marked as cv2.

Verse cited within a citation: citation verse nested 1 and 2 (“cvn1” and “cvn2” – these are simply indented further than citation verse, with cvn1 for first lines of stanzas and cvn2 for all other lines in stanzas).

Bulleted list in a citation: citation list bullet (“clb”: indented and with automated bullets).

Numbered list in a citation: citation list number (“cln”: indented and with automated sequential numbers).

Bulleted list: list bullet (“lb”).

Numbered list: list number (“ln”).

Tables: use Word tables, and the normal (“no”) style for the text inside the table.

Questions: enclose in question marks, ? … ?

Direct speech/conversation: enclose in angle brackets, <>. This is used for speech spoken by someone within the text.

Thoughts: enclose in (*) signs, (*)…(*).

Narration: enclose in # signs, #…#. Narration can contain thoughts, conversation, prose, verse and so forth.
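
As a rough sketch of how the four delimiter conventions above might be turned into tags, the following Python example applies one regular expression per convention. The tag names and the function name are hypothetical. The sketch assumes that delimiters of the same kind are not nested, that the text contains no literal ?, #, < or > characters, and that no XML escaping is needed.

```python
import re

# Delimiters from the list above, paired with hypothetical tag names.
# Speech is handled first so that the angle brackets of inserted tags
# are never mistaken for speech delimiters. Narration is handled last
# because it may contain the other three.
RULES = [
    (re.compile(r"<(.+?)>"), "speech"),
    (re.compile(r"\(\*\)(.+?)\(\*\)"), "thought"),
    (re.compile(r"\?(.+?)\?"), "question"),
    (re.compile(r"#(.+?)#"), "narration"),
]


def tag_delimited(text):
    """Replace each delimiter pair with an opening and closing tag."""
    for pattern, tag in RULES:
        text = pattern.sub(rf"<{tag}>\1</{tag}>", text)
    return text


print(tag_delimited("#he said <go now> and wondered (*)why(*)#"))
```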

Some general guidelines to keep in mind:

  • When you insert carriage returns between lines, such as between two lines of verse, or two paragraphs, the white space after a shad (represented by an underscore) should be kept with the preceding line rather than the following line. Thus you will have /_, then carriage return, and then the next line begins (either rgyud… or /rgyud depending on whether the next line has a beginning shad).
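
The whitespace rule can be illustrated with a short Python sketch that breaks a transliterated string immediately after each "/_" unit. The function name and the sample string are invented for this example, and the sketch breaks at every shad only to show where the whitespace falls; an editor would break only where a new verse line or paragraph actually begins. Splitting on a zero-width match requires Python 3.7 or later.

```python
import re


def break_after_shad(text):
    """Split after each shad-plus-space unit (/_), keeping the space with the preceding line."""
    # The lookbehind splits immediately after "/_", so the next line begins
    # with either a syllable or an opening shad.
    parts = re.split(r"(?<=/_)", text)
    return [part for part in parts if part]


sample = "rgyud_dang_po/_/rgyud_gnyis_pa/_"
for line in break_after_shad(sample):
    print(line)
```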

Thematic Content

Whereas text formats are mostly marked with paragraph styles, thematic content is marked with character styles so that the underlying text format – verse, citation, etc. – is unaffected. The following are standard types, but we expect this is the area where probably there will be the most interest in custom designed styles to suit specific types of thematic markup one might want to do.

Name personal human: nph (blue)

Name personal Buddhist deity: npb (blue)

Name personal other (i.e., deities): npo

Conversation: Mark all conversation or citations of speech as “speech,” which is usually prose (sp), but if spontaneous poetry could be verse (sv1, sv2). In addition, mark the speaker’s name as above (speaker generic, sg; speaker buddhist deity, sb, speaker human, sh, speaker other, so). Speaker generic is used when you don’t know how to classify the speaker, while speaker other is used when you know the speaker is not human or a Buddhist deity, i.e. a local spirit, etc.

Speakers: Unfortunately, speakers can be identified with personal names, or with pronouns. Hence if we simply used speakers, it would mean that the term in question would not be clearly marked as to whether it is a personal name, and if so, what type of personal name. Thus we have used the cumbersome fallback of using several styles for different types of speakers (all display as brown italic text):

  • Speaker Generic: sg (Used when you don’t know how to classify the speaker.)
  • Speaker Buddhist Deity: sb
  • Speaker Human: sh
  • Speaker Others: so (Used when you know the speaker is not human or a Buddhist deity, i.e. a local spirit, etc.)

Place name: pn (green).

Name of Organization: nor. For a monastery or other religious institution with physical setting (i.e. buildings), norm. nor is also used to mark other forms of groups, such as sects.

Name of Ethnicity: noe. This includes Chinese, Khampas, etc.

Text title: tt (underlined). For a Sanskrit original text (regardless of whether a Tibetan translation is being referred to) (tts), for a Tibetan original text (ttt). If the text is claimed to be a Tibetan translation of a Sanskrit text, follow the claim rather than your own judgment.

Date: dt (plum). If both Tibetan and Gregorian date are given, mark the whole phrase as dt.

Monuments: nm (gray).

Emphasis strong: es (bold).

Emphasis weak: ew (italic).

Topical outline: to (orange).

Languages

When you want to identify the language of a word or string of words that differs from the language of the bulk of the text, you can use the following character styles. Each is named with the first three letters of the language’s name, and all display as pink text:

Chinese: chi

Japanese: jap

Korean: kor

Nepali: nep

Pali: pal

Sanskrit: san

Tibetan: tib

More languages can be added following the same principle.
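As a minimal sketch of that principle, the following Python derives a style name from the first three letters of a language name. The function names language_style and build_style_table are hypothetical and are not part of the THL tools. The collision check is an added assumption, since two language names can share their first three letters.

```python
from typing import Dict, Iterable


def language_style(language: str) -> str:
    """Return the first three letters of a language name, lowercased."""
    letters = [ch for ch in language if ch.isalpha()]
    if len(letters) < 3:
        raise ValueError(f"language name too short: {language!r}")
    return "".join(letters[:3]).lower()


def build_style_table(languages: Iterable[str]) -> Dict[str, str]:
    """Map each language name to its style name, rejecting collisions."""
    table: Dict[str, str] = {}
    used: Dict[str, str] = {}
    for language in languages:
        style = language_style(language)
        if style in used:
            # Assumption: a clash needs a manual decision, so fail loudly.
            raise ValueError(
                f"{language!r} and {used[style]!r} both map to {style!r}"
            )
        used[style] = language
        table[language] = style
    return table


STANDARD_LANGUAGES = [
    "Chinese", "Japanese", "Korean", "Nepali", "Pali", "Sanskrit", "Tibetan",
]
STYLE_TABLE = build_style_table(STANDARD_LANGUAGES)
```

For the seven languages listed above, this reproduces the abbreviations chi, jap, kor, nep, pal, san and tib.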

Coordination of Separate Components

This section is focused on the coordination of separate components of texts, such as coordinating multiple editions of a text, or page numbers from a print edition, and so forth.

Page numbers from print editions: pages (“pg”: violet).

Annotations (mchan): annotations (“an”: italic and 9 point). The annotation should be placed following the syllable to which it is attached in the original text. Typically annotations in a Tibetan text are connected via dots to a specific part of the text and written in a smaller character size. Sometimes xylographs give the name of the carver of each section, stating which section the person carved and his name, usually at the very bottom of the page. This should simply be treated as an annotation and attached to the last syllable of the page.

Digital pagination/lineation: pagination and lineation of the new digital text are created using an automated program, the “THL Paginator.” It marks each shad-delimited line as a new line and numbers the lines from 1 to 100. Each group of 100 lines is then treated as a separate “page,” and pages are numbered sequentially from 1 onwards.
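The numbering scheme can be illustrated with a short Python sketch. This is not the THL Paginator itself, whose implementation is not described here. It assumes that the delimiter is the Unicode character TIBETAN MARK SHAD (U+0F0D) and that segments containing only shads or whitespace are skipped. The function name paginate is hypothetical.

```python
import re
from typing import List, Tuple

SHAD = "\u0f0d"  # TIBETAN MARK SHAD, assumed here to be the line delimiter
LINES_PER_PAGE = 100

# A run of non-shad characters followed by any shads that close it.
_LINE = re.compile(f"[^{SHAD}]+{SHAD}*")


def paginate(
    text: str, lines_per_page: int = LINES_PER_PAGE
) -> List[Tuple[int, int, str]]:
    """Return (page, line, text) tuples for each shad-delimited line.

    Lines are numbered from 1 to lines_per_page within a page, and pages
    are numbered sequentially from 1.
    """
    numbered: List[Tuple[int, int, str]] = []
    for segment in _LINE.findall(text):
        segment = segment.strip()
        # Assumption: skip segments holding only shads or whitespace.
        if not segment.strip(SHAD).strip():
            continue
        index = len(numbered)  # zero-based count of lines kept so far
        page = index // lines_per_page + 1
        line = index % lines_per_page + 1
        numbered.append((page, line, segment))
    return numbered
```

With the default of 100 lines per page, the 250th line of a text is numbered as page 3, line 50.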

Footnotes: footnotes can be used as desired in Word. These are converted to <note> elements in the XML inserted at the point in the text where the footnote reference number occurs. Through stylesheets these notes can be displayed as either footnotes or endnotes. Thematic or descriptive mark-up of names, places, texts, and so forth may be used within the footnote text. To enter a footnote in Word at the cursor, press Ctrl+Alt+F or use the Insert menu > Reference > Footnote.

Critical Editions

Variant readings are recorded as follows. Overlapping variants will probably prove problematic, and we expect to refine this system from experience as we proceed.

  1. Put brackets around the section of text for which you are recording a variant, with a space between the preceding syllable and the left bracket.
  2. Insert a footnote after the right bracket without any intervening space; insert a space after the footnote before the next syllable.
  3. In the footnote, write the sigla of the edition, a space, and then the variant, e.g. TK chos. By default, the first letters are read as the sigla and whatever follows the space as the variant.
  4. If multiple editions give the same reading, then separate the sigla with hyphens: TK-TB-AB chos.
  5. If different editions give different readings for the same term, then the terms are separated by a semi-colon – TK-TB chos; DG bcos
  6. If a given edition simply lacks the bracketed word(s), give the sigla followed by “absent” (check what convention is): TK absent
  7. If a given edition inserts an extra term, insert a pair of empty brackets at the insertion point, and give the sigla and the term in the footnote: TK chos.
  8. To give a rationale for the choice of normative reading, end the variant reading with a period and precede the comment with “R: ”. Example: TK chos. R: The chos appears to be a late correction which….
  9. Variant readings must not overlap, so where they would, choose a string long enough to cover all of the variants; this may still be a problem.
  10. If a long section is missing, such as an entire line, enclose the entire lines in curly brackets {}.
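To make the footnote conventions concrete, here is a minimal Python sketch of how such a footnote might be read back into structured data. It is an illustration under the rules stated above, not the actual THL conversion code. The names Variant and parse_variant_note are hypothetical. The sketch also assumes that a trailing period is punctuation rather than part of the reading.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Variant:
    sigla: List[str]        # editions sharing this reading, e.g. ["TK", "TB"]
    reading: Optional[str]  # None when the editions lack the bracketed text


def parse_variant_note(note: str) -> Tuple[List[Variant], Optional[str]]:
    """Split a variant footnote into its readings and optional rationale."""
    # The rationale follows a period and the marker "R:".
    parts = re.split(r"\.\s*R:\s*", note.strip(), maxsplit=1)
    readings_part = parts[0].strip().rstrip(".").strip()
    rationale = parts[1].strip() if len(parts) == 2 else None

    variants: List[Variant] = []
    # Different readings are separated by semicolons.
    for entry in readings_part.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # The first token holds the hyphen-separated sigla; the rest is the reading.
        sigla_token, _, reading = entry.partition(" ")
        reading = reading.strip()
        if not reading:
            raise ValueError(f"no reading given for {sigla_token!r}")
        variants.append(
            Variant(
                sigla=sigla_token.split("-"),
                reading=None if reading.lower() == "absent" else reading,
            )
        )
    return variants, rationale
```

For example, parsing the footnote text TK-TB chos; DG bcos gives two variants, one shared by TK and TB and one for DG, with no rationale.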

Troubleshooting Word Styles

There are defects in how styles work in Word. Sometimes when you combine documents, or text from different documents, or when Word “recovers/repairs” a document after it crashes, you will find that styles have been merged, so that you now have a style whose name joins two different style names. Worse, you may find that a style isn’t rendering correctly (for example, a bulleted style no longer appears with bullets), or that when you try to apply a style, it doesn’t seem to be applied.

Another frequent and associated problem is that a second style appears which has the same name as one of your standard styles, but with “char char” after the name (presumably standing for “character”). If you delete the “char char” form of the style, the main style will be deleted as well. These “char char” styles are apparently character-style versions of the corresponding paragraph style, which Word has generated automatically for unknown reasons. If you delete all your text, these styles appear to go away, but of course you may have a long text with complex formatting that you don’t want to lose.

Another odd thing about these styles is that they do not appear when you open the Styles and Formatting pane, nor in the standard drop-down list of styles in the formatting toolbar at the top. Instead, you can only see them in the list of styles shown when you do a global search and replace and look under Format: Style. You can also see them if you right click on the menu bar, turn off the formatting toolbar, and then use your style keyboard shortcut to choose a style; a popup style window will appear, and you will see them there.

So what do you do when your document’s styles go bad? Keep in mind that the Normal.dot template in Word provides its standard styles to any new document you create, but once created, the document carries its own styles with it, so that changes to Normal.dot will not affect the document’s styles for better or for worse. Here are some strategies for dealing with corrupted styles.

Start over: as a first resort, or when all else fails, create a new empty document and copy and paste your present document into it. As a first resort, even if it fails to fix the problem, it is a good first step before applying the strategies below. As a last resort, after applying the strategies below, it can solve the problem definitively, and it is in general a good thing to do after cleaning up your styles, so that you avoid future relapses.

Delete or rename: choose Format: Styles and Formatting, and a list of all styles will appear to the right of your document. At the bottom there is a dropdown box next to “Show:”, where you should choose “Available Styles” to see the list of styles available in this document. Here you can right click on any style to either “delete” it (when a style you don’t want has crept into your document) or “modify” it (when a style has been renamed erroneously, or has otherwise been modified). This is the easiest way to deal with problems if it works. Caution: be careful about deleting styles you want to keep and which are already applied within the document. Once you delete a style, the text in that style will revert to the “normal” style. If the style is currently being used in your document, you can right click on that style and choose “select all”; then, in the formatting toolbar at the top of your document, type in or choose a temporary style to change the text to. You can then safely delete the original style, restore a fresh copy of it, and do a global search and replace to change the temporary style back to the original style.

Use of Formatting toolbar: the formatting toolbar is the top toolbar that tells you what style is currently applied to a text, the font, and the font size, among other things. By right clicking on the top menu bar, you can turn this toolbar off and on. One of the problems with the aforementioned “char char” styles is that they will often not appear in the “Format: Styles and Formatting” view for some reason. However, if you turn the formatting toolbar off and then hit control+alt+s (or whatever keyboard shortcut you usually use to invoke the style menu), a popup box will appear that says “Style.” The “char char” styles will always appear there, and can be deleted from there.
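If you have a plain list of a document’s style names, the “char char” styles can also be picked out by their naming pattern. The Python sketch below is a heuristic only. It assumes the list of names has already been obtained by some other means, and the function name find_char_variants is hypothetical. It reports a name only when the corresponding base style is also present in the list.

```python
import re
from typing import Dict, List

# One or more trailing "char" words, matched case-insensitively.
_CHAR_SUFFIX = re.compile(r"(?:\s+char)+\s*$", re.IGNORECASE)


def find_char_variants(style_names: List[str]) -> Dict[str, str]:
    """Map each "char char" style name to the base style it shadows."""
    known = set(style_names)
    found: Dict[str, str] = {}
    for name in style_names:
        base = _CHAR_SUFFIX.sub("", name)
        # Report only names that had a suffix and whose base style exists.
        if base != name and base in known:
            found[name] = base
    return found
```

Because deleting the “char char” form can delete the main style as well, a list like this is best used only to see which styles are affected before applying the strategies described here.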

Organizer: choose Tools: Templates and Add-Ins: Organizer, and you will get a view of the styles in this document and the standard styles in Normal.dot. Here you can do things such as copy the Normal.dot styles into your present document, overwriting the styles with the same names. You can also delete all the styles in the present document first, and then copy over the Normal.dot styles. This will often solve problems, though not always.

Global Search and Replace: Edit: Replace will give you the search and replace menu. If you click on “more” you can specify a style to search on, and a separate style to replace with, without specifying any text. Specify “all” and click on “replace all” and the replacement will be applied universally. Sometimes simply doing this for the same style (e.g. replacing list bullet with list bullet) can reapply a style and make it appear correctly after you have restored the style in question. For example, perhaps your list bullet style now has items appearing flush left rather than indented by .5 inches as they are supposed to be; by replacing the list bullet style with a fresh copy, and then doing a global search and replace on the list bullet style, the problem can be solved. In addition, you may sometimes have to temporarily change text in a given style to some other style so that you can delete the first style and replace it with a fresh copy; in that case, once the fresh style is installed, you can use global search and replace to replace the temporary style with the new style.

Further Reading on THL XML

This document describes the concept behind the use of Word to create XML documents for THL. However, the process involves more than just applying styles to a Word document. The document needs to be converted and then edited within an XML editor. Finally, it must be posted to the THL website and linked to through a certain URL. In order to edit the document proficiently in an XML editor, it is also necessary to have some familiarity with the concepts of XML and the particulars of the THL DTD. The rudiments of this information are available in the following documents:

  1. How to create an XML essay: step-by-step instructions on how to create an XML essay for THL.
  2. XML Editors: A discussion of different XML Editors and how to install and use them.
  3. THL XML Guidelines: The guidelines for XML markup using the THL TEI-based DTD.
  4. Getting Started Using Microsoft Word: The instructions and download for using THL designed styles.

Provided for unrestricted use by the Tibetan and Himalayan Library