XML Markup Manual for THL

Contributor(s): Nathaniel Grove, Steven Weinberger

This document presents the guidelines for the XML markup of texts in the Tibetan and Himalayan Library (THL). The first part contains a thematic presentation of tagging principles: it discusses the different parts of an XML document (metadata and textual divisions) and the particular elements used for marking up citations, links, images, etc. The second part, the XML Text Markup Detailed Manual, lists the elements used in the mark-up of THDL essays and texts. That list does not necessarily cover every element available, because our DTD, called xtib2, is a derivative of the TEI DTD. (For TEI documentation, go to TEI's on-line documentation.) At present, the description focuses primarily on English or other non-Tibetan essays and works. Over time, it will be expanded to include the markup of Tibetan texts as well.

While it is possible to mark up any text document in XML using a simple text editor, it is advisable to use an XML editor, as such programs can validate the document against the DTD and thus ensure that it is well-formed and properly marked up. Instructions for using XML editors, links to recommended editors, and customization packages are available from our XML Editors page. If you have not already installed an XML editor and set it up to work with xtib2.dtd, please refer to that page. It also explains how to create an initial XML document, either from scratch or using one of THDL's conversion routines.

Metadata

Every XML document needs some metadata describing the source of the data that makes up the digital text. The Tibetan Text template provided with the XML mind customization already contains a metadata section with most of the necessary elements in it. Comments in the XML text describe the information that belongs in each field.

Minimum Required Metadata Info

The minimum requirements for metadata are the publishing information concerning the electronic text and the information concerning the original source document. The electronic publishing information needed is the following:

  1. Text Title in English and Foreign Language if a translation
  2. Person responsible for transcription
  3. Person responsible for editing
  4. Person responsible for conversion & lineation
  5. Person responsible for mark-up
  6. Information concerning location in THDL: links and names for domain, portal, project, and home.
  7. Brief summary/description of the document.
  8. Information for Breadcrumbs. (1)

Other persons involved in the creation of the electronic document can be added as needed. Any such entry requires three pieces of information: responsibility (what was done), name (who did it), and date (when it was done). The date should be in the form of yyyy-mm-dd.

Required information concerning the original source document that was entered into the computer is:

  1. Title
  2. Author
  3. Editor
  4. Translator
  5. Pagination
  6. Place of Publication
  7. Publisher
  8. Date of Publication

Markup for Metadata

The markup for the metadata section is created using the Word to XML converter macro. All the information to be marked up goes in the table at the beginning of the Word document in the clearly defined fields to the right of each label. The detailed description of the metadata markup has yet to be written. To learn more about it, one is advised to create an XML document using the converter and inspect the result in Morphon or some other XML or text editor. A few helpful hints are included below.

Adding Other Persons to the Provenance

To add further people to the list of provenance personnel, highlight the last responsibility statement (respStmt) before the publication statement. Choose the edit tab at the bottom right and click Insert After. Choose “respStmt” and add in the information. The responsibility statement will come up with only a name element. Insert before this a responsibility element (resp) and after the text of the name insert a date element (date). All dates in the metadata section should be in the format of yyyy-mm-dd.
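As a rough sketch, a completed responsibility statement might look like the following. The responsibility, person, and date are placeholders, and placing the <date> inside the <name> element is an assumption drawn from the instruction above; validate the result against the DTD to confirm the exact placement.

```xml
<!-- Hypothetical example: the responsibility, name, and date are placeholders -->
<respStmt>
  <resp>Proofreading</resp>
  <!-- Assumption: the date sits inside <name>, after the name's text -->
  <name>Name of Proofreader <date>2005-03-14</date></name>
</respStmt>
```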

Structural Divisions

Major Divisions

Major structural divisions of a text are built into the TEI DTD through the children of the <text> element. These are <front>, <body>, and <back>. The only required one of these is the <body> element. A text does not necessarily need to have a front or back element. No ID or other attribute is required for these three major divisions, as their names are distinct enough. In general, the front matter contains the title page, preface, dedication, and so forth; the body contains the chapters of the text (usually represented by <div1> elements); and the back includes afterwords, indices, colophons, and so forth. Subdivisions of any of these three are represented by "div" elements, described in the next section.
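A minimal sketch of these three major divisions is given here. All headings and text are placeholders, and the comments merely stand in for real front and back matter, which a validating editor would require before accepting those elements.

```xml
<text>
  <front>
    <!-- title page, preface, dedication, and so forth -->
  </front>
  <body>
    <div1 type="chapter" n="1">
      <head>Placeholder Chapter Title</head>
      <p>Placeholder text of the first chapter.</p>
    </div1>
    <div1 type="chapter" n="2">
      <head>Placeholder Chapter Title</head>
      <p>Placeholder text of the second chapter.</p>
    </div1>
  </body>
  <back>
    <!-- afterwords, indices, colophons, and so forth -->
  </back>
</text>
```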

Subdivisions

Subsequent divisions of the front, body, or back are recorded using <div> elements. There are two types of <div> elements: the unnumbered one, <div>, and a series of ten numbered ones, <div0> through <div9>. In the mark-up of modern Scholar's essays or monographs, the fundamental sections of the body of the work (chapters of a book or sections of an article) should be marked up with <div1> tags, subsections with <div2> tags, sub-subsections with <div3> tags, and so forth. For the mark-up of Tibetan texts, which may have an outline of 20 or more nested levels, we use the unnumbered <div> tag, which can be nested to any depth. The attributes used for all <div> elements are:

  • type: chapter, section, outline
  • n: (number of chapter, section, etc.)

Chapter is self-explanatory. Section refers to a subsection of a chapter. Outline refers to sa bcad, the topical outline of a Tibetan text. In an outline of a Tibetan work, the <div>s can be nested but they cannot overlap. In other words, if a <div> is opened inside another <div>, it must be closed within that same <div>. Therefore, if outline and chapter breaks overlap, the <div>s can be used to record only one of them, generally the outline. The other is recorded using <milestone> markers. The “n” attribute gives the plain number of the chapter or section. For example, <div n="4" type="chapter"> would be the opening tag for chapter four.
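The following sketch illustrates the case where a chapter break falls inside an outline section. The nesting is recorded with <div> elements and the chapter break with a <milestone>. The unit value "chapter" and all of the text are assumptions made for illustration only.

```xml
<div type="outline" n="1">
  <head>Placeholder outline heading</head>
  <div type="outline" n="1">
    <head>Placeholder sub-heading</head>
    <p>Placeholder text from the end of chapter one.</p>
    <!-- Chapter two begins inside this outline section, so it is
         recorded with a milestone rather than a new div.
         The unit value "chapter" is an assumption. -->
    <milestone unit="chapter" n="2"/>
    <p>Placeholder text from the start of chapter two.</p>
  </div>
</div>
```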

Headers

Headers are self-explanatory. The element is <head>, and it is generally found as the first child of a <div> element. If the editor is adding a header for clarification where the actual text has none, then an <add> element should be placed immediately inside the head element, containing the header’s text:

<head><add resp="ndg">This is a header added by NDG</add></head>

In this case, “ndg” must match the ID attribute value for a <name> included in the metadata, indicating who “ndg” is.

Paragraphs and Prose

The essential paragraph element is <p></p>. Prose of any sort should be marked up with <p>, or paragraph, elements. They have the standard attributes, but in general these do not need to be used unless a specific need arises. The paragraph element can take a number of relevant children including but not limited to clauses (<cl>), quotations (<q> or <quote>), phrases (<phr>), sentences (<s>), links (<ref>, <xref>, etc.), lists (<list>), numbers (<num>), titles (<title>), and a variety of name elements (<persName>, <placeName>, etc.). Paragraphs cannot be nested within one another but act as the basic content for <div>s, which can be nested. Within paragraphs, the <s> element can be used to distinguish sentences, but this is not required.

For quotations or citations that are a single paragraph in length, the quotation elements described below can be used in the same manner as a <p>, paragraph element. These are <q> for spoken quotations and <quote> for textual quotations. If these elements are used outside of a <p> element, they will display as a separate indented paragraph. If such quotes have multiple paragraphs, the <q> and <quote> elements should contain <p> elements. See the next section on citations.

Verse

There are two elements for marking up verse. The line group element, <lg>, bundles together a number of lines into a group. Each <lg> element represents a stanza of verse. The lines of verse are tagged with <l>, or line, elements. Any <seg> elements, which mark shad-delimited segments, should be placed fully within the <l> tags. Thus, an <lg> element with four <l> elements within it represents a four-line stanza.
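A four-line stanza might therefore be sketched as follows. The line text is placeholder only, and the <seg> in the first line is shown simply to illustrate that it sits fully inside the <l> element.

```xml
<lg>
  <!-- A seg, when used, is contained entirely within its line -->
  <l><seg>First line of the stanza</seg></l>
  <l>Second line of the stanza</l>
  <l>Third line of the stanza</l>
  <l>Fourth line of the stanza</l>
</lg>
```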

Lists

Lists are dealt with using the <list></list> element, which contains <item>s for children. The rend attribute is used to distinguish between types of list. The two basic types of list are bulleted lists and numbered lists. Numbered lists can make use of several different formats. The rend attribute is used to distinguish between these formats, using the same conventions as HTML:

  • rend="A": Capital Letters
  • rend="a": Lowercase Letters
  • rend="I": Capital Roman Numerals
  • rend="i": Lowercase Roman Numerals
  • rend="1": Arabic Numerals
  • rend="bullet": Bulleted (Normal)
  • rend="disc": Bulleted (discs)
  • rend="circle": Bulleted (circles)
  • rend="square": Bulleted (squares)
  • rend="none": No bullet or number. Allows a customized number to be placed at the beginning of each item, such as (1) or (A).

The “n” attribute can be used to distinguish a starting number for numbered lists if the first item does not begin with “1”.
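As an illustration of these attributes, the sketch below shows a numbered list that starts at 3 and a list with customized numbering. The item text is placeholder only, and the wrapping <div> is there just to hold the two lists together.

```xml
<div>
  <!-- Arabic numerals, starting at 3 instead of 1 -->
  <list rend="1" n="3">
    <item>Third point</item>
    <item>Fourth point</item>
  </list>
  <!-- No automatic bullet or number: the numbering is typed by hand -->
  <list rend="none">
    <item>(A) First point</item>
    <item>(B) Second point</item>
  </list>
</div>
```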

Page and Line Numbering

In order for the digital text to be fully functional, there needs to be a way to reference a quote or section from it. For this, there needs to be some sort of page and line numbering. THL uses the TEI <milestone /> empty element to indicate the beginning of a page or a line. That element has a "unit" attribute that describes the milestone being delineated, e.g., page, line, digpage, or digline, the last two being for born-digital resources. An example of THL's use of the milestone would be:

།ཚུལ་ཁྲིམས་འཆལ་པས་<milestone unit="page" n="2a"/><milestone unit="line" n="2a.1" />ཟིན་རྣམས་ཀྱི།

Citations

There are two types of citations: those from written, textual material and the citation, or quotation, of a person speaking (whether actual or “mythic”). These are marked up differently. Textual citations use the <title> and <quote> elements, while spoken quotations use the <persName> and <q> elements. The actual cited or quoted material goes within either the <quote> or the <q> element. Both of these can, but are not required to, have paragraphs (<p>) or verses (<lg> and <l>) as their children so that the structure of the citation can be preserved. (Structural markup is described in the section above.) For shorter, unstructured quotations, which do not have internal paragraphs or verses, the rend attribute can be used to determine whether the quoted text is displayed inline or as an indented paragraph. If the rend attribute is set to “inline”, the quotation will be incorporated into the present paragraph without break. The default, when no rend attribute is given, is to display the quote as a separate, indented paragraph. In all cases, quotation marks should be included if they are to be displayed. The styles do not automatically add quotation marks.

Citations from texts are dealt with using <title> and <quote> elements. The <title> element marks the title of the text from which the quote originates, and the <quote> element encloses the quoted material itself. <quote> should not be confused with <q>, which is used for a quotation from a speaker. The general format is given under Textual Citations below; the <lg> and <l> elements used there indicate verse, as described in the Verse section above.

Textual Citations

Marking up textual citations involves marking up the title and marking up the passage cited. For Tibetan texts, we use the <title> and <quote> elements. We do not use the more elaborate <cit> element with a child <bibl>, also described in the TEI P4 guidelines, because bibliographic information is generally not included in Tibetan texts, and in modern scholarly essays such material is found in footnotes, which require different markup. The general format is:

<title level="m" type="tantra">gsang ba’i snying po</title> las/_

<quote>

<lg>

<l>/rdo rje phung po yan lag ni/_</l>

<l>/rdzogs pa'i sangs rgyas lnga ru gra/</l>

</lg>

</quote>

Titles

Titles are marked with a <title> element. The title element should encompass only the source’s title and should not include the las or any other grammatical particle that is outside of the text’s proper name. The type attribute can be used to further specify the kind of title. However, this does not need to be done in the first round of mark-up. To include a translation of the title, insert a <foreign lang="eng"> element at the end of the title and place the translation in it:

<title level="m" type="tantra" lang="tib">gsang ba’i snying po </title>
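A title carrying a translation might be sketched as follows. The English rendering "Secret Essence" is supplied here purely for illustration and is not prescribed by this manual.

```xml
<!-- The translation text is illustrative only -->
<title level="m" type="tantra" lang="tib">gsang ba'i snying po <foreign lang="eng">Secret Essence</foreign></title>
```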

Citation Text

The text of the citation should be marked up in the <quote> elements. The TEI guidelines define <quote> as an element that “contains a phrase or passage attributed by the narrator or author to some agency external to the text.” The plain text can be included directly in the <quote> element, or structural markup, such as paragraphs (<p>) and verses (<lg>) may be included within the <quote> element. If the <quote> element contains plain text, the rend attribute determines its display. The default (with no rend attribute) displays the quote as a separate indented paragraph (like <blockquote> in HTML). If the text should be displayed in line, then the rend attribute should be set to “inline”. Thus, an example of the latter would be the one above:

<quote rend="inline">“contains a phrase ....”</quote>

Speech Quotations

Quotations of a person’s speech should be enclosed in <q> tags and not in <quote> tags. The <q> element can contain all the elements for marking up both prose (<p>) and verse (<lg>). The speaker’s name, if given, can be marked up in the <persName> element outside the <q> element. As with the textual citations, described above, the <q> element can have a rend attribute of “inline” if the quote is to be displayed as part of the parent paragraph. Otherwise, it is set off as a separate indented paragraph.

All quotation marks should be included within the <q> tags, if they are to be displayed.

An example of an in-line speech quotation would be:

<persName>Milarepa</persName> said, <q rend="inline">“All worldly pursuits have but the one unavoidable end, which is sorrow.”</q>

Links

There are two types of linking elements. A pointer element (<ptr /> or <xptr />) is merely that: an empty element with no textual content, in which all the data concerning the link is contained in the attributes. A reference element (<ref> or <xref>) creates a link around some text and works in the same manner as an HTML <a> element with an href attribute. There are also two kinds of links: internal links that point to an anchor or ID elsewhere in the same document and external links that point to another separate document or file.

Internal Links

Internal links point to another section of the same document. The link is created by setting the “target” attribute of the linking element to the same value as the ID attribute of the target element. Thus, if one wanted to point to the beginning of section three, which was marked with the opening tag <div type="section" n="3" id="s3">, the pointer would look like this: <ptr target="s3" /> and the reference tag would be: <ref target="s3">Go to section 3</ref>. If there is no opening element at the specific point one wants to link to, the <anchor/> element may be used. This is an empty element that can be assigned an ID for the purposes of linking. It can be inserted virtually anywhere.
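Putting these pieces together, a sketch of internal linking might look like this. The ID "s3-passage" and all of the text are hypothetical and chosen only for illustration.

```xml
<body>
  <div type="section" n="1" id="s1">
    <!-- A reference wraps link text; a pointer is empty -->
    <p>See <ref target="s3">section 3</ref>, and in particular
      <ref target="s3-passage">this passage</ref>. <ptr target="s3"/></p>
  </div>
  <div type="section" n="3" id="s3">
    <!-- The anchor gives a link target where no element begins -->
    <p>Placeholder opening text. <anchor id="s3-passage"/>Placeholder text of the passage being linked to.</p>
  </div>
</body>
```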

External Links

External links point to another separate document. In this instance, we do not use the “target” attribute as with internal links. (3) Instead, the type attribute is used to indicate the general type of target. The targType attribute is used to indicate the specific type of target, and the n attribute contains the file name. The values for these attributes can be:

  • type: text, xml, url, image, audio, video, plaintext
  • targType: {name of DTD}, gif, jpg, mpg, flash, txt, etc.
  • note: this feature is not functional at present, so do not use the targType attribute.
  • n: e.g. Tb.343.bib.xml, image2.gif, intro.mpg, …
  • note: in the type attribute, "text" refers to an XML text, whereas plaintext refers to plain text. When a THDL XML document is referred to, then the n attribute value must begin with the collection and not with the root directory, which is /tibet/. In the future, we hope to use the targType attribute to refer to non-THDL XML documents, in which case its value will be the name of the DTD.
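A sketch of an external link to a THDL XML document follows. The directory name "collection" is a hypothetical stand-in for a real collection directory, and the link text is a placeholder. The targType attribute is left out, in keeping with the note above.

```xml
<!-- "collection" is a hypothetical directory name; the n value
     starts at the collection, not at the /tibet/ root -->
<p>For details, see the
  <xref type="text" n="collection/Tb.343.bib.xml">bibliographic record</xref>.</p>
```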

Images

The general element for various graphics and images to be displayed in an XML document is the same as for links, the <xref> tag. This is used for static images (.jpg, .gif, etc.) and animated ones (movies, panoramas, etc.). The “type” attribute is used to distinguish images from links and types of images from one another. The “n” attribute contains the URL of the image data file in either case.

Pictures

Pictures, or static images, are represented in an XML document using the <xref> element, as described above. Images can be inserted within the text of the essay itself by placing the <xref> element at the end of a <p>, or paragraph, element. Alternatively, they can be presented on their own line between paragraphs by including the <xref> inside a paragraph element with no text. In this latter case, the paragraph element should have its “rend” attribute set to “img”, e.g. <p rend="img">. The images included within such a <p> tag will be placed in a table with a single row. Thus, the editor should be cautious about including too many images in a single row: generally only 1 or 2 for larger images and 3 or 4 for smaller ones. The “type” attribute of the <xref> should be set to “img” and the “n” attribute should contain the URL for that image. The URL should be a relative URL, relative to the root directory of Tibet, presently on Iris. The caption of the picture should go between the opening and closing <xref> tags. Thus, an example of a picture tag would be:

<xref n="/images/cultgeo/sera/sp026.jpg" type="img">

A view of Sera from the mountain north of the monastery.

</xref>
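To present the same picture on its own line between paragraphs, the tag would be wrapped in an otherwise empty paragraph, roughly as sketched here:

```xml
<!-- An image paragraph: no text of its own, only the picture -->
<p rend="img">
  <xref n="/images/cultgeo/sera/sp026.jpg" type="img">
    A view of Sera from the mountain north of the monastery.
  </xref>
</p>
```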

Panoramas

To include an interactive panorama, where the mouse can be used to change the view or move the object in the picture, the <xref> tag is also used. However, it must be included within a paragraph element with its “rend” attribute set to “mov”. As with pictures, the caption for the movie is included between the opening and closing <xref> tags. As such images take specific parameters, these parameters are also included within the <xref> tag using <rs> elements (for “reference string”). These are usually placed at the beginning and have their “type” attribute set to the parameter name and the “n” attribute set to the value. Furthermore, these <rs> elements should have their “rend” attribute set to “none”. (Any element with its “rend” attribute set to “none” is not displayed in the resultant HTML document.) An example of a panorama’s markup is:

<p rend = "mov">

<xref n="/images/cultgeo/SeraRear.mov" type="mov">

<rs type="width" n="460" rend="none"></rs>

<rs type="height" n="320" rend="none"></rs>

<rs type="controller" n="true" rend="none"></rs>

Sera Monastery from the rear.

</xref>

</p>

Emphasis

The general element for emphasis is the <hi> tag. It uses its “rend” attribute to distinguish between types of emphasis. It contains the text that is to be emphasized. There are two typologies for the “rend” attribute, a general one and a specific one.

General Emphasis (strong/weak)

In the general typology, there are two values for the rend attribute, strong and weak. Bold and italic are examples respectively of strong and weak emphasis, but by using the general typology, the rendering is not limited to those choices. One may choose to render strong as bold and weak as italic, or one could render strong as dark green and weak as light green.

Specific Emphasis (Italic/Bold/Tib)

The specific typology of the rend attribute of the <hi> element uses the terms “bold”, “italic”, “underline”, etc. These specifically indicate the type of rendering that is to be performed. However, there is another use of the <hi> element that goes beyond rendering concerns: marking up Tibetan words or phrases that are contained within non-Tibetan text. In such an instance, the “rend” attribute is not used, but the <hi> element’s lang attribute is set to “tib”. This is for later implementations, when all elements with lang="tib" will be rendered in Tibetan script.
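The three uses of <hi> described in this section can be sketched in a single paragraph. The sentence itself is placeholder text.

```xml
<p>This point is <hi rend="strong">essential</hi> and this one is
  <hi rend="weak">secondary</hi>. A term may be set in
  <hi rend="italic">italics</hi> or <hi rend="bold">bold</hi>.
  <!-- No rend attribute here: lang marks the phrase as Tibetan -->
  The term <hi lang="tib">thig le</hi> is left untranslated.</p>
```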

Glosses

When one wants to include a foreign language gloss for an English word, or vice versa, where the gloss is included in parentheses and italicized, one can use the <gloss> element. The lang attribute for the element should be set to the appropriate language code, as listed in the global attributes section. The XSLT stylesheets will automatically place the words within the tag in parentheses and italicize them. Thus, for example, the element <gloss lang="tib">thig le</gloss> would be rendered as: (thig le).

Descriptive markup

People

People are marked up with a <persName> element. There are several child elements available within <persName> that can be used to specify the particular parts of the name. However, no standard use for these has been designed.

Organizations

Organizations are marked up with an <orgName> element. There are several child elements available within <orgName> that can be used to specify the particular parts of the organization’s name. However, no standard use for these has been designed.

Places

Places are marked up with a <placeName> element. There are several child elements available within <placeName> that can be used to specify the particular parts of the place’s name. However, no standard use for these has been designed.

XML Text Markup Detailed Manual

We are developing an integrated strategy for marking up all literary texts in THDL with XML, whether the texts are primary sources in Asian languages, contemporary Scholar's essays in THDL, Encyclopedia essays, or home pages in THDL. The first phase was developing a DTD for dealing with Tibetan canonical scripture, which is basically TEI with a few enhancements to record bibliographical information for Tibetan texts. These modifications, contained in the TIBBIBL DTD, can be summarized as follows:

  • a set of elements for recording ID information for Tibetan text (TIBIDDECL)
  • a set of elements for recording a description of the physical artifact (PHYSDECL)
  • a set of elements for recording Tibetan provenance (ORIGINATION)
  • a set of elements for recording doxographical categories (INTELLDECL)
  • a set of elements for recording variant titles (TITLEDECL and TITLEGRP)
  • a set of elements for recording text sections (TIBANAL and SECTIONS).

These enhancements apply primarily to bibliographic records for Tibetan texts and are not described here unless they apply to electronic text mark-up. We are primarily sticking to standard TEI for XML markup of texts. The DTD used for text representation is xtib.dtd. (4) This is the XML TEI DTD (version P4) with the addition of the TIBBIBL elements. The following is a draft attempt at identifying the core elements and attributes. Unless otherwise specified, the elements (5) in question are TEI.

Description of Elements by Type

   
  1. Root elements: choices for the highest level element that contains all other elements of an electronic document
  2. Metadata elements: elements for describing data concerning the data, i.e., how the electronic document was created
  3. Structural elements: elements for representing the structural divisions of a text
  4. Descriptive elements: elements that describe data by classification into name, place, organization, etc.
  5. Bibliographic elements: elements for recording bibliographic information concerning a digital resource, including elements for citations of texts
  6. Rendering elements: elements for encoding specific ways of rendering text
  7. Linking elements: elements for creating links either within a text or to an external resource

The following contains a description of the various elements in the TIB and TIBBIBL DTDs that are used to mark up electronic texts and bibliographic records. It is based on our experience in the THDL and on the TEI Guidelines. There are seven categories of elements discussed. Within these categories, each element is described in terms of how it is used, what element (tag) is used, applicable attributes, relevant children, the XSL rendering of the element, and Word formatting. This last category refers to how a scholar should format her or his Word document so that it can be automatically converted to XML. It may contain the “element” in Word that corresponds to the XML element being described, or it may describe formatting that must be applied in the Word document to indicate specific mark-up. (Note: The Word conversion routine has not yet been written. However, by sticking to these guidelines such a conversion program would be straightforward to devise.)

Root Elements

TEI.2 AND TEICORPUS.2

Use: The root element for text mark-up in the THDL is the TEI root element. All documents must begin with this element as the highest level ancestor. There are two possible root elements. The most common and recommended root element is the TEI.2 element. This would be used for a single electronic text, whether it is an electronic version of a printed text or an electronic catalog contained in a single file. The other possible root element, as yet unused in the THDL, is the teiCorpus.2. This element allows one to group several full TEI.2 documents under a single teiCorpus.2 parent with an overarching metadata header.

Element: <TEI.2>, <teiCorpus.2>

Attributes: global

Id: the ID attribute can be used to tag the XML text with a unique THDL ID.

N: the N attribute can be used to provide an additional identifier for the text, perhaps one based on a commonly-used, non-THDL classification system.

Relevant Children: <teiHeader>, <text> (in the <TEI.2>) or <teiHeader>, <TEI.2> (in the <teiCorpus.2>).

XSL Rendering: equivalent to the root <html> element.

Word Formatting: the Word document.
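As a minimal sketch, the overall shape of a single-text document is shown here. The id and n values are placeholders, and the comment stands in for the metadata described in the following sections.

```xml
<!-- The id and n values are placeholders, not real THDL identifiers -->
<TEI.2 id="placeholder-thdl-id" n="placeholder-other-id">
  <teiHeader>
    <!-- file description and other metadata go here -->
  </teiHeader>
  <text>
    <body>
      <p>Placeholder text of the document.</p>
    </body>
  </text>
</TEI.2>
```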

Metadata Elements

THE TEIHEADER

Use: The metadata header is at the beginning of each text and encodes the relevant metadata for the XML document. It has four main parts, as laid out in the TEI guidelines: File Description, Encoding Description, Profile Description, and Revision Description. There is only one <teiHeader> per XML file.

Element: <teiHeader>

Attributes: global

Relevant Children: <fileDesc>, <profileDesc>, <encodingDesc>, <revisionDesc>

XSL Rendering: Link to separate window.

Word Formatting: Any metadata information as described below should be included in a table at the beginning of the document. The table should be separated from the text proper by a section break, either continuous or not. Any standard data can be automatically inserted by a conversion program, when it is written.

   
E-Text Title Tibetan:
E-Text Title English:
Name of Transcriber:       Date:
Editor:                    Date:
Lineator:                  Date:
Mark-up:                   Date:
Title of Source Text:      Pagination:
Author of Source:          Date:
Translator of Source:      Date:
Publisher:                 Date:
Place of Pub.:
Description:

FILE DESCRIPTION

Use: The file description element contains a full bibliographical description of the computer file itself, from which a user of the text could derive a proper bibliographic citation, or which a librarian or archivist could use in creating a catalogue entry recording its presence within a library or archive. The term computer file here is to be understood as referring to the whole entity or document described by the header, even when this is stored in several distinct operating system files. The file description also includes information about the source or sources from which the electronic document was derived.

Element: <fileDesc>

Attributes: global

Relevant Children:

<titleStmt>: groups information about the title of a work and those responsible for its intellectual content.

<editionStmt>: groups information relating to one edition of a text.

<extent>: describes the approximate size of the electronic text as stored on some carrier medium, specified in any convenient units.

<publicationStmt>: groups information concerning the publication or distribution of an electronic or other text.

<seriesStmt>: groups information about the series, if any, to which a publication belongs.

<notesStmt>: collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.

<sourceDesc>: supplies a bibliographic description of the copy text(s) from which an electronic text was derived or generated.

XSL Rendering & Word Formatting: See Metadata Header.
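A bare-bones file description might be sketched as follows, using only the three children that standard TEI P4 makes mandatory (<titleStmt>, <publicationStmt>, and <sourceDesc>). Every value is a placeholder, and the fields actually required by THDL are those listed in the Metadata section at the start of this manual.

```xml
<fileDesc>
  <titleStmt>
    <title>Placeholder E-Text Title</title>
    <respStmt>
      <resp>Transcription</resp>
      <name>Name of Transcriber</name>
    </respStmt>
  </titleStmt>
  <publicationStmt>
    <!-- A prose paragraph may be used instead of specific children -->
    <p>Placeholder description of the electronic publication.</p>
  </publicationStmt>
  <sourceDesc>
    <bibl>
      <title>Placeholder Title of Source Text</title>
      <author>Placeholder Author of Source</author>
    </bibl>
  </sourceDesc>
</fileDesc>
```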

TITLE STATEMENT

Use: The title statement is used in the File Description in the teiHeader of an electronic text. Within a TIBBIBL bibliographic record, a title declaration or title group is used to encode a text’s titles.

Element: <titleStmt>

Attributes: global

Relevant Children: <title>, <author>, <sponsor>, <funder>, <principal>, <respStmt>, <resp>, <name>

XSL Rendering & Word Formatting: See Metadata Header.

EDITION STATEMENT

Use: The edition statement in the File Description is used to describe the edition of the electronic text that is represented in the present file. Edition information relating to the physical manuscript that was used as the source of the electronic document would be found in the Source Description.

Element: <editionStmt>

Attributes: global

Relevant Children: <edition>, <respStmt>

XSL Rendering & Word Formatting: See Metadata Header.

PUBLICATION STATEMENT

Use: The publication statement is found either in the file description (fileDesc) of the metadata section of an electronic text (teiHeader) or as part of a Tibetan Bibliographic record (TIBBIBL). It has the same format in either case. It can contain a series of child elements that describe the publishing information of the electronic text or Tibetan text, or it can contain a prose paragraph (p) describing the publication.

Element: <publicationStmt>

Attributes: global

Relevant Children: <publisher>, <distributor>, <authority>, <pubPlace>, <address>, <idno>, <availability>, <date>, <respStmt>

XSL Rendering & Word Formatting: See Metadata Header.

EXTENT

Use: The extent element is used in various places to indicate the length of a document. In the metadata section (teiHeader), the extent element is used to describe the length of the electronic file in some standard measurement: kilobytes, megabytes, words, paragraphs, etc.

Element: <extent>

Attributes: global

Relevant Children: <name>, <num>

XSL Rendering & Word Formatting: See Metadata Header.

SERIES STATEMENT

Use: In the metadata section (teiHeader), this element groups information about the series, if any, to which an electronic publication belongs. The same principle applies in bibliographic records of printed works.

Element: <seriesStmt>

Attributes: global

Relevant Children: <title>, <idno>, <respStmt>

XSL Rendering & Word Formatting: See Metadata Header.

SOURCE DESCRIPTION

Use: The source description is a mandatory element in the File Description of the metadata section of an electronic text. It is used to record details of the source or sources from which a computer file is derived. This might be a printed text or manuscript, another computer file, an audio or video recording of some kind, or a combination of these. An electronic file may also have no source, if what is being catalogued is an original text created in electronic form.

Element: <sourceDesc>

Attributes: global

Relevant Children: <bibl>, <biblStruct>, <biblFull>, <biblList>, <scriptStmt>, <recordingStmt>

XSL Rendering & Word Formatting: See Metadata Header.

ENCODING DESCRIPTION

Use: This element provides a metadata section for describing the principles used in encoding the electronic text. The TEI Guidelines state that “It specifies the methods and editorial principles which governed the transcription or encoding of the text in hand and may also include sets of coded definitions used by other components of the header. Though not formally required, its use is highly recommended.” However, it is not generally used in THDL documents, because the description of editorial methods and principles is located in the project’s documentation and not in each specific document. One use for this section of the metadata header would be to record editorial principles for a specific text that differ from the project’s standards.

Element: <encodingDesc>

Attributes: global

Relevant Children: Rather than include a separate entry for each child of this unused element, the following list includes both the element name and a description taken from the TEI guidelines:

<projectDesc> describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.

<samplingDecl> contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.

<editorialDecl> provides details of editorial principles and practices applied during the encoding of a text.

<tagsDecl> provides detailed information about the tagging applied to an SGML or XML document.

<refsDecl> specifies how canonical references are constructed for this text.

<classDecl> contains one or more taxonomies defining any classificatory codes used elsewhere in the text.

<fsdDecl> identifies the feature system declaration which contains definitions for a particular type of feature structure.

<metDecl> documents the notation employed to represent a metrical pattern when this is specified as the value of a met, real, or rhyme attribute on any structural element of a metrical text (e.g. lg, l, or seg).

<variantEncoding> declares the method used to encode text-critical variants.

XSL Rendering & Word Formatting: N/A

PROFILE DESCRIPTION

Use: This element provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. Of its children the most relevant for the THDL at present is the <langUsage> tag.

Element: <profileDesc>

Attributes: global

Relevant Children: <creation>, <langUsage>, <textClass>

XSL Rendering & Word Formatting: See Metadata Header.

Use: This element describes the languages used in the electronic text. Each language that is referenced in a “lang” attribute within the text itself must have a corresponding <language> element whose “id” attribute matches the value of that “lang” attribute. That is, the <langUsage> element contains a series of <language> tags, each with its “id” attribute set to the ISO 639-2/B code for the language and with the language’s name as its content. For example, a language tag for Tibetan would be <language id="tib">Tibetan</language>, and an element that contained Tibetan would be, for example, <p lang="tib">'di skad bdag gis thos pa'i dus su …</p>. If a text uses the lang attribute without declaring the language in the metadata section, the document will not validate.

Element: <langUsage>

Attributes: global

wsd: this attribute of the <language> tag is used to link to the “writing system declaration” which is an external XML file used to describe the writing system. This is presently not used in the THDL, but in the future may provide a way to distinguish between documents of Wylie transliteration and those containing Unicode Tibetan.

rend: another attribute that could be used to distinguish the transliteration scheme or script used is the “rend” attribute. This would only be used if the whole document was transliterated according to a single scheme. To use this we need standardized values for the various transliteration schemes, given here as scheme and abbreviation:

  • Extended Wylie Transliteration Scheme – ewts
  • Library of Congress – loc
  • Asian Classics Input Project – acip

Relevant Children: <language>

XSL Rendering & Word Formatting: See Metadata Header.
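As a minimal sketch of the declaration described above, the fragment below pairs two <language> declarations with a paragraph that refers to one of them. The Tibetan entry and the paragraph are taken from the example in the Use description; the English entry ("eng") is an assumption added for illustration.

```xml
<!-- In the teiHeader: declare every language referenced by a lang attribute -->
<profileDesc>
  <langUsage>
    <language id="tib">Tibetan</language>
    <language id="eng">English</language>
  </langUsage>
</profileDesc>

<!-- In the text: the lang value must match the id of a declared language -->
<p lang="tib">'di skad bdag gis thos pa'i dus su …</p>
```

If the <language id="tib"> line were removed, the lang="tib" reference on the paragraph would have nothing to point to, which is the validation failure described above.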

Use: This element is used to describe the text according to some classification scheme, whether it be a loose one such as keywords or a more rigid doxographical classification.

Element: <textClass>

Attributes:

Relevant Children: The children of the text class and their uses along with the prime attribute are: <keywords> contains a list of keywords or phrases identifying the topic or nature of a text.

scheme: identifies the controlled vocabulary within which the set of keywords concerned is defined.

<classCode> contains the classification code used for this text in some standard classification system.

scheme: identifies the classification system or taxonomy in use.

<catRef> specifies one or more defined categories within some taxonomy or text typology.

target: identifies the categories concerned

XSL Rendering & Word Formatting: See Metadata Header.

REVISION DESCRIPTION

Use: The revision description provides a detailed change log in which each change made to a text may be recorded. It contains a list of one or more <change> elements that describe what changes have been made to the electronic document and by whom. Each <change> element in turn should have a <date> element, a <respStmt> describing who is responsible for the change, and an <item> element containing a prose description of the nature of the change.

Element: <revisionDesc>

Attributes: global

Relevant Children: <list>, <change>, <date>, <respStmt>, <item>

XSL Rendering & Word Formatting: See Metadata Header.
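The structure described above might look as follows. This is only a sketch: the date, the name, the responsibility, and the description are hypothetical placeholders, and the exact order of children should be checked against the xtib2 DTD in your XML editor.

```xml
<revisionDesc>
  <change>
    <!-- when the change was made (hypothetical date) -->
    <date>2005-03-14</date>
    <!-- who made it and in what capacity (hypothetical person) -->
    <respStmt>
      <name>A. Editor</name>
      <resp>markup</resp>
    </respStmt>
    <!-- prose description of the change -->
    <item>Added id attributes to the chapter divisions of the body.</item>
  </change>
</revisionDesc>
```

Each later revision would be recorded as a further <change> element in the same <revisionDesc>.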

STRUCTURAL ELEMENTS

THE ELECTRONIC TEXT

The electronic text, whose root element is TEI.2, necessarily has two major sections: the metadata section, contained in the <teiHeader> element, and the text itself, within a <text> element. The text element is the electronic equivalent of a physical text, while the teiHeader is the text’s cataloging record. Below, the elements involved in the electronic representation of a text are described.

THE TEXT ELEMENT

Use: The text element represents the whole electronic text, apart from the metadata about that text and file. It is the highest-level container for all the structural and descriptive elements that form the actual text. It is either subdivided into <front>, <body>, and <back> sections, of which only the body is required, or it contains a <group> element that groups together several texts.

Element: <text>

Attributes: global, the THDL text id should be put in the id attribute of the root TEI.2 element and not in the id attribute of the text element, since the full electronic text includes both the teiHeader and the text.

Relevant Children: <front>, <body>, <back>, or <group>

XSL Rendering: the <body> of the HTML document.

Word Formatting: The body of the Word document after the metadata table and the first section break.
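The overall shape of an electronic text can be outlined as below. This is a skeleton rather than a valid document: the header and the divisions are left empty and marked with comments, and the id value "thdl.text.345" is a sample value, not a real THDL text id.

```xml
<TEI.2 id="thdl.text.345">
  <teiHeader>
    <!-- metadata sections go here -->
  </teiHeader>
  <text>
    <front id="a">
      <div id="a1"><!-- first front section, e.g. the title line --></div>
    </front>
    <body id="b">
      <div id="b1"><!-- first chapter --></div>
    </body>
    <back id="c">
      <div id="c1"><!-- first back section, e.g. a colophon --></div>
    </back>
  </text>
</TEI.2>
```

Note that the text id sits on the root TEI.2 element, not on <text>, as specified under Attributes above.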

FRONT

Use: The front element contains the front sections of a text, such as the title line and homage. These front “chapter-level elements” (CLEs) are contained within div or div1 elements that are children of the front element. The front sections are identified by “a#”, where # is the number of the front CLE. So the first front section is “a1” and so forth. The full list of possible front sections is:

  • Title page
  • Title line
  • Homage/Invocation/Praise
  • Statement of intent
  • Untitled introduction
  • Ordinary introductory scene (thun mong gleng gzhi)
  • Extraordinary introductory scene (thun mong ma yin pa’i gleng gzhi)
  • Outline (sa bcad)

Element: <front>

Attributes: global, the id attribute of the front element can be set to either simply “a” or the text id, dot, “a”, as in “thdl.text.345.a”. This may assist in searches but is not absolutely necessary, as the div elements must be given an id attribute.

Relevant Children: <div>, <div1>, <head>

XSL Rendering: first section in the body of the HTML document, could use HTML divs for this.

Word Formatting: The first division marked with the "Section" (abbrev. st) style, with the word Front written out.

BODY

Use: The body of the text is contained in an element by the same name that contains the major sections of the text itself, often the chapters. These “chapter-level elements” (CLEs) are encoded in “div” elements but can be labeled according to the THDL taxonomy. The body sections are identified by “b#”, where # is the number of the body CLE. So the first chapter is “b1” and so forth. The full list of the possible body sections is short:

  • Section divisions: containing subdivisions for chapters, etc.
  • Chapters
  • Chapter title
  • Chapter homage
  • Chapter colophon
  • Interstitial Chapters (unnumbered sections between sequentially numbered chapters)

Element: <body>

Attributes: global, the id attribute of the body element can be set to either simply “b” or the text id, dot, “b”, as in “thdl.text.345.b”. This may assist in searches but is not absolutely necessary, as the div elements must be given an id attribute.

Relevant Children: <div>, <div1>, <head>

XSL Rendering: second section in the body of the HTML document, could use HTML divs for this.

Word Formatting: The first division marked with the "Section" (abbrev. st) style, with the word Body written out.

BACK

Use: The back element contains all the sections at the end of a text including closing sections, colophons, closing invocations, and so forth, as listed by the THDL taxonomy of sections. These “chapter-level elements” (CLEs) are encoded in “div” elements but can be labeled according to their section type. The back sections are identified by “c#”, where # is the number of the back CLE. So the first back section is “c1” and so forth. The full list of the possible back sections:

  • Closing section
  • Author’s colophon
  • Redactor’s colophon
  • Translator’s colophon
  • Lineage transmission
  • Reviser’s colophon
  • Editorial colophon
  • Scribal colophon
  • Printing colophon
  • Concluding prayer (mjug byang smon lam)
  • Closing invocation
  • Undetermined colophon

Element: <back>

Attributes: global, the id attribute of the back element can be set to either simply “c” or the text id, dot, “c”, as in “thdl.text.345.c”. This may assist in searches but is not absolutely necessary, as the div elements must be given an id attribute.

Relevant Children: <div>, <div1>, <head>

XSL Rendering: third section in the body of the HTML document, could use HTML divs for this.

Word Formatting: The first division marked with the "Section" (abbrev. st) style, with the word Back written out.

DIVISIONS, HEADERS, AND FOOTERS

DIVISIONS

Use: The group of “div” elements is used to mark the boundaries of a text’s subdivisions within the front, body, or back sections. The subdivisions of the front, body, and back are known as “chapter-level elements” (CLEs). There is a generic and a numbered version of the “div” element. The numbered versions must be sequentially nested, beginning with div1, with the appended number increasing for each generation: div1 is subdivided into div2s, which are subdivided into div3s, and so forth. However, for Tibetan-style outlines, or sa bcad, the generic <div> element should be used, because this element can be nested within itself to any depth. In either case, each division can be assigned a unique id attribute. The type attribute can be used to distinguish the type of division (chapter, book, section, etc.), and the n attribute can give the number of that division, so that together with the type attribute it yields “chapter 1”, “book 2”, and so forth. Use of the id, type, and n attributes is helpful in both cases for quick identification and rendering of sections.

Element: <div> or <div1> … <div9>

Attributes: global.

id: the id attribute for each CLE division should be assigned according to which section of the text the division belongs—front, body, or back—and the number of the CLE within that section. Thus, the third chapter-level element (3) of the front section (a) would be “a3”. Further, subdivisions would be designated by adding a period and a number. Thus, the second part of the fourth part of the sixth chapter in the body of the text would be (b6.4.2). If the text is well-structured and correctly marked up, values for id-attributes can be generated automatically.

n: the n attribute is for a simple “name” or, in this case, “number” of a section. It should contain the number of the division among its siblings. In the example above, this would be simply “2”. This can be combined with the value of the type attribute to form a header, such as “Chapter 2”.

type: the type attribute should describe the type of division contained in the element. The typology is:

chapter: for chapter level elements

section: for internal sections within a chapter or subsections of a section, etc. Use this also in cases where there are no chapters but only outline (sa bcad) divisions.

Relevant Children: other “div” elements, <head>, <p>, <lg>, <cit>, <q>, <quote>, <tibbibl>, <bibl>, etc.

XSL Rendering: by header or as a section in a structure tree.

Word Formatting: In order to capture the kind of hierarchical structure inherent in Tibetan tracts, the editor, transliterator, or translator must adhere to a strict use of the header styles in Word. In this method, Header 1 represents the first subdivisions of the front, body, and back elements; Header 2 represents their inner divisions, and so forth. Word allows the user to create any number of “Header N” styles beyond the ten supplied with the normal template. The names for these styles should consistently follow the pattern “Header 15,h15”. The text typed under each of these styles should be the name of the section: for front or back matter, a name such as "homage" taken from our list of permissible front/back subsections; for sa bcad headers, the name of the sa bcad.
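The id, n, and type conventions above can be illustrated with the case mentioned under the id attribute, the second part of the fourth part of the sixth chapter of the body. The fragment is a sketch: the header text and the elided paragraphs are placeholders.

```xml
<body id="b">
  <!-- sixth chapter-level element of the body -->
  <div id="b6" type="chapter" n="6">
    <head>Chapter 6</head>
    <!-- parts 1 to 3 of the chapter omitted -->
    <div id="b6.4" type="section" n="4">
      <div id="b6.4.1" type="section" n="1">
        <p>…</p>
      </div>
      <!-- id gives the full path; n gives only the position among siblings -->
      <div id="b6.4.2" type="section" n="2">
        <p>…</p>
      </div>
    </div>
  </div>
</body>
```

Because the generic <div> nests within itself, the same pattern extends to sa bcad outlines of any depth.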

TOPICAL OUTLINE (SA BCAD)

Use: To mark the actual statements of the sa bcad within the text. This contrasts with the use of headers to divide the text structurally according to its outline hierarchy; here we simply mark the text that states the sa bcad as being that type of text. The clause or series of clauses that represent the outline are enclosed within a segment tag, <seg>, that is given the type attribute of “outline”.

Element: <seg>

Attributes: global

Type: given value of “outline” for this purpose.

Subtype: not used for this, but could be used in future for more detailed mark-up of outline.

Relevant Children: <seg> element can be nested, if more specific mark-up of the outline is needed.

XSL Renders: as red font.

Word Style: use the style "Topical Outline" (abbrev. to).
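A minimal sketch of this use of <seg> follows. The bracketed text stands in for the Tibetan of the source, which is not reproduced here.

```xml
<p>
  <!-- the clause in which the text itself announces its outline -->
  <seg type="outline">[text of the outline statement]</seg>
  [the passage that the outline statement introduces]
</p>
```

The structural division that the outline implies is still marked separately with <div> elements; the <seg> only flags the wording of the outline statement.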

HEADERS

Use: Headers are structural elements in that they are associated with textual subdivisions (sa bcad), that is, the “front”, “body”, “back”, and “div” elements. Each of these can have a header element, though it is not required. For the sake of clarity, headers can be inserted into a text even when they are not explicitly present in the physical version, as long as the <add> element is used inside the header to indicate that the text was added by an editor.

Element: <head>

Attributes: global, included within these are the rend, n, and type attributes, which could be used if it seemed helpful. The “rend” attribute can be used to hard-code the HTML element desired (h2, etc.) or a typology could be developed which described how the header was to be used, i.e. rend=“section page-top” would indicate that the header was to be used at the beginning of the section and the top of each ‘page’.

Relevant Children: <add> (for enclosing the header’s text when the header is being added by the electronic editor).

XSL Rendering: <h1>, <h2>, <h3>, etc.

Word Formatting: the word header styles as mentioned above, with the desired inserted label typed out.
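For instance, a front section that has no header in the source could be given an editorial one along these lines. The section id and the header wording are assumptions chosen for illustration.

```xml
<div id="a3">
  <!-- the source has no header here, so the editor's wording is wrapped in <add> -->
  <head><add>Homage</add></head>
  <p>…</p>
</div>
```

A header that does appear in the source would be entered directly in <head>, without the <add> wrapper.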

FOOTERS

Use: There are no specific elements in TEI for a “footer”, such as one sees at the bottom of a Word document. The necessity of footers in the HTML environment where the bottom of the page is often off the screen is debatable. However, like Word documents they would be desirable for printed versions. Separate “footers” can be included in the mark-up by using a second head element with a rend attribute set to a specified value, such as “footer” or “page-foot”. Or, they could be designated by a type attribute equal to “footer”.

Element: <head>

Attributes: global, use either type=“footer” or rend=“page-foot”

Relevant Children: <add> (for enclosing the header’s text when the header is being added by the electronic editor).

XSL Rendering: Footer at the bottom of a printed page or PDF file.

Word Formatting: the Word “footer”— there is no plan to implement this in the near future.

FOOTNOTES

Use: To insert footnotes into a text, the “note” element is used. The note is available in most of the structural, descriptive, or bibliographic elements. In TIBBIBL documents, which are bibliographic records of Tibetan texts, the “discussion” element is used for longer, autonomous discussions on a topic. The “note” element is strictly for footnotes or endnotes.

Element: <note>

Attributes: global

type: describes the type of note. Values can be taken from any convenient typology of annotation suitable to the work in hand; e.g. annotation, gloss, citation, digression, preliminary, temporary

resp: (responsible) indicates who is responsible for the annotation: author, editor, translator, etc.

au(thor): note originated with the author of the text.

ed(itor): note added by the editor of the text.

comp(iler): note added by the compiler of a collection.

tr(anslator): note added by the translator of a text.

transcr(iber): note added by the transcriber of a text into electronic form.

(initials): note added by the individual indicated by the initials.

place: indicates where the note appears in the source text, such as “foot”, “end”, “inline”, “left”, “right”, “interlinear”.

Relevant Children: a note can have all the standard prose, verse, list and citation tags and even other notes. It cannot contain structural tags such as “div”s or “head”s.

XSL Rendering: Link to a section of endnotes.

Word Formatting: Word footnote.
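A sketch of an editor's footnote, using values from the lists above ("gloss", "ed", "foot"); the bracketed text is placeholder content.

```xml
<p>[sentence being annotated]<note type="gloss" resp="ed" place="foot">[the
editor's explanation of a term in the sentence]</note> [text continues]</p>
```

The note is placed inline at the point where the footnote marker belongs; the rendering then moves its content to the notes section.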

LINEATION

Use: To keep track of digital pagination. The digital text does not have the physical boundaries of line and page that are imposed on an actual manuscript. Therefore, there needs to be a way to enumerate lines so that they can be referred to absolutely within a digital text. We have decided to do this with shay-delimited lines. Shay-delimited lines are lines of Tibetan that end in some form of punctuation. Generally, but not always, this is a shay followed by white space. (After a ga, of course, there is no shay, only white space.) These lines are marked up with the <seg>, or segment, element. There are 100 lines per digital page, and the n attribute is used to give the page and line number. An example would be <seg n="3.76" type="shad">Tibetan words here/</seg>, which would be the 76th line of the 3rd page.

Element: <seg>

Attributes: global

n: the N attribute contains the page number followed by a period, followed by the line number (1-100).

Relevant Children: All sentence and phrase level elements including <s>, <cl>, <phr>, etc. It is also recursive and so can contain other <seg>’s.

XSL Rendering: Depends on the parent—prose would be rendered as a paragraph with no separation of clauses & verse would be represented with indented stanzas with each clause on a separate line.

Word Formatting: no special formatting needed. The Tibetan punctuation provides enough information to determine the shay-delimited line.
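The numbering scheme can be seen at a page boundary. In this sketch the bracketed text stands in for the Tibetan lines, and the lang value assumes that "tib" has been declared in the metadata section.

```xml
<p lang="tib">
  <seg n="3.99" type="shad">[99th line of digital page 3]</seg>
  <seg n="3.100" type="shad">[100th and last line of digital page 3]</seg>
  <!-- after line 100 the page number increases and the line count restarts -->
  <seg n="4.1" type="shad">[1st line of digital page 4]</seg>
</p>
```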

SOURCE PAGINATION

Use: To keep track of the pagination in the original document, milestone markers are used with unit attribute equal to “page” and the n attribute equal to the number of that page. Milestone markers are empty elements in that they can have no content other than their attributes.

Element: <milestone />

Attributes: global

unit: the unit attribute denotes the unit being counted, in this context, the “page”.

n: the n attribute contains the number of the page in the original document.

Relevant Children: None.

XSL Rendering: Can be displayed in brackets, as in [##], or hidden.

Word Formatting: The Milestone style (abbrev. ms) is applied to the pagination in brackets.
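Following the Use description above, a page break in the source can be recorded as below. The page number 12 and the bracketed text are placeholders.

```xml
<p>[end of the text on source page 11] <milestone unit="page" n="12"/> [text
that begins source page 12]</p>
```

Because the milestone is an empty element, it can fall in the middle of a paragraph or line without disturbing the surrounding markup.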

PROSE

PARAGRAPHS

Use: Paragraph or paragraph-equivalents. There is no clear analog to English paragraphs in Tibetan, but we believe it is helpful in some cases to still try to mark up Tibetan texts into sections paralleling paragraphs. The trick is how short or long to make them, and our intention is to provide examples as the easiest guideline in this context.

Element: <p>

Attributes: global

Relevant Children: The paragraph element “p” can contain sentence elements (s), verse elements (lg and l), formatting elements (hi), clause elements (cl, phr), notes (note), and linking elements (ptr, ref), and other phrase level elements. It cannot however contain another paragraph, or chunk, element.

XSL Rendering: paragraphs.

Word Formatting: "Paragraph" style (abbrev. "p")

SENTENCES

Use: Sentences or sentence-equivalents. These are the individual units contained by a paragraph. There is no clear analog to English sentences in Tibetan, but we believe it is helpful in some cases to still try to mark up Tibetan texts into lines paralleling sentences. The trick is how short or long to make them, and our intention is to provide examples as the easiest guideline in this context. The utility of such markup can be that it helps students see how to break up a text into semantic units, either as a much easier markup than actual translation, or as a pedagogical tactic to give students aids without having them use a translation.

Element: <s>

Attributes: global

Relevant Children: clause (cl), phrase (phr), and other phrase-level elements.

XSL Rendering: No special rendering unless implemented as a teaching tool.

Word Formatting: No special formatting necessary. <s> elements would need to be inserted by hand.

VERSE

STANZAS

Use: TEI’s poetry module allows for marking up verse text according to the common paradigm of stanzas and lines. The element for a stanza is the “lg” element, which stands for “line group” because it groups a series of line (l) elements within it. A line-group can contain another line-group so that they can be infinitely nested.

Element: <lg>

Attributes: global

type: the type attribute can be used to describe the kind of verse, the metrical structure, etc.

Relevant Children: <l>, nested <lg>

XSL Rendering: indented (blockquote).

Word Formatting: a consistent verse style to differentiate between the first line of a stanza (verse1, abbrev. v1) and the other lines in the stanza (verse2, abbrev. v2).

LINES

Use: The lines contained within a stanza are each marked by line (l) elements.

Element: <l>

Attributes: global

type: the type attribute can be used to describe the kind of verse, the metrical structure, etc.

Relevant Children: <s>, <cl>, <phr>, <list>, etc.

XSL Rendering: indented (blockquote).

Word Formatting: a consistent verse style to differentiate between the first line of a stanza (verse1, abbrev. v1) and the other lines in the stanza (verse2, abbrev. v2).
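A four-line stanza would be marked up roughly as follows. The bracketed text is placeholder content, and the type attribute is omitted because no standard values for it are given here.

```xml
<lg n="1">
  <l>[first line of the stanza]</l>
  <l>[second line]</l>
  <l>[third line]</l>
  <l>[fourth line]</l>
</lg>
```

In the Word source, the first <l> corresponds to the verse1 style and the remaining lines to verse2.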

LISTS

LISTS ORDERED (I.E. ENUMERATED)

Use: There is a single list element for all types of lists (numbered and unnumbered). The difference is determined by the values of their rend, n, and/or type attribute. Each list contains any number of item elements, and these item elements can contain nested list elements to any depth.

Element: <list>, <item>

Attributes: global

type: the type attribute should be “ordered”

rend: the rend attribute should give the HTML class for the type of numbering desired:

  • I – capital Roman numerals
  • A – capital letters
  • 1 – Arabic numerals
  • i – lowercase Roman numerals
  • a – lowercase letters

n: the n attribute contains the number from which the item numbering should begin. Thus, for a list to begin with the eighth item, the attribute n should be set to “8”; for a list that uses a, b, c, d, etc to begin with the third item, the attribute n should be set to “c”.

Relevant Children: <item>, <list>, other sentence-level elements.

XSL Renders: ordered lists (<ol>)
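The case described under the n attribute, a lettered list that starts at its third item, might be written as below. The item text is placeholder content.

```xml
<list type="ordered" rend="a" n="c">
  <item>[third point, numbered c]</item>
  <item>[fourth point, numbered d]</item>
</list>
```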

LISTS UNORDERED (I.E. NON-ENUMERATED)

Use: There is a single list element for all types of lists (numbered and unnumbered). The difference is determined by the values of their rend, n, and/or type attribute. Each list contains any number of item elements, and these item elements can contain nested list elements to any depth.

Element: <list>, <item>

Attributes: global

type: the type attribute should be “unordered”

rend: the rend attribute should indicate the type of item marking:

  • bulleted – normal bulleted list
  • dash – uses dashes instead of bullets
  • box – uses squares instead of bullets
  • arrow – uses arrows instead of bullets
  • star – uses stars or asterisks instead of bullets
  • user-determined rend value can be used with specific style sheets to render variously.

n: the n attribute would not be specifically used with unordered lists, though it could be used to indicate the level of nesting or to give the URL for an image file of a custom bullet.

Relevant Children: <item>, <list>, other sentence-level elements.

XSL Renders: unordered lists (<ul>)

SECONDARY LISTS

Use: As with HTML, Word and most text processing mark-up or software, lists in XML can be infinitely nested. The elements used for secondary lists are the same two for primary lists, namely <list> and <item>. To create secondary lists, one merely puts a second list element within one of the primary list’s items. Thus, the mark-up would look something like:

<list type="unordered" rend="bulleted">
  <item>Sutra Vehicle
    <list type="ordered" rend="1" n="1">
      <item>Hearer Vehicle</item>
      <item>Solitary Realizer Vehicle</item>
      <item>Bodhisattva Vehicle</item>
    </list>
  </item>
  <item>Tantra Vehicle
    <list type="ordered" rend="A" n="A">
      <item>Outer Tantras
        <!-- numbering continues from the Sutra Vehicle list -->
        <list type="ordered" rend="1" n="4">
          <item>Action Tantra</item>
          <item>Performance Tantra</item>
          <item>Yoga Tantra</item>
        </list>
      </item>
      <item>Inner Tantras
        <list type="ordered" rend="1" n="7">
          <item>Mahāyoga Tantra</item>
          <item>Anuyoga Tantra</item>
          <item>Atiyoga Tantra</item>
        </list>
      </item>
    </list>
  </item>
</list>

This would render:

  • Sutra Vehicle
      1. Hearer Vehicle
      2. Solitary Realizer Vehicle
      3. Bodhisattva Vehicle
  • Tantra Vehicle
      A. Outer Tantras
          4. Action Tantra
          5. Performance Tantra
          6. Yoga Tantra
      B. Inner Tantras
          7. Mahāyoga Tantra
          8. Anuyoga Tantra
          9. Atiyoga Tantra

SENTENCE STRUCTURE

Sentence structure elements, known as phrase-level elements, or chunks, exist in the TIB version of TEI. They are used to distinguish between various parts of a sentence at the phrase or clause level. They are not presently used in THDL text mark-up, but they are available and so are listed here in abbreviated form.

CLAUSES

Use: The tag for marking a clause.

Element: <cl>

Attributes: global

PHRASES

Use: The tag for marking a phrase

Element: <phr>

Attributes: global

SEGMENTS

Use: There is also a tag for delimiting a span of text for any purpose. The <seg> element can contain text or clauses (<cl>) and is distinguished by its “type” and “subtype” attributes. It is used with type=“outline” to mark sa bcad; see Topical Outline above. A related element, <span>, is used in particular for annotations: it delimits the section of text that is being annotated and is linked to the annotation element itself. However, spans can also be used as a generic phrase-grouping element.

Element: <seg>

Attributes: global

Type: the type of segment, right now only “outline” is designated as a type.

Subtype: the subtype as yet unused.

Function: again unused.

Relevant Children: Any clause level element; not paragraphs (p) or verse (lg).

XSL Renders: Case specific. See Topical Outline above.

Word Formatting: Case specific. See Topical Outline above.

DESCRIPTIVE ELEMENTS

PROPER NOUNS

PLACE NAMES

Use: Elements can be used to mark the names of places. The place name element contains an absolute or relative place name. Several possible children elements are available within placeName, as listed below.

Element: <placeName>

Attributes: global

Relevant Children:

<settlement> contains the name of the smallest component of a place name expressed as a hierarchy of geo-political or administrative units

<region> in an address, contains the state, province, county or region name; in a place name given as a hierarchy of geo-political units, the region is larger or administratively superior to the settlement and smaller or administratively less important than the country.

<country> in an address, gives the name of the nation, country, colony, or commonwealth.

<bloc> a geo-political unit containing one or more nation states.

<geogName> a name associated with some geographical feature such as Windrush Valley or Mount Sinai.

<geog> contains a common noun identifying some geographical feature contained within a geographic name, such as valley, mount etc.

XSL Rendering: plain text in general, but could be rendered.

Word Formatting: Place style, pl, green font.
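Two short sketches of place-name markup follow. The first breaks a place name into a hierarchy of units; the second uses the "Mount Sinai" example from the description of <geogName>. The nesting shown assumes that the xtib2 DTD follows the TEI content models for these elements.

```xml
<!-- a place name given as a hierarchy, from smallest unit to largest -->
<placeName>
  <settlement>Charlottesville</settlement>,
  <region>Virginia</region>,
  <country>United States</country>
</placeName>

<!-- a geographic name containing a common noun for the feature -->
<placeName><geogName><geog>Mount</geog> Sinai</geogName></placeName>
```

Where no breakdown is needed, the whole name can simply be given as the text content of <placeName>.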

PERSONAL NAMES

Use: Both place and personal names can be marked up in Tib/TEI with a simple <name> element. However, the more specific element for people’s names is the “persName” element. It contains a proper noun or proper-noun phrase referring to a person, possibly including any or all of the person's forenames, surnames, honorifics, added names, etc. It can contain either the text of the whole name or child elements that divide the name into its component parts, as described below.

Element: <persName>

Attributes: global

type: describes the personal name more fully using an open-ended list of words or phrases which help to indicate the function, e.g. ‘married name’, ‘maiden name’, ‘pen name’, ‘religious name’, etc.

Relevant Children:

<surname> contains a family (inherited) name, as opposed to a given, baptismal, or nick name.

<foreName> contains a forename, given or baptismal name.

<roleName> contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.

<addName> contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name.

<nameLink> contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of.

<genName> contains a name component used to indicate generational information, such as Junior, or a number used in a monarch's name.

XSL Renders: plain text in general, but could be rendered.

Word Formatting: Personal Name style, pe, blue font
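The component elements can be combined as in the sketch below. The name is hypothetical, built around the “van der” and “Junior” examples given above, and the element is written persName because XML element names are case-sensitive.

```xml
<!-- the whole name as plain text -->
<persName>Jan van der Berg Junior</persName>

<!-- the same name divided into its component parts -->
<persName>
  <foreName>Jan</foreName>
  <nameLink>van der</nameLink>
  <surname>Berg</surname>
  <genName>Junior</genName>
</persName>
```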

ORGANIZATION NAMES

Use: This is a set of elements used to mark up and thereby distinguish the names of organizations, as opposed to places or people. There are several elements in TEI for this purpose. Their general parent is “orgName”.

Element: <orgName>

Attributes: global

type: more fully describes the organization indicated in the organizational name. Possible values include ‘voluntary’, ‘political’, ‘governmental’, ‘industrial’, ‘commercial’, etc.

key: provides an alternative identifier for the organization being named, such as a database record key.

reg: (regularization) gives a normalized or regularized form of the organization name.

Relevant Children: <orgTitle>, <orgType>, <orgDivn>, <geogName>

XSL Renders: plain text in general, but could be rendered

Word Formatting: Organization style, or, orange font
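
A minimal sketch of an organization name using the attributes described above might look like this. The type and key values are hypothetical placeholders; only the reg value repeats a name used in this manual.

```xml
<!-- "educational" and "org-thl" are assumed values for illustration -->
<orgName type="educational" key="org-thl" reg="Tibetan and Himalayan Library">THL</orgName>
```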

FOREIGN LANGUAGE

Use: All text elements in TEI have a lang attribute that allows one to specify the language of the item in question. The language used needs to be declared in the <langUsage> element, which contains a <language> tag with its ID attribute set to the abbreviation, such as "tib". However, one can also use the <foreign> element to mark a word that is in a language other than the main language of the text in question. Thus it might mark Tibetan words within an essay that is primarily English; by contrast, we use the foreign tag for inserting English translations within a primarily Tibetan cataloging record. In both cases, the foreign tag’s lang attribute is set to the language of its content so that there can be no confusion.

Element: <foreign>

Attributes: global

lang: specifies the language of the enclosed text.

XSL Renders: inside parentheses in italic, as in “… the Tibetan language (bod skad) is …”

Word Formatting: Foreign style, fo, (italic brick) and in parentheses.
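
The rendering example above could correspond to markup along the following lines. This sketch assumes that the parentheses and italics are supplied by the stylesheet rather than typed, and that "tib" has been declared in the metadata.

```xml
<!-- The stylesheet, not the encoder, adds the parentheses and italics -->
<p>… the Tibetan language <foreign lang="tib">bod skad</foreign> is …</p>
```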

NAME

Use: A generic name element exists for marking up any type of name: personal, place, organizational, or other. A name element can contain the more specific instances of “persName”, “placeName”, etc.

Element: <name>

Attributes: global and

key: key to a database or table of names

reg: the regularized form of the name.

type: the type of name.

Relevant Children: <date>, <geogName>, <name>, <num>, <orgName>, <persName>, <placeName>, <foreign>

XSL Renders: no special rendering.

Word Formatting: Generic name style, gn, red font
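
As a brief sketch of the three attributes, a generic name might be tagged as follows. The key value is a hypothetical database identifier and the type value is an assumed label.

```xml
<!-- key="pl-0042" is a made-up record key for illustration -->
<name type="place" key="pl-0042" reg="Lhasa">lha sa</name>
```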

EMENDATIONS AND CORRECTIONS

ADDITIONS

Use: To fill in an abbreviation or ellipsis in the text.

Element: <add>

Attributes: global and

place: this is for additions inserted by the scribe, xylographer, or printer, and denotes where the addition is located in relation to the main text line. Values are: inline, supralinear, infralinear, left, right, top, bottom, opposite, verso, mixed.

resp: this attribute identifies who is responsible for the addition. This attribute must contain the value of the id attribute for a person who is listed as responsible in some way for the text’s content. (7)

cert: the degree of certainty of the addition.

Relevant Children: none.

XSL Renders: none.

Word Formatting: none.
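
A hedged example of an addition written above the main text line is given below. The resp value "snw" stands for the ID of a person declared in the metadata, and the cert value "high" is an assumption, since this manual does not fix a list of certainty values.

```xml
<!-- "chen po" is treated here as a supralinear addition (illustrative) -->
<p>rdzogs pa <add place="supralinear" resp="snw" cert="high">chen po</add></p>
```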

MISTAKES AND CORRECTIONS

Use: There are elements for marking up mistakes in the text and/or their corrections. The primary tags are <sic> and <corr>. The <sic> tag is placed around the mistaken text and its “corr” attribute is given the value of the corrected text, while the <corr> tag is placed around the corrected version of the text and its “sic” attribute is given the value of the original text. We primarily use the <corr> tag, as we want the corrected text to be displayed initially.

Element: <sic>, <corr>

Attributes: global and

corr: When the <sic> element is used, it goes around the actual text and the corr attribute is given the value of the corrected text.

sic: When the <corr> element is used, it goes around the corrected text and the sic attribute is given the value of the original text.

resp: This attribute is available on both elements and should contain the ID attribute value of a person responsible for creating the XML document. This generally refers to a <resp> element in the metadata whose ID attribute contains the editor’s initials.

cert: This attribute is given a value that represents the certainty of the correction.

Relevant Children: none.

XSL Renders: none.

Word Formatting: none.
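
The two forms described above can be sketched side by side. The readings, the resp value "snw", and the cert value are assumptions for illustration only.

```xml
<!-- Preferred form: the corrected text is displayed, the original is kept in sic -->
<p>… <corr sic="rtog" resp="snw" cert="high">rtogs</corr> …</p>

<!-- Alternative form: the original text is displayed, the correction is kept in corr -->
<p>… <sic corr="rtogs" resp="snw">rtog</sic> …</p>
```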

ABBREVIATIONS AND EXPANSIONS

Use: There are elements for marking up abbreviations and expansions. These are <expan> and <abbr> and are similar in function to the <sic> and <corr> elements above. The <expan> element goes around the expanded text and its “abbr” attribute is set to the original abbreviation, while the <abbr> element goes around the original abbreviation and its “expan” attribute is set to the expanded version.

Element: <abbr>, <expan>

Attributes: global and

abbr: When the <expan> element is used, it goes around the expanded text and the abbr attribute is given the value of the original contraction.

expan: When the <abbr> element is used, it goes around the contracted text and the expan attribute is given the value of the expanded text.

cert: This attribute is given a value that represents the certainty of the expansion.

Relevant Children: none.

XSL Renders: none.

Word Formatting: none.
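
By analogy with the correction elements, the two forms might be written as follows. This is a sketch using the library's own abbreviation as sample text.

```xml
<!-- Expanded text displayed, abbreviation recorded in the attribute -->
<expan abbr="THL">Tibetan and Himalayan Library</expan>

<!-- Abbreviation displayed, expansion recorded in the attribute -->
<abbr expan="Tibetan and Himalayan Library">THL</abbr>
```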

GLOSSES

Use: To provide a foreign-language gloss of an English term, such as "Old School (rnying ma)".

Element: <gloss>

Attributes: global, especially the lang attribute, which describes the language of the gloss.

Relevant Children: none.

XSL Renders: none.

Word Formatting: none.
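
The example in the Use statement might be encoded as shown below. Since no stylesheet rendering is defined for this element, the sketch assumes that the parentheses are typed in the text outside the element; that placement is a guess rather than a stated convention.

```xml
<!-- Parentheses are typed by hand here because no XSL rendering is defined -->
<p>… the Old School (<gloss lang="tib">rnying ma</gloss>) …</p>
```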

VARIANT READINGS

APPARATUS

Use: The apparatus element, <app>, is used to encompass and group in association all the variant readings for some portion of a text. It is therefore a grouping element that contains only other elements and no text. The lemma, <lem>, element contains the primary reading and other alternative readings are placed within <rdg> elements. If readings are to be grouped in a specific way, one may also use the reading group, <rdgGrp>, element.

Element: <app>

Attributes: global

type: The type attribute of the <app> element can be used to distinguish the type of variants found in this apparatus, though we presently do not use it. A typology would have to be constructed.

Relevant Children: <lem>, <rdg>, <rdgGrp>

XSL Renders: One option is to render variant readings as footnotes or endnotes, as in the typical print style. Another, electronic option is to render each reading in-line. The editions would be color coded, with the reading from each edition appearing in that edition’s color. By clicking on the word, one could scroll through the various readings in turn.

Word Formatting: This should be done according to the following guidelines:

  1. Put brackets around the section of the text for which you are recording a variant, with the left bracket following the preceding syllable after an intervening space.
  2. Insert a footnote after the right bracket without any intervening space; insert a space after the footnote before the next syllable.
  3. In the footnote, write, for example, "TK chos": by default, the first letters indicate the siglum and the variant follows after a space.
  4. If multiple editions give the same reading, separate the sigla with hyphens: TK-TB-AB chos.
  5. If different editions give different readings for the same term, separate the readings with a semicolon: TK-TB chos; DG bcos.
  6. If a given edition simply lacks the bracketed word(s), give the siglum and say "absent" (check what the convention is): TK absent.
  7. If a given edition inserts an extra term, insert two brackets containing nothing at the insertion point, and give the siglum and the term in the footnote: TK chos.
  8. To give a rationale for the decision as to what the normative reading is, end the variant reading in the footnote with a period and then precede the comment with "R: ". Thus: TK chos. R: The chos appears to be a late correction which….
  9. One unresolved problem is how to handle overlapping variant readings. Even if one chooses larger phrases to avoid overlap, an entire line may be absent in one edition while other editions have variant readings for specific words within that line; the suggestion is to use {} to enclose entire lines in this case.
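
To show how the Word footnote convention relates to the XML elements, the footnote in item 5 could be converted roughly as follows. The sketch assumes that the first reading listed is the normative one and therefore becomes the lemma; the manual does not state this mapping explicitly.

```xml
<!-- Word footnote: TK-TB chos; DG bcos -->
<app>
  <lem wit="TK TB">chos</lem>
  <rdg wit="DG">bcos</rdg>
</app>
```

Note that the hyphens separating sigla in the footnote become spaces in the wit attribute.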

LEMMA

Use: The lemma is the preferred reading among all the variants. This element is used within an apparatus element (<app>). It is not required to have a lemma, if no reading is preferable or preference is unclear. The edition that is the source for the lemma’s variant is recorded in its “wit” (i.e., witness) attribute. Lemmas can have an internal apparatus, representing readings within readings.

Element: <lem>

Attributes: global

wit: The two- or three-letter sigla for the source edition(s) are placed in this attribute. If multiple editions are cited as the source, the sigla are separated by spaces. An example might be: <app><lem wit="Tb Dg1 Kg">rtogs</lem><rdg wit="Bg Dg2">rtog</rdg></app>

Relevant Children: <app>, <milestone>, <cl>, <s>, <q>

XSL Rendering: See Apparatus.

Word Formatting: See Apparatus.

READINGS

Use: The reading element is used to record variant readings of a section of text. The reading tag, along with the lemma tag, is used within the apparatus element (<app>). There needs to be at least one reading element within every apparatus element. The source edition is recorded in the “wit”, or witness, attribute, as with lemmas, and, like the lemma, the reading element can contain an internal apparatus.

Element: <rdg>

Attributes: global

wit: The two- or three-letter sigla for the source edition(s) are placed in this attribute. See Lemma for an example.

Relevant Children: <app>, <milestone>, <cl>, <s>, <q>

XSL Rendering: See Apparatus.

Word Formatting: See Apparatus.
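
Because a lemma is optional while at least one reading is required, an apparatus in which no reading is preferred might be sketched as follows. The sigla and readings are reused from the Lemma example purely for illustration.

```xml
<!-- No lemma: neither reading is marked as preferred -->
<app>
  <rdg wit="Tb Dg1 Kg">rtogs</rdg>
  <rdg wit="Bg Dg2">rtog</rdg>
</app>
```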

BIBLIOGRAPHIC ELEMENTS

TITLES

TITLE:

Use: There are elements to mark the titles of texts that are referred to. The <title> tag goes around the actual title. Its type attribute can be used to further describe the type of title. A title element is most often contained in a <titledecl> (title declaration) element. A <titlegrp> (title group) element is used to combine various titles into a single group and can contain both <titledecl> and <titleinfo> tags. A <titleinfo> element is for listing variant titles from various places in the text, each in a <titlediv>. For these, see Variant Titles and Title Lists below.

Element: <title>, <titledecl>

Attributes: global and

lang: On the <title> element itself, the lang attribute should be set to the language code for the language of that title, e.g. “tib”. This code must be declared in the langUsage section of the metadata.

type: This can contain a descriptive “type” of title. This can be used for title elements within a <titledecl> that need to be distinguished. Provisional Values are: edition, volume, text, chapter, section

level: This is a TEI attribute with preset values that indicate the “level” (or kind) of title. It can be left blank. The values and their meaning are:

  • a – article
  • m – monograph
  • j – journal
  • s – series
  • u – unpublished
  • none

Relevant Children: <foreign> (See title translation below.)

XSL Renders: italics.

Word Formatting: Title style, ti, underline.
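
A minimal sketch of a title inside a title declaration follows. The type value comes from the provisional list above, while level="m" is an assumption about how a standalone text would be classified.

```xml
<!-- "tib" must be declared in the langUsage section of the metadata -->
<titledecl>
  <title lang="tib" type="text" level="m">lta ba'i phreng ba</title>
</titledecl>
```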

TITLE TRANSLATIONS

Use: To include translations with the original titles, the use of the <foreign> element within the title element is recommended as opposed to using another <title> element with a different lang attribute value. Nesting the <foreign> element within the <title> element associates it with the title but labels it as a later “addition” not inherent in the text.

Element: <foreign>

Attributes: global and

lang: This should be set to the language code for the translation.

corresp: This can be set to the ID for the translator who should be declared in a responsibility statement in the metadata.

Relevant Children: <bibl> - can be used inside the <foreign> tag to reference the source of the translation if it is specific to one particular translation. Otherwise, the source should be declared in the metadata and the corresp attribute (an IDREF) is used.

XSL Renders: undetermined.

Word Formatting: in parentheses within the Title style (ti) of the title. E.g., “Garland of Views (lta ba’i phreng ba)”.
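
The nesting recommended above might look like this. The sketch assumes a Tibetan original with an English translation, and the corresp value "snw" is a hypothetical translator ID that would have to be declared in the metadata.

```xml
<!-- The translation is nested inside the title it translates -->
<title lang="tib">lta ba'i phreng ba
  <foreign lang="eng" corresp="snw">Garland of Views</foreign>
</title>
```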

VARIANT TITLES

Use: While the main text title can be contained in a <titledecl> for text mark-up, in cataloging situations there is a need to record multiple variant titles from diverse parts of the text or from outside the text (secondary literature and oral sources). In such cases, a <titleinfo> element is used with multiple <titlediv> elements, each containing a specific instance of a variant title. The <titlediv> elements are differentiated by their type and subtype attributes. (Note: The <titledecl> is still used outside of the <titleinfo> to record the text’s primary or normalized title.)

Element: <titleinfo>, <titlediv>

Attributes: global and

type: for the <titlediv> element there is a list of preset values that record the location or source of the title. It must be set to one of its possible values:

  • front
  • body
  • back
  • margin
  • oral
  • secondary
  • nontibet

subtype: also with the <titlediv> element, one can set a subtype value. The values for this attribute are not predefined but should be derived from a consistent typology. When cataloging a Tibetan text, it has been the THDL standard to use the section type here. Thus for a title-line title, the <titlediv> would have type="front" and subtype="title line". For text titles at the end of chapters, a single <titlediv> is used with type="body" and subtype="sections", and this contains several <titlelist> elements for all the alternate text titles in the chapters.

Relevant Children: none.

XSL Renders: none.

Word Formatting: none.
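
The relationship between the primary title and its variants could be sketched as below. The exact content model of <titlediv> is not given in this manual, so placing a <title> directly inside it is an assumption to be checked against the DTD; the "…" placeholders stand for titles that are not supplied here.

```xml
<!-- Primary or normalized title, recorded outside the titleinfo -->
<titledecl>
  <title lang="tib">…</title>
</titledecl>
<!-- Variant titles, one titlediv per location or source -->
<titleinfo>
  <titlediv type="front" subtype="title line">
    <title lang="tib">…</title>
  </titlediv>
  <titlediv type="back" subtype="colophon">
    <title lang="tib">…</title>
  </titlediv>
</titleinfo>
```

The subtype value "colophon" is a hypothetical section type used only to show a second variant.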

TITLE LISTS

Use: Title lists are used to record a title that has multiple sources within a single text. The prime examples of this are text titles at the ends of chapters. Often, some chapters will use the same text title and others a different text title, forming groups of sources for a particular text title. These are encoded using the <titlelist> element, which contains a <titledecl> with the information on the title except for its source, and a series of <titleitem> elements that record only chapter number and pagination.

Element: <titlelist>, <titledecl>, <titleitem>

Attributes: global

Relevant Children: none.

XSL Renders: none.

Word Formatting: none.

CITATIONS AND QUOTES

CITATIONS

Use: There are several elements meant to deal with citations or quotations within a marked-up text. The <cit> tag is for a citation that includes the bibliographic information along with the quotation itself, as described below. The <quote> tag, when it stands alone, is for an unreferenced quotation that does not include its source, while the <q> tag is for spoken or thought words.

Element: <cit>, <quote>, or <q> elements. These are summarized as follows:

<cit> contains both the source bibliographic information and the quoted phrase or passage. The bibliographic information is placed within a <bibl> tag within the <cit> and the quote is placed within a <quote> (textual passage) or <q> (spoken or thought words) either before or after the <bibl>. Thus a full citation would for example be:

<cit>

<bibl>

<biblScope>The second chapter</biblScope> of

<title>The Secret Essence Tantra</title> says:

</bibl>

<quote>The elements are the goddesses—Locana and so forth.</quote>

</cit>

<quote> contains a phrase or passage attributed by the narrator or author to some agency external to the text. This is used within <cit> tags as above, or it can stand alone when the source bibliographic information is not directly cited, as with anonymous quotes.

<q> also stands for quotation but refers to dialog or thoughts. Its three most important attributes are:

  • who: the speaker or thinker of the quote. Can take any value that identifies the person speaking or thinking the words.
  • type: the type of quote. Example values are “spoken” or “thought”.
  • direct: whether or not the quote is direct or indirect speech. Example values are “y”, “n”, “unspecified”.

Attributes: global

  • Also, see use above.

Relevant Children: See above.

XSL Renders: Undetermined.

Word Formatting: Since a citation can be prose, verse, an ordered list, or an unordered list, different styles are used to mark citations: citation prose (abbrev. cp), citation verse 1 (abbrev. cv1), citation verse 2 (abbrev. cv2), citation list bullet (abbrev. clb), and citation list number (abbrev. cln).
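
For spoken words, the three attributes of <q> described above might be combined as in this sketch. The sentence and the who value are invented for illustration.

```xml
<!-- who can take any value that identifies the speaker -->
<p>The teacher asked, <q who="teacher" type="spoken" direct="y">Where have you come from?</q></p>
```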

RENDERING ELEMENTS

TYPES OF EMPHASIS

Use: To mark a word, phrase, or sentence on which one wants to place emphasis. Ordinarily one might use bold or italic for such purposes, but by instead structurally tagging the item as “emphasis”, one can decide at any point to render it in a different way without confusing it with other items that might also have been formatted as italic, bold, or whatever. A typology needs to be finalized for this. The rend attribute that is globally available can also be used for this purpose. The <hi> element should be used when some sub-portion of a text string needs to be emphasized. Otherwise, use the containing element’s rend attribute.

Element: <hi>

Attributes: global

rend: There needs to be a consistent typology for this. So far we have:

strong – for a strong emphasis the form of which can be determined by stylesheet

weak – for a weak emphasis the form of which can be determined by the stylesheet

italics – for a place where one wants italics and only italics, with no variation.

bold – for a place where one wants bold with no variation.

underline – for a place where one wants underline with no variation.

background – for a colored background. Also use the n attribute to give a color code for the background.

foreground – for highlighting that changes the text’s color. Also use the n attribute to give a color code for the text.

XSL Renders: Either the rend attribute determines a fixed rendering, e.g. <hi rend="bold">, or the stylesheet determines it, as with <hi rend="strong">.

Word Style: Emphasis strong (abbrev. es), Emphasis weak (abbrev. ew). We have not yet added: Italic style, it; Bold style, bo; Underline style, un.
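
A short sketch combining several of the rend values listed above is given here. The format of the color code in the n attribute is not specified in this manual, so the hexadecimal value is an assumption.

```xml
<!-- n="#FFFF00" assumes an HTML-style hexadecimal color code -->
<p>This point is <hi rend="strong">essential</hi>, this term is always
  <hi rend="italics">italic</hi>, and this phrase has a
  <hi rend="background" n="#FFFF00">colored background</hi>.</p>
```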

LINKING

Note: We need to create automated hyperlinked references, so that references made to images, texts, etc. within THDL can use a unique identification that automatically creates a hyperlinked reference to that item within THDL. How can we make references to objects within THDL (video, images, audio, bibliographical references, other texts, etc.) in a way that allows these references to later be automatically converted into hyperlinked connections that go straight to the resources? This applies to diverse areas in THDL, such as field notes, photo-essays on places, or indeed any references to bibliographical resources. For example, QVXX might mark an image, with the number after it being the unique ID of that image.

EXTERNAL LINKS

Use: To insert a link to a resource external to the XML document itself.

Element: <xptr> and <xref>. The difference between the two is that <xptr> is an empty tag that contains no text, whereas <xref> can contain text, as in <xref>like this!</xref>. The link is then created around that text. This is probably what we would use more often. Relevant attributes are targType, which describes the target type, i.e. xml (or more specifically tibbibl), jpeg, movie, etc. The ID attribute can contain an id used for locating the item, or the n attribute can be used for the necessary link information.

Attributes: global

type: type of reference. Need a typology here too, possible values are:

  • text (i.e. xml)
  • audio
  • video
  • image
  • map
  • others?

targtype: this is the form that the target takes, such as “mp3”, “jpg”, “pdf”, “tibbibl” (i.e, the name of the DTD), etc.

n: This is used for the name (or ID) of the file without a path or extension, e.g. “Tb.708”. The path is supplied by the stylesheet and the extension by either the stylesheet or the targtype attribute.

resp: indicates the person responsible for encoding the link.

  • An example might be:

<xref n="Tb.708" resp="snw" type="text" targtype="tibbibl">Tb.708</xref>

  • or

<xptr n="Tb.708" resp="snw" type="text" targtype="tibbibl"/>

Relevant Children: none

XSL Renders: hyperlink, e.g. <a href="http://…."> … </a>

Word Formatting: Word hyperlink

INTERNAL LINKS

Use: For links between sections of a document. These are navigational links for jumping around within a single document. They are similar to the external links in that there is a text-enclosing version, <ref>Text here</ref>, and an empty-tag version, <ptr/>. The linking mechanism should utilize the id attributes of structural elements within the text. The corresp attribute can contain this ID, and the link can be made using its value. The <anchor> tag can be used to place an anchor within the text, just as in HTML.

Elements: <ref>, <ptr>, <anchor>

Attributes: global

corresp: this is an IDREF that must contain the value of an ID attribute somewhere else in the text. This ensures that the link is valid.

type: type of link

targtype: See above, not sure how useful this is for internal links.

n: Always available for secondary information or unrestricted use. This could be used instead of the corresp attribute, but that risks the possibility of broken links in the rendering.

resp: the editor responsible for encoding the link.

Relevant Children: none

XSL Renders: internal hyperlink. The tag with a referenced ID or the anchor tag must be rendered as an <a name="element's ID"/> element and the <ptr> or <ref> must be rendered as an <a href="#element's ID"> Go to Element now! </a>.

Word Formatting: hyperlink to bookmark in document
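
The internal linking mechanism described above might be sketched as follows. The <div> and <head> element names, the ID values, and the type value are assumptions for illustration; what matters is that each corresp value matches an id found elsewhere in the same document.

```xml
<body>
  <!-- Link targets: a structural element with an id, and an anchor -->
  <div id="sec2">
    <head>Second Section</head>
    <p>… <anchor id="a1"/> …</p>
  </div>
  <!-- Links: corresp must equal an existing id or the document will not validate -->
  <div id="sec3">
    <p>See <ref corresp="sec2" type="section" resp="snw">the second section</ref>,
      or jump straight to the anchor: <ptr corresp="a1"/></p>
  </div>
</body>
```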

GLOBAL ATTRIBUTES

Certain element attributes are globally available on all elements. While a number of specific uses for these have been outlined above, the generally intended use as described by the TEI guidelines follows so that editors may use this information to deal with unforeseen situations, where one or another of these attributes may come in handy. There are four truly global attributes: id, n, lang, and rend.

ID

This attribute contains a unique identifier for the element bearing the ID value. If the value is not unique, the document will not validate.

N

This attribute contains a number (or other label) for an element, which is not necessarily unique within the document but adds to the identification or presentation of that element. Thus "n" is the name of the attribute to signify "number".

LANG (IDREFS #IMPLIED):

This attribute allows the user to indicate the language of the text that the element contains. It should be used on elements whose text is in the specified language, unless one of the element’s ancestors has already declared the language with the lang attribute. Its value is an IDREF, which means that the language abbreviation used in this attribute must be declared at the top of the document within the <langUsage> element of the profile description. At present we are using ISO-639-2/B language codes in the lang attribute. The full description of this standard can be found at: Standards.

Some of the codes relevant for Tibetan and Himalayan Studies are listed below in alphabetical order: (8)

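| Language | ISO-639-2/B Code | Language | ISO-639-2/B Code |
|---|---|---|---|
| Assamese | asm | Mongolian | mon |
| Bengali | ben | Nepali | nep |
| Bihari | bih | Newari | new |
| Burmese | bur | Norwegian | nor |
| Chinese | chi | Prakrit languages | pra |
| Danish | dan | Russian | rus |
| Dutch (Flemish) | dut | Sanskrit | san |
| Dzongkha | dzo | Sinhalese | sin |
| English | eng | Sino-Tibetan (Other) | sit |
| French | fre | Sogdian | sog |
| German | ger | Spanish | spa |
| Himachali | him | Swedish | swe |
| Hindi | hin | Tibetan | tib |
| Hmong | hmn | Uighur | uig |
| Hungarian | hun | Urdu | urd |
| Italian | ita | | |
| Japanese | jpn | | |
| Kashmiri | kas | | |
| Korean | kor | | |
| Khotanese | kho | | |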
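
Since the lang attribute is an IDREF, each code used in a document has to be declared once in the metadata. A minimal sketch of such a declaration for a document that uses Tibetan and English is given below; the surrounding header elements are omitted.

```xml
<!-- Each id here can then be referenced by lang="tib" or lang="eng" -->
<profileDesc>
  <langUsage>
    <language id="tib">Tibetan</language>
    <language id="eng">English</language>
  </langUsage>
</profileDesc>
```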

REND

This attribute allows you to hard-code the formatting for an element. For example, in the Tibetan Cataloging project the Discussion tag is used to deal with the object of homage, and the REND attribute specifies that it should appear in-line in the catalog and not as a hyperlinked note. However, one should be very careful about hard-coding formatting and use this attribute cautiously.

OTHER COMMON ATTRIBUTES

Other attributes commonly found on most elements follow. Only one of them is used in the specifications above: “corresp”, which contains a value that corresponds to an ID value elsewhere in the text. This is for internal linking and reference. Use of the other attributes has not yet been explored. However, their definition is such that they are all of type IDREF or IDREFS, which means that they, like corresp, must contain a value equal to the ID value of another element or the document will not validate. All of them may be left blank. To explore possible uses, check the TEI Guidelines.

corresp IDREFS #IMPLIED:

synch IDREFS #IMPLIED:

sameAs IDREF #IMPLIED:

copyOf IDREF #IMPLIED:

next IDREF #IMPLIED:

prev IDREF #IMPLIED:

exclude IDREFS #IMPLIED:

select IDREFS #IMPLIED:

ana IDREFS #IMPLIED:

Notes

(1) “Breadcrumbs” are the links in the bar below the menus on a THDL page that show where the page is located in the hierarchy of the website. They are separated by > and are ordered from the root THDL to the page.

(2) Line-breaks have been inserted to maintain this HTML page width; they are not necessary in the XML.

(3) Strictly speaking, one should include at the top of an XML document an entity reference to each external document it links to, but because of practical considerations, we are not doing it that way.

(4) We should, in the near future, develop a xtib XML-Schema as a working companion of the DTD, since schemas are likely to replace DTDs.

(5) The word “element” is used interchangeably here with “tag”, though conceptually there is a difference. An XML element consists of an opening tag (which may carry attributes), text or child elements, and a closing tag. Thus, “<p> This is a paragraph for <name>Joe</name>.</p>” is a paragraph element with a child name element, while <p>, <name>, </p>, and </name> are by themselves all tags. However, in this document, “element” and “tag” are used synonymously.

(6) In many Tantric commentaries, the “Extraordinary introductory scene” is the introductory Sanskrit phrase, eva.

(7) In the TEI guidelines, the “resp” attribute is used somewhat differently. If one is marking up a manuscript that has multiple editors who made additions, the resp attribute identifies the electronic editor who decides which of the multiple editors of the manuscript made the addition. The “hand” attribute identifies the editor, or hand, who made the addition.

(8) This is an ad hoc list taken from the ISO-639-2/B specification. It is a compilation of European, Himalayan, and Asian languages that might be used in either secondary scholarship or original source materials.

Provided for unrestricted use by the Tibetan and Himalayan Library