Checking Converted Bibliographic Records

Proofreading Converted XML Bibliographic Records

Contributor(s): THL Staff.

Once the bibliographic records have been converted into XML files, they need to be checked and cleaned up. This involves the following tasks:

Opening up each XML record in an XML editor and making sure it validates against the DTD. If it does not, correct the problems in the document until it validates, or else try reconverting.
Looking at the bottom of the document for comments and notes left by the cataloger. Some of these comments may be information that needs to be entered into the XML as a note or discussion. All the cells in the third column of the entry table are converted into a single comment at the bottom of the XML file. Each row's comment is preceded by the label in the first cell of the row, and the comments for different rows are separated by a row of asterisks. Some comments ask for information to be added to the markup, such as pagination for multi-volume texts. Others may be specific instructions to the converter or proofer, such as to check the accuracy of some information.
Once a volume of XML records has been sufficiently cleaned up, the entire volume should be zipped into a single file called N-Kg-v###-xml.zip and uploaded to that volume's folder on the Canons Resources page.
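
The cataloger's comment block can be pulled out of a record for review with a few lines of Python. This is only a sketch under stated assumptions: it takes the last comment in the file to be the cataloger's comment, and it treats any line made up solely of three or more asterisks as a row separator. The function name cataloger_comment_rows is hypothetical and not part of any THL tool.

```python
import re


def cataloger_comment_rows(xml_text):
    """Split the last comment in an XML record into one string per table row."""
    comments = re.findall(r"<!--(.*?)-->", xml_text, flags=re.DOTALL)
    if not comments:
        return []
    # Assumption: rows are separated by a line containing only asterisks.
    rows = re.split(r"^[ \t]*\*{3,}[ \t]*$", comments[-1], flags=re.MULTILINE)
    return [row.strip() for row in rows if row.strip()]
```

Each returned string still begins with the label from the first cell of its row, so the proofer can see which field the comment belongs to.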

Markup for Notes

Insert a <note> tag with the following attributes: <note id="N-0004-bib-n1" resp="snw" type="translators">

- id: the name of the file with xml removed and replaced with n1 for the first note in the file, n2 for the second note in the file, etc.
- resp: your three-letter initials.
- type: the name of the tag to which this note refers. If the note is about the translators, then enter translators; if the note is about the colophon, then enter colophon; etc.
- Placement of note: insert the note just before the close tag of the section to which it refers (<note> is allowed in some places and not others, so you may have to experiment to find where it is allowed for a particular tag; keep a list of difficulties). For example, a note on the translators would go here:

<origination>

      <head>Provenance</head>

      <respdecl type="translator">

         <persName n="Indian Scholar" lang="tib" key=""><!--'phags pa gzhi thams cad yod par smra ba'i 'dul ba 'dzin pa kha che'i bye brag tu smra ba'i slob dpon dzi na mi tra/-->འཕགས་པ་གཞི་ཐམས་ཅད་ཡོད་པར་སྨྲ་བའི་འདུལ་བ་འཛིན་པ་ཁ་ཆེའི་བྱེ་བྲག་ཏུ་སྨྲ་བའི་སློབ་དཔོན་ཛི་ན་མི་ཏྲ།</persName>
 
        <persName n="Tibetan Translator" lang="tib" key=""><!--zhu chen gyi lo ts+tsha ba ban de klu'i rgyal mtshan/-->ཞུ་ཆེན་གྱི་ལོ་ཙྪ་བ་བན་དེ་ཀླུའི་རྒྱལ་མཚན།</persName>
 
     </respdecl>

     <note id="N-0004-bib-n1" resp="snw" type="translators">This translator team was responsible for translating the first half of the text.</note>

   </origination>
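
The id rule for notes can be sketched in Python as follows. This assumes the record's file name is of the form N-0004-bib.xml, which is inferred from the id in the example and is not stated explicitly; note_id and note_open_tag are hypothetical helper names.

```python
def note_id(xml_filename, note_number):
    """Build a note id: the file name minus '.xml', plus -n1, -n2, etc."""
    stem = xml_filename
    if stem.lower().endswith(".xml"):
        stem = stem[: -len(".xml")]
    return f"{stem}-n{note_number}"


def note_open_tag(xml_filename, note_number, resp, note_type):
    """Return the opening <note> tag with id, resp and type filled in."""
    return (
        f'<note id="{note_id(xml_filename, note_number)}" '
        f'resp="{resp}" type="{note_type}">'
    )
```

For instance, note_open_tag("N-0004-bib.xml", 1, "snw", "translators") produces the opening tag used in the example.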

Correct problems that occurred during the Wylie-to-Unicode Tibetan conversion. Tibetan fields may contain stretches of Wylie in square brackets in the middle of the Tibetan. These are chunks of Wylie that did not convert and need to be fixed. This usually happens in Sanskrit titles and is most often the result of incorrect THL Extended Wylie in the Word doc (such as forgetting to add a + between letters in non-standard stacks). Much less frequently, it happens because the stack has not yet been created in Tibetan Machine Uni.
Example: ཨཱ་[rya]་པ་ཉྩ་བི་[ngsha]་ཏི་ཀ་པྲ་ཛྙཱ་པཱ་ར་མི་ཏཱ་མུ་ཁ་ནཱ་མ་མ་ཧཱ་ཡཱ་ན་སཱུ་ཏྲ།
Solution: manually create ཪྱ and ངྴ and replace the Wylie (including the square brackets) with them. Also make the corresponding correction to the Wylie within the comment (<!-- -->). Note: make sure there is only one tsheg between the text you correct and the syllable that follows it.
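
Leftover Wylie of this kind can be located with a simple search before correcting it by hand. The following Python sketch assumes that unconverted chunks always appear as ASCII letters (plus common Extended Wylie punctuation) inside square brackets, and it also flags doubled tshegs; the function names are hypothetical.

```python
import re

# Assumption: an unconverted chunk is ASCII Wylie inside square brackets.
UNCONVERTED = re.compile(r"\[([A-Za-z+'.~_-]+)\]")
# Two or more tsheg characters (U+0F0B) in a row.
DOUBLE_TSHEG = re.compile("\u0f0b{2,}")


def find_unconverted_wylie(text):
    """Return the Wylie chunks that the converter left in square brackets."""
    return [match.group(1) for match in UNCONVERTED.finditer(text)]


def has_double_tsheg(text):
    """Report whether the text contains two or more tshegs in a row."""
    return DOUBLE_TSHEG.search(text) is not None
```

Running has_double_tsheg on a field after a manual fix is a quick way to catch an extra tsheg left next to the corrected stack.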

Markup for Discussion, Physfacet

Data in the Word general discussion field dealing with page numbering issues should have the following XML markup:

<physfacet type="Distinctive features">Page 619 is misnumbered 618. (snw)</physfacet>

This goes in the XML file here:

<physfacet lang="tib" type="Script">

     <!--dbu can/-->

     དབུ་ཅན།
</physfacet>
<physfacet type="Distinctive features">Page 619 is misnumbered 618. (snw)</physfacet>

The following is a list of various markup issues and how they are resolved in XML. Any of these problems may keep the XML document from validating, so if your document doesn't validate, check the list below. (It will gradually get longer as further questions come up.)

Master ID Issues

(problem with Narthang Converter prior to 3/26/2007): the Master ID #, which in the Canons Project is garnered from Phil Stanley's database, is marked up in an <idno> element that follows immediately after the <tibid> … </tibid> that provides unique identification information for that version of that text, preceding any other <idno> that is recorded in the Tibbibl record. The idno element must have a type attribute set to "master". Thus, in the Kangyur it would appear as follows:

… </tibid>
<idno type="master">0014</idno>
<idno type="eimer">14</idno>

Note: In versions of the Narthang converter that pre-date 3/26/2007, this information was dropped. Any records converted prior to that date should be checked and the information entered by hand. The information can be found in the Kangyur and Tengyur title lists.

Texts Spanning Multiple Volumes

About 60 texts span multiple volumes. In such cases, the cataloger will insert references to the ending volume in the third column. This will create notes in the comment at the end of the XML document, which will need to be converted by hand into XML, according to these instructions:

volume information: this markup follows the comment "VOLUME INFO" and looks like this:

<!--VOLUME INFO-->
<tibid type="volume" system="number">24
     <altid system="letter" lang="tib"><!--nga-->ང</altid>
     <tibid type="text" system="number">3</tibid>
</tibid>

<tibid type="volume"> is the sequential volume number within the edition and is the means for connecting to the volume bibliographic record; in this example, it is the 24th volume in the edition. The letter ID (<altid system="letter">) generally restarts with each genre and therefore is not unique; it is included for reference purposes. In this example, it is the fourth volume of the genre. The text number (<tibid type="text">) refers to the number of the text within the volume; in this example, the text is the third text in the volume.

When a text begins in one volume and ends in another volume, this markup should be repeated for each volume. Assign an "n" attribute to each <tibid type="volume"> tag, indicating which volume it is in the sequence of the volumes that contain this text: n="1" indicates the first volume in which the text occurs; n="2" indicates the second volume in which the text occurs; n="3" indicates the third volume in which the text occurs; etc. You also need to enter this number in the VOLUME INFO comment. Example: for a text that begins as the third text in volume 24 and ends as the first text in volume 26, the markup would be:

<!--VOLUME 1 INFO-->
<tibid type="volume" system="number" n="1">24
     <altid system="letter" lang="tib"><!--nga-->ང</altid>
     <tibid type="text" system="number">3</tibid>
</tibid>
<!--VOLUME 2 INFO-->
<tibid type="volume" system="number" n="2">25
     <altid system="letter" lang="tib"><!--ca-->ཅ</altid>
     <tibid type="text" system="number">1</tibid>
</tibid>
<!--VOLUME 3 INFO-->
<tibid type="volume" system="number" n="3">26
     <altid system="letter" lang="tib"><!--cha-->ཆ</altid>
     <tibid type="text" system="number">1</tibid>
</tibid>

pagination and extent: these will need to be modified for texts that span multiple volumes. The standard markup for the page range of a text is:

<pagination type="block">
      <num n="begin">262b.1</num>
      <num n="end">645b.7</num>
</pagination>

For multi-volume texts, there needs to be a list of page ranges for each volume. This is achieved by wrapping each pair of beginning and ending <num> elements within a parent <num> element whose type attribute is set to "volume" and whose n attribute is set to the number of the volume within the sequence of volumes in which the text occurs, as follows:

<pagination type="block">
     <num n="1" type="volume">
          <num n="begin">262b.1</num>
          <num n="end">653b.7</num>
     </num>
     <num n="2" type="volume">
          <num n="begin">1a.1</num>
          <num n="end">703b.7</num>
     </num>
     <num n="3" type="volume">
          <num n="begin">1a.1</num>
          <num n="end">453a.3</num>
     </num>
</pagination>

Here, <num n="1" type="volume"> indicates that this is the pagination of the text in the volume in which the text begins; <num n="2" type="volume"> indicates that this is pagination of the text within the second volume in which it occurs; etc.
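
The wrapping pattern for multi-volume page ranges can also be generated mechanically. The following is a minimal Python sketch and not part of the THL converter; the function name multivolume_pagination is hypothetical, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def multivolume_pagination(ranges):
    """Build <pagination> markup from (begin, end) pairs, one pair per volume."""
    pagination = ET.Element("pagination", {"type": "block"})
    for sequence, (begin, end) in enumerate(ranges, start=1):
        # n is the position of the volume within the text, not the volume number.
        volume = ET.SubElement(
            pagination, "num", {"n": str(sequence), "type": "volume"}
        )
        ET.SubElement(volume, "num", {"n": "begin"}).text = begin
        ET.SubElement(volume, "num", {"n": "end"}).text = end
    ET.indent(pagination, space="     ")
    return ET.tostring(pagination, encoding="unicode")


print(multivolume_pagination([("262b.1", "653b.7"), ("1a.1", "703b.7")]))
```

The output should still be pasted into the record and validated against the DTD in the XML editor.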

As for the extent of the text, this will have to be calculated by adding up the paginations (full or partial) for each volume into a total number of sides. This figure should be correct if the cataloger added up all the sides of the volumes other than the first volume and entered this figure in the "page differential" field of the Word entry form. In the XML file there should be a note associated with the "page differential" field indicating how the figure in that field was calculated.
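
To double-check the cataloger's figure, the number of sides can be recomputed from the page ranges. The sketch below is in Python and rests on assumptions that may not hold for every volume: each folio has exactly an "a" side and a "b" side, no folios are missing or duplicated, and a partially filled side counts as one side. The function names are hypothetical.

```python
import re

# A reference such as 262b.1: folio number, side (a or b), line number.
REFERENCE = re.compile(r"^(\d+)([ab])\.(\d+)$")


def side_index(reference):
    """Number the sides of a volume consecutively: 1a is 0, 1b is 1, 2a is 2."""
    match = REFERENCE.match(reference)
    if match is None:
        raise ValueError(f"not a folio.line reference: {reference!r}")
    folio, side = int(match.group(1)), match.group(2)
    return 2 * (folio - 1) + (1 if side == "b" else 0)


def sides_in_range(begin, end):
    """Count the sides from begin to end inclusive, within one volume."""
    return side_index(end) - side_index(begin) + 1


def total_sides(ranges):
    """Add up the sides over the (begin, end) ranges of all volumes."""
    return sum(sides_in_range(begin, end) for begin, end in ranges)
```

Under those assumptions, the three ranges 262b.1 to 653b.7, 1a.1 to 703b.7, and 1a.1 to 453a.3 come to 783 + 1406 + 905 = 3094 sides.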

Pagination Issues

Most paginations go within a <pagination> element. However, in a few situations the <source> element is used. In either case, there are two types of pagination, which are marked up slightly differently:

if an item falls within a single page and line, that page and line reference simply goes within the pagination element, as in:

<pagination n="bound">262b.2</pagination>

a page range is marked up as two <num> elements within the parent pagination element, as follows:

<pagination type="block">
     <num n="begin">262b.1</num><num n="end">264a.4</num>
</pagination>

Note: this differs from how paginations are entered in the Word entry form (e.g., "262b.1-264a.4"). The converter eliminates the dash and adds the markup.
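
When a page range has to be re-entered by hand, the same transformation can be sketched in Python. This is an illustration only and not the converter's actual code; the function name range_to_pagination is hypothetical, and it handles only the range form, not single page-and-line references.

```python
import re

# A Word-form range such as 262b.1-264a.4.
PAGE_RANGE = re.compile(r"^\s*(\d+[ab]\.\d+)\s*-\s*(\d+[ab]\.\d+)\s*$")


def range_to_pagination(entry):
    """Turn a Word-form page range into block pagination markup."""
    match = PAGE_RANGE.match(entry)
    if match is None:
        raise ValueError(f"not a page range: {entry!r}")
    begin, end = match.groups()
    return (
        '<pagination type="block">\n'
        f'     <num n="begin">{begin}</num><num n="end">{end}</num>\n'
        "</pagination>"
    )


print(range_to_pagination("262b.1-264a.4"))
```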

Provenance Issues

Provenance refers to the people who were involved in creating the artifact (text) being cataloged, from the author to the printer of the particular edition and all those in between, as well as other data about its creation such as the place, date, etc. In the Tibbibl markup, this information is wrapped in <origination> … </origination> tags that contain <respdecl>. The issues below deal with this section of the markup.

empty provenance ("origination") will not validate: if there is no colophon information for a text, there will be no "origination" (provenance) information either. The resulting XML file will then fail to validate, because it will contain an empty origination element that looks like this:

<origination>
     <head>Provenance</head>
</origination>

If you delete this markup, then the XML will validate.

Furthermore, since this means there is no colophonic information, the markup for the colophon should also be deleted. It looks like:

<physdecl rend="Colophon" n="colophon">
     <head>Colophon</head>
     <discussion type="Colophon" n="contents" lang="tib"><!----></discussion>
     <pagination type="Colophon" n="block"/>
</physdecl>

If the tags are empty as above, they should be deleted. If you have any questions, contact Steve or Than.
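
Empty origination and colophon blocks can be found across a whole volume with a short script. The Python sketch below uses the standard library's ElementTree, which discards comments while parsing, so an empty <!----> comment does not count as content. The function names are hypothetical, and the script only reports the elements; deleting them is left to the proofer.

```python
import xml.etree.ElementTree as ET


def empty_originations(root):
    """Return <origination> elements that contain nothing but a <head>."""
    return [
        element
        for element in root.iter("origination")
        if all(child.tag == "head" for child in element)
    ]


def empty_colophons(root):
    """Return colophon <physdecl> elements whose fields hold no text."""
    found = []
    for element in root.iter("physdecl"):
        if element.get("n") != "colophon":
            continue
        fields = [child for child in element if child.tag != "head"]
        if not any("".join(field.itertext()).strip() for field in fields):
            found.append(element)
    return found
```

For a record on disk, pass ET.parse(path).getroot() to either function. Parsing may fail if the record uses entities that are defined only in the DTD.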

Additions Made to the Tibetan Text after Original Carving

When an addition has been made to a Tibetan text – indicated by three or four dots that connect the inserted material to the place it was omitted, much like an annotation – the cataloger will add the XML markup below in the Word doc. The proofreader needs to check this to make sure it is correct. An example of this from a Tibetan text is:

[Image: screen shot of material added to a Tibetan text]

In the bottom line, a smaller མ was added below and slightly to the left of the regular-sized མ probably because the second མ was mistakenly omitted when the block was originally carved. If this represented the ligature མྨ་ then there would not be space between the two letters and they would be directly above/below each other. Markup:

ནཱ་མ་<add place="infralinear" resp="editor">མ་</add>ཧཱ་ཡཱ་ན་…

In the unlikely event that you know the name of the person responsible for making the addition to the Tibetan text, enter that rather than "editor." Also, select the value of the place attribute from the following list:

- inline: addition is made in a space left in the witness by an earlier scribe
- supralinear: addition is made above the line
- infralinear: addition is made below the line
- left: addition is made in left margin
- right: addition is made in right margin
- top: addition is made in top margin
- bottom: addition is made in bottom margin
- opposite: addition is made on opposite page
- verso: addition is made on verso of sheet
- mixed: addition is made somewhere, one or more of the other values
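
Since the proofreader has to confirm that each <add> element uses one of these values, the check can be sketched in Python. The set below simply restates the list; the function name invalid_add_places is hypothetical.

```python
import xml.etree.ElementTree as ET

ALLOWED_PLACES = {
    "inline", "supralinear", "infralinear", "left", "right",
    "top", "bottom", "opposite", "verso", "mixed",
}


def invalid_add_places(root):
    """Return the place value of every <add> whose value is not in the list.

    A missing place attribute is reported as None.
    """
    return [
        element.get("place")
        for element in root.iter("add")
        if element.get("place") not in ALLOWED_PLACES
    ]
```

An empty result means every addition in the record uses a listed value.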

Individual Data Fields

non-tib title in tibetan: if there is no non-Tibetan title given in the text, the cataloger will enter
Not specified.
In this case, make sure that lang="eng". The XML for such cases looks like this:

<title lang="eng" type="nontibet"><!---->Not specified.</title>

original language: if there is no original language specified in the text, the cataloger will enter
Not specified.
In this case, make sure that lang="eng". The XML for such cases looks like this:

<note type="original language" lang="eng" place="unspecified" anchored="yes">
<!---->
Not specified.</note>

If there are multiple original languages, make a separate <titlediv> element for each one, nested within the <titleinfo> and <titlegrp> elements, and fill out the appropriate information for each language. Set the subtype attribute of the <titlediv> for the new language as well. Here is an example using Sanskrit and bru zha:

<titlediv type="nontibet" subtype="sanskrit" lang="san">
            <titledecl>
               <title lang="tib" type="nontibet"><!--sa rba ta thA ga ta tsi ta ta dz+nyA na gu h+ya r+tha ga r+b+ha bU ha badz+ra tan+t+ra si d+d+hi yo ga a ga ma sa mA dza sa rba bi d+ya sU tra ma hA yA na sa b+hi sa ma ya d+ha rma pa r+ya ya bi byU ha nA ma sU traM/-->ས་རྦ་ཏ་ཐཱ་ག་ཏ་ཙི་ཏ་ཏ་ཛྙཱ་ན་གུ་ཧྱ་རྠ་ག་རྦྷ་བཱུ་ཧ་བཛྲ་ཏནྟྲ་སི་དྡྷི་ཡོ་ག་ཨ་ག་མ་ས་མཱ་ཛ་ས་རྦ་བི་དྱ་སཱུ་ཏྲ་མ་ཧཱ་ཡཱ་ན་ས་བྷི་ས་མ་ཡ་དྷ་རྨ་པ་རྱ་ཡ་བི་བྱཱུ་ཧ་ནཱ་མ་སཱུ་ཏྲཾ།</title>
               <title lang="san" type="nontibet">sarbatathāgatacitatajñānaguhyarthagarbhabūhabajratantrasiddhiyogaagamasamājasarbavidyasūtramahāyānasabhisamayadharmaparyayavibyūhanāmasūtraṃ</title>
               <title type="normalized" lang="san">sarvatathāgatacittajñānaguhyārthagarbhavyūhavajratantrasiddhiyogāgamasamājasarvavidyāsūtramahāyānābhisamayadharmaparyāyavivyūha-nāma-sūtra</title>
               <note type="original language" lang="tib" place="unspecified" anchored="yes"><!--rgya gar skad/-->རྒྱ་གར་སྐད།</note>
            </titledecl>
            <pagination>
               <num n="begin">120b.4</num>
               <num n="end">120b.5</num>
            </pagination>
         </titlediv>
         <titlediv type="nontibet" subtype="bru zha">
            <titledecl>
               <title lang="tib" type="nontibet"><!--hon pa ni ral til pi bu bi til ti ta sing 'un 'ub hang pang ril la 'ub pi su bang ri zhe hal pa'i ma kyang gu'i dang rod ti/-->ཧོན་པ་ནི་རལ་ཏིལ་པི་བུ་བི་ཏིལ་ཏི་ཏ་སིང་འུན་འུབ་ཧང་པང་རིལ་ལ་འུབ་པི་སུ་བང་རི་ཞེ་ཧལ་པའི་མ་ཀྱང་གུའི་དང་རོད་ཏི།</title>
               <note type="original language" lang="tib" place="unspecified" anchored="yes"><!--bru zha'i skad/-->བྲུ་ཞའི་སྐད།</note>
            </titledecl>
            <pagination>
               <num n="begin">120b.5</num>
               <num n="end">120b.6</num>
            </pagination>
         </titlediv>

Author's Colophon: As with all chapter-level elements, an author's colophon is marked by <div2> tags. Chapter-level elements are distinguished by their type attributes; thus, the author's colophon is marked by <div2 type="author's colophon"> tags.