Contributor(s): THL Staff.
Once the bibliographic records have been converted into XML files, they need to be checked and cleaned up. This involves the following tasks:
Insert a <note> tag with the following attributes: <note id="N-0004-bib-n1" resp="snw" type="translators">. The note goes inside <origination>, after the closing </respdecl> tag, as in this example:
<origination> <head>Provenance</head> <respdecl type="translator"> <persName n="Indian Scholar" lang="tib" key=""><!--'phags pa gzhi thams cad yod par smra ba'i 'dul ba 'dzin pa kha che'i bye brag tu smra ba'i slob dpon dzi na mi tra/-->འཕགས་པ་གཞི་ཐམས་ཅད་ཡོད་པར་སྨྲ་བའི་འདུལ་བ་འཛིན་པ་ཁ་ཆེའི་བྱེ་བྲག་ཏུ་སྨྲ་བའི་སློབ་དཔོན་ཛི་ན་མི་ཏྲ།</persName> <persName n="Tibetan Translator" lang="tib" key=""><!--zhu chen gyi lo ts+tsha ba ban de klu'i rgyal mtshan/-->ཞུ་ཆེན་གྱི་ལོ་ཙྪ་བ་བན་དེ་ཀླུའི་རྒྱལ་མཚན།</persName> </respdecl> <note id="N-0004-bib-n1" resp="snw" type="translators">This translator team was responsible for translating the first half of the text.</note> </origination>
Data in the Word general discussion field dealing with page numbering issues should have the following XML markup:
<physfacet type="Distinctive features">Page 619 is misnumbered 618. (snw)</physfacet>
This goes in the XML file here:
<physfacet lang="tib" type="Script"> <!--dbu can/--> དབུ་ཅན། </physfacet> <physfacet type="Distinctive features">Page 619 is misnumbered 618. (snw)</physfacet>
The following is a list of various markup issues and how they are resolved in XML. These problems may prevent the XML document from validating, so if your document doesn't validate, check the list below. (It will gradually get longer as further questions come up.)
(problem with Narthang Converter prior to 3/26/2007): the Master ID #, which in the Canons Project is garnered from Phil Stanley's database, is marked up in an <idno> element that follows immediately after the <tibid> … </tibid> that provides unique identification information for that version of that text, preceding any other <idno> that is recorded in the Tibbibl record. The idno element must have a type attribute set to "master". Thus, in the Kangyur it would appear as follows:
… </tibid> <idno type="master">0014</idno> <idno type="eimer">14</idno>
Note: In versions of the Narthang converter that pre-dated 3/26/2007, this information was dropped. Any records converted prior to that date should be checked and the information entered by hand. The information can be found in the Kangyur and Tengyur Title lists.
About 60 texts span multiple volumes. In such cases, the cataloger will insert references to the ending volume in the third column. This will create notes in the comment at the end of the XML document, which will need to be converted by hand into XML, according to these instructions:
<!--VOLUME INFO-->
<tibid type="volume" system="number">24
  <altid system="letter" lang="tib"><!--nga-->ང</altid>
  <tibid type="text" system="number">3</tibid>
</tibid>
<tibid type="volume"> is the sequential volume number within the edition and is the means for connecting to the volume bibliographic record; in this example, it is the 24th volume in the edition. The letter ID (<altid system="letter">) generally restarts with each genre and therefore is not unique; it is included for reference purposes. In this example, it is the fourth volume of the genre. The text number (<tibid type="text">) refers to the number of the text within the volume; in this example, the text is the third text in the volume.
When a text begins in one volume and ends in another, this markup should be repeated for each volume. Assign an "n" attribute to each <tibid type="volume"> tag, indicating the volume's position in the sequence of volumes that contain this text: n="1" indicates the first volume in which the text occurs; n="2" indicates the second; n="3" indicates the third; etc. You also need to enter this number in the VOLUME INFO comment. Example: for a text that begins as the third text in volume 24 and ends as the first text in volume 26, the markup would be:
<!--VOLUME 1 INFO-->
<tibid type="volume" system="number" n="1">24
  <altid system="letter" lang="tib"><!--nga-->ང</altid>
  <tibid type="text" system="number">3</tibid>
</tibid>
<!--VOLUME 2 INFO-->
<tibid type="volume" system="number" n="2">25
  <altid system="letter" lang="tib"><!--ca-->ཅ</altid>
  <tibid type="text" system="number">1</tibid>
</tibid>
<!--VOLUME 3 INFO-->
<tibid type="volume" system="number" n="3">26
  <altid system="letter" lang="tib"><!--cha-->ཆ</altid>
  <tibid type="text" system="number">1</tibid>
</tibid>
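The numbering rule above can be checked mechanically. The following is a minimal sketch, not part of the Narthang converter: the function name volume_sequence_errors and the <root> wrapper are hypothetical, and the sample omits the <altid> elements for brevity. It reports every <tibid type="volume"> whose n attribute does not match its position in the sequence.

```python
import xml.etree.ElementTree as ET


def volume_sequence_errors(fragment):
    """Return messages for <tibid type="volume"> tags whose n is out of sequence."""
    # Wrap the fragment in a hypothetical <root> element so that several
    # sibling <tibid> elements can be parsed as a single document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # Only direct children are volume-level; nested <tibid type="text"> are skipped.
    volumes = [t for t in root.findall("tibid") if t.get("type") == "volume"]
    errors = []
    for position, vol in enumerate(volumes, start=1):
        found = vol.get("n")
        if found != str(position):
            label = (vol.text or "").strip()  # the volume number, e.g. "24"
            errors.append(
                f'volume {label}: expected n="{position}", found n={found!r}'
            )
    return errors


sample = (
    '<!--VOLUME 1 INFO--><tibid type="volume" system="number" n="1">24 '
    '<tibid type="text" system="number">3</tibid></tibid>'
    '<!--VOLUME 2 INFO--><tibid type="volume" system="number" n="2">25 '
    '<tibid type="text" system="number">1</tibid></tibid>'
)
print(volume_sequence_errors(sample))
```

An empty list means the n attributes run 1, 2, 3, and so on without gaps. The check does not compare the numbers against the VOLUME INFO comments, which still need to be verified by eye.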
<pagination type="block"> <num n="begin">262b.1</num> <num n="end">645b.7</num> </pagination>
For multi-volume texts, there needs to be a list of page ranges for each volume. This is achieved by wrapping each pair of beginning and ending <num> elements within a parent <num> element whose type attribute is set to "volume" and whose n attribute is set to the number of the volume within the sequence of volumes in which the text occurs, as follows:
<pagination type="block">
  <num n="1" type="volume">
    <num n="begin">262b.1</num>
    <num n="end">653b.7</num>
  </num>
  <num n="2" type="volume">
    <num n="begin">1a.1</num>
    <num n="end">703b.7</num>
  </num>
  <num n="3" type="volume">
    <num n="begin">1a.1</num>
    <num n="end">453a.3</num>
  </num>
</pagination>
Here, <num n="1" type="volume"> indicates that this is the pagination of the text in the volume in which the text begins; <num n="2" type="volume"> indicates that this is pagination of the text within the second volume in which it occurs; etc.
As for the extent of the text, this has to be calculated by adding up the paginations (full or partial) for each volume into a total number of sides. This figure should be correct if the cataloger added all the sides of the volumes other than the first volume and entered this figure in the "page differential" field of the Word entry form. In the XML file there should be a note associated with the "page differential" field indicating how the figure was calculated.
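To double-check the cataloger's figure, the total can be recomputed from the multi-volume pagination markup. The sketch below is only an illustration under stated assumptions: every folio is assumed to have exactly two sides (a and b), ranges are counted inclusively, and the function names side_index and total_sides are hypothetical. If the project counts sides differently, the arithmetic needs to be adjusted.

```python
import re
import xml.etree.ElementTree as ET


def side_index(ref):
    """Convert a reference such as '262b.1' to a zero-based side index."""
    match = re.match(r"^(\d+)([ab])\.\d+$", ref.strip())
    if match is None:
        raise ValueError(f"unrecognized pagination: {ref!r}")
    folio, side = int(match.group(1)), match.group(2)
    # Assumption: folio 1 has sides 1a and 1b, folio 2 has 2a and 2b, etc.
    return (folio - 1) * 2 + (1 if side == "b" else 0)


def total_sides(pagination_xml):
    """Sum the inclusive side counts of every <num type="volume"> range."""
    root = ET.fromstring(pagination_xml)
    total = 0
    for vol in root.findall("num"):
        if vol.get("type") != "volume":
            continue
        begin = vol.find("num[@n='begin']").text
        end = vol.find("num[@n='end']").text
        total += side_index(end) - side_index(begin) + 1
    return total


example = """
<pagination type="block">
  <num n="1" type="volume"><num n="begin">262b.1</num><num n="end">653b.7</num></num>
  <num n="2" type="volume"><num n="begin">1a.1</num><num n="end">703b.7</num></num>
  <num n="3" type="volume"><num n="begin">1a.1</num><num n="end">453a.3</num></num>
</pagination>
"""
print(total_sides(example))  # 783 + 1406 + 905 = 3094 under the assumptions above
```

The line numbers after the dot (.1, .7, .3) are ignored, since the extent is counted in sides rather than lines.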
Most paginations go within a tag by the same name. However, in a few situations the <source> element is used. In either case, there are two types of paginations that are marked up slightly differently:
<pagination n="bound">262b.2</pagination>
<pagination type="block"> <num n="begin">262b.1</num><num n="end">264a.4</num> </pagination>
Note: this differs from how paginations are entered in the Word entry form (e.g., "262b.1-264a.4"). The converter eliminates the dash and adds the markup.
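The transformation described in the note can be pictured as follows. This is a hedged sketch of the idea only, not the converter's actual code: the function name range_to_pagination is hypothetical, and the pattern assumes references of the form folio, side letter a or b, dot, line number.

```python
import re


def range_to_pagination(entry):
    """Turn a Word-form range like '262b.1-264a.4' into block pagination markup."""
    match = re.match(r"^\s*(\d+[ab]\.\d+)\s*-\s*(\d+[ab]\.\d+)\s*$", entry)
    if match is None:
        raise ValueError(f"not a pagination range: {entry!r}")
    begin, end = match.groups()
    # The dash is dropped; each endpoint gets its own <num> element.
    return (
        '<pagination type="block"> '
        f'<num n="begin">{begin}</num><num n="end">{end}</num>'
        " </pagination>"
    )


print(range_to_pagination("262b.1-264a.4"))
```

When proofreading, the begin and end values in the XML should match the two halves of the range that was typed in the Word entry form.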
Provenance refers to the people who were involved in creating the artifact (text) being cataloged, from the author to the printer of the particular edition and all those in between, as well as other data about its creation such as the place and date. In the Tibbibl markup, this information is wrapped in <origination> … </origination> tags that contain <respdecl>. The issues below deal with this section of the markup.
<origination> <head>Provenance</head> </origination>
If you delete this markup, then the XML will validate.
Furthermore, since this means there is no colophonic information, the markup for the colophon should also be deleted. It looks like:
<physdecl rend="Colophon" n="colophon"> <head>Colophon</head> <discussion type="Colophon" n="contents" lang="tib"><!----></discussion> <pagination type="Colophon" n="block"/> </physdecl>
If the tags are empty as above, they should be deleted. If you have any questions, contact Steve or Than.
When an addition has been made to a Tibetan text – indicated by three or four dots that connect the inserted material to the place it was omitted, much like an annotation – the cataloger will add the XML markup below in the Word doc. The proofreader needs to check this to make sure it is correct. An example of this from a Tibetan text is:
In the bottom line, a smaller མ was added below and slightly to the left of the regular-sized མ probably because the second མ was mistakenly omitted when the block was originally carved. If this represented the ligature མྨ་ then there would not be space between the two letters and they would be directly above/below each other. Markup:
ནཱ་མ་<add place="infralinear" resp="editor">མ་</add>ཧཱ་ཡཱ་ན་…
In the unlikely event that you know the name of the person responsible for making the addition to the Tibetan text, enter that rather than "editor." Also, select the value of the place attribute from the following list:
inline: addition is made in a space left in the witness by an earlier scribe
supralinear: addition is made above the line
infralinear: addition is made below the line
left: addition is made in left margin
right: addition is made in right margin
top: addition is made in top margin
bottom: addition is made in bottom margin
opposite: addition is made on opposite page
verso: addition is made on verso of sheet
mixed: addition is made somewhere, one or more of other values
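A proofreader could check the place attribute of every <add> element against the list above with a short script. The sketch below is an assumption-laden illustration: the function name invalid_add_places is hypothetical, and the sample wraps a transliterated stand-in for the Tibetan text in a <title> element purely so that it parses as a document.

```python
import xml.etree.ElementTree as ET

# The ten values listed above.
ALLOWED_PLACES = {
    "inline", "supralinear", "infralinear", "left", "right",
    "top", "bottom", "opposite", "verso", "mixed",
}


def invalid_add_places(xml_text):
    """Return the place value of every <add> that is not in the allowed list."""
    root = ET.fromstring(xml_text)
    # A missing place attribute shows up as None in the result.
    return [
        add.get("place")
        for add in root.iter("add")
        if add.get("place") not in ALLOWED_PLACES
    ]


sample = '<title>nA ma <add place="infralinear" resp="editor">ma</add> hA yA na</title>'
print(invalid_add_places(sample))
```

An empty list means every addition uses one of the listed values. The script does not judge whether the chosen value matches what appears on the page; that still has to be checked against the text.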
non-tib title in tibetan: if there is no non-Tibetan title given in the text, the cataloger will enter
Not specified.
In this case, make sure that lang="eng". The XML for such cases looks like this:
<title lang="eng" type="nontibet"><!---->Not specified.</title>
original language: if there is no original language specified in the text, the cataloger will enter
Not specified.
In this case, make sure that lang="eng". The XML for such cases looks like this:
<note type="original language" lang="eng" place="unspecified" anchored="yes"> <!----> Not specified.</note>
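Both "Not specified." cases (non-Tibetan title and original language) follow the same rule, so one check can cover them. This is a minimal sketch with hypothetical names (not_specified_lang_errors, and the <titledecl> wrapper used in the sample); it flags any <title> or <note> whose content is "Not specified." but whose lang attribute is not "eng".

```python
import xml.etree.ElementTree as ET


def not_specified_lang_errors(xml_text):
    """List (tag, lang) pairs for 'Not specified.' elements lacking lang="eng"."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        if elem.tag not in ("title", "note"):
            continue
        # itertext() gathers the text around the empty <!----> comment.
        text = "".join(elem.itertext()).strip()
        if text == "Not specified." and elem.get("lang") != "eng":
            problems.append((elem.tag, elem.get("lang")))
    return problems


sample = (
    "<titledecl>"
    '<title lang="eng" type="nontibet"><!---->Not specified.</title>'
    '<note type="original language" lang="tib" place="unspecified" '
    'anchored="yes"> <!----> Not specified.</note>'
    "</titledecl>"
)
print(not_specified_lang_errors(sample))
```

In this sample the <note> was deliberately left with lang="tib", so it is the one entry reported.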
If there are multiple original languages, just make two different <titlediv> fields, nested within the <titleinfo> and <titlegrp> fields, and fill out the appropriate information for both languages. Change the <titlediv subtype=""> for the new language as well. Here is an example using Sanskrit and bru zha:
<titlediv type="nontibet" subtype="sanskrit" lang="san"> <titledecl> <title lang="tib" type="nontibet"><!--sa rba ta thA ga ta tsi ta ta dz+nyA na gu h+ya r+tha ga r+b+ha bU ha badz+ra tan+t+ra si d+d+hi yo ga a ga ma sa mA dza sa rba bi d+ya sU tra ma hA yA na sa b+hi sa ma ya d+ha rma pa r+ya ya bi byU ha nA ma sU traM/-->ས་རྦ་ཏ་ཐཱ་ག་ཏ་ཙི་ཏ་ཏ་ཛྙཱ་ན་གུ་ཧྱ་རྠ་ག་རྦྷ་བཱུ་ཧ་བཛྲ་ཏནྟྲ་སི་དྡྷི་ཡོ་ག་ཨ་ག་མ་ས་མཱ་ཛ་ས་རྦ་བི་དྱ་སཱུ་ཏྲ་མ་ཧཱ་ཡཱ་ན་ས་བྷི་ས་མ་ཡ་དྷ་རྨ་པ་རྱ་ཡ་བི་བྱཱུ་ཧ་ནཱ་མ་སཱུ་ཏྲཾ།</title> <title lang="san" type="nontibet">sarbatathāgatacitatajñānaguhyarthagarbhabūhabajratantrasiddhiyogaagamasamājasarbavidyasūtramahāyānasabhisamayadharmaparyayavibyūhanāmasūtraṃ</title> <title type="normalized" lang="san">sarvatathāgatacittajñānaguhyārthagarbhavyūhavajratantrasiddhiyogāgamasamājasarvavidyāsūtramahāyānābhisamayadharmaparyāyavivyūha-nāma-sūtra</title> <note type="original language" lang="tib" place="unspecified" anchored="yes"><!--rgya gar skad/-->རྒྱ་གར་སྐད།</note> </titledecl> <pagination> <num n="begin">120b.4</num> <num n="end">120b.5</num> </pagination> </titlediv> <titlediv type="nontibet" subtype="bru zha"> <titledecl> <title lang="tib" type="nontibet"><!--hon pa ni ral til pi bu bi til ti ta sing 'un 'ub hang pang ril la 'ub pi su bang ri zhe hal pa'i ma kyang gu'i dang rod ti/-->ཧོན་པ་ནི་རལ་ཏིལ་པི་བུ་བི་ཏིལ་ཏི་ཏ་སིང་འུན་འུབ་ཧང་པང་རིལ་ལ་འུབ་པི་སུ་བང་རི་ཞེ་ཧལ་པའི་མ་ཀྱང་གུའི་དང་རོད་ཏི།</title> <note type="original language" lang="tib" place="unspecified" anchored="yes"><!--bru zha'i skad/-->བྲུ་ཞའི་སྐད།</note> </titledecl> <pagination> <num n="begin">120b.5</num> <num n="end">120b.6</num> </pagination> </titlediv>
Author's Colophon: As with all chapter-level elements, an author's colophon is marked by <div2> tags. Chapter-level elements are distinguished from one another by their type attributes. Thus, the author's colophon is marked by <div2 type="author's colophon"> tags.