General Cataloging Principles

Contributor(s): Nathanial Grove, Steven Weinberger

General Principles

  • Cataloging Status Reports: when you begin to work on a volume, update the Cataloging Status Report for the edition you are working on. When you finish working on a volume, update the status page again. Keeping the status current is very important: many people work on this project at different locations, and everyone must know exactly what work has been finished and what work is in progress. While you are working on a volume, enter your three-letter initials + working in the column for that phase of work; when you have finished a volume, enter your three-letter initials and the date, in the format YYYY-MM-DD.
  • Zip the Files: after all text catalog records and the volume catalog record are completed for the volume, zip all the Word files (Word text catalog files and Word volume catalog file) for a single volume into a single zip file. Name this file Sigla-Section-v###.zip, where Sigla=the siglum for the edition, Section=Kg or Tg (Kg for Kangyur; Tg for Tengyur), and ###=the three-digit volume number. Example: the zip file containing all the Word text catalog records and the Word volume catalog record for the ninth volume of the Nartang Kangyur is named N-Kg-v009.zip.
  • Upload the Zip File to its appropriate folder in Canons Resources > Cataloging Resources. Example: upload the zip file N-Kg-v009.zip, which contains all the Word text catalog records and the Word volume catalog record for the ninth volume of the Nartang Kangyur, into the folder Canons Resources > Cataloging Resources > Nartang Kangyur-Tengyur > Nartang Kangyur > N-Kg-v009-'dul ba
  • Update Cataloging Status Report: go to the appropriate status report for the edition you are cataloging (Kangyur-Tengyur Cataloging Status Reports) and, in both the Text Cataloging Records table and the Volume Cataloging Records table, find the line for the volume you just completed and in the "Cataloging" column enter your three-letter initials + date (note: format for date is YYYY-MM-DD).
  • Cataloger and Proofer: for a given text, the cataloger and the proofer must be different people. The same person cannot both catalog and proofread a given text record.
  • Work from a Hard Copy: catalogers will enter data in a Word doc that has been imported from the database and proofreaders proofread XML files that have been converted from the Word doc the catalogers complete. Both catalogers and proofers must print out the files before they perform their respective tasks. NEVER check the text itself against a computer file; always check it against a printout of the computer file.
  • Procedure for checking a file against the text itself: always read the text itself first and then check the corresponding data in the printout of the computer file. Never read the printout of the computer file first and then check it against the text. This is important because the order in which you do this significantly affects the accuracy of your work. Also, do not try to proof too much text at a time; read a short string of text in the text, check the corresponding text on the printout, check another short string from the text itself, check the printout, etc.
  • Proofers: if there is an error in text in Tibetan Machine Uni font, check the Word doc to see whether the error is there also. If the Word doc differs from the XML doc, the error was introduced in the conversion process. Document all such errors on the TMU Conversion Errors page.
    • When you correct text in Tibetan Machine Uni font you must also make the correction in the Wylie in the comment tag (<!-- -->) just above it in the XML file.
  • Dates: All dates are to be entered in the following format: YYYY-MM-DD.
  • Because there are certain resources such as the Otani online catalog that contain information in Wylie, all Tibetan is entered in THL's Extended Wylie and then in the conversion process it is converted to Unicode Tibetan. If one is not using any such resources, Tibetan data can be entered directly in Unicode. But for each edition the entry format must be consistent.
  • Entering Tibetan-language data: all Tibetan-language data is entered using the THL Extended Wylie transliteration system. There are also two lists of standard Tibetan stacks: THL's list of standard Tibetan stacks and Chris Fynn's table of standard Tibetan stacks.
  • Making textual emendations: if there is an obvious error in the text, use the following markup: put curly braces { } around the original text, and embed the corrected text in square brackets [ ]. Example: the text reads bad+yi but should read bid+ya. Markup in Word doc: {bad+yi[bid+ya]}.
    To make a textual emendation in the XML file, use this markup:
    <corr sic="བདྱི" resp="snw">བིདྱ་</corr>
  • Unclear text: for text that is unclear, use the <unclear> element as follows:
    <unclear reason="smudged" cert="40%" resp="snw">rgyan </unclear>. Insert your three-letter initials as the value of the resp attribute. For this example, the cataloger is Steve Weinberger, whose initials are snw. Also, insert a percentage of certainty in the reading. This is a rough estimate, in this case 40%. Finally, insert one of the following as the reason the text is unclear:
    • faded
    • low ink
    • smudged
    • vowel unclear
      If you feel confident you know what the unclear syllable(s) are, then use the <corr> element as follows. The example details the markup for a case in which the text incorrectly reads rgyan but should read rgyal:
      <unclear reason="smudged" cert="75%" resp="snw"><corr sic="rgyan" resp="snw">rgyal </corr></unclear>
      Note: include the space (which represents the tsheg) after each syllable. In such cases and in general, the 'cert' attribute on the <unclear> element refers to how certain one is about the actual reading, and not the correction. Corrections should only be made when one is absolutely confident of the change.
  • Illegible text: for text that is illegible, use the <damage> element as follows. In this example, five syllables are illegible:
    <damage extent="5 syllables" degree="100%" resp="snw" type="smudged"></damage>
    Note that you insert your three-letter initials as the value of the resp attribute. For this example, the cataloger is Steve Weinberger, whose initials are snw. The degree should always be 100%. If it is less than 100%, treat it as unclear text and use the <unclear> element.
    Insert one of the following as the type:
    • faded
    • low ink
    • smudged
    • torn
    • vowel unclear
      If you feel confident you know what the illegible syllable(s) are, then use the <supplied> element as follows:
      <damage extent="5 syllables" degree="100%" resp="snw" type="smudged"><supplied resp="snw">mngon par shes pa'i blo </supplied></damage>
      Note: be sure to include a space (representing the tsheg) after the final syllable.
  • Additions made to the Tibetan text: when an addition has been made to a Tibetan text – indicated by three or four dots that connect the inserted material to the place it was omitted, much like an annotation – add the XML markup below in the Word doc. An example of this from a Tibetan text is:

[Screenshot: material added to a Tibetan text]

In the bottom line, a smaller མ was added below and slightly to the left of the regular-sized མ, probably because the second མ was mistakenly omitted when the block was originally carved. If this represented the ligature མྨ་ then there would be no space between the two letters and they would be directly above and below each other. Markup:

ནཱ་མ་<add place="infralinear" resp="editor">མ་</add>ཧཱ་ཡཱ་ན་

In the unlikely event that you know the name of the person responsible for making the addition to the Tibetan text, enter that rather than "editor." Also, select the value of the place attribute from the following list:


inline: addition is made in a space left in the witness by an earlier scribe
supralinear: addition is made above the line
infralinear: addition is made below the line
left: addition is made in left margin
right: addition is made in right margin
top: addition is made in top margin
bottom: addition is made in bottom margin
opposite: addition is made on opposite page
verso: addition is made on verso of sheet
mixed: addition is made somewhere, one or more of the other values
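
As a rough illustration, the Python sketch below builds the <add> markup and rejects any place value that is not in the list above. The helper name add_markup and the ALLOWED_PLACES set are hypothetical and not part of any THL tool; catalogers still enter the markup in the Word doc by hand.

```python
from xml.sax.saxutils import escape, quoteattr

# Values of the place attribute, copied from the list above.
ALLOWED_PLACES = {
    "inline", "supralinear", "infralinear", "left", "right",
    "top", "bottom", "opposite", "verso", "mixed",
}


def add_markup(text, place, resp="editor"):
    """Return an <add> element (as a string) for material added to a text.

    Hypothetical helper for illustration only. `resp` defaults to "editor",
    the value used when the person who made the addition is unknown.
    """
    if place not in ALLOWED_PLACES:
        raise ValueError(f"unknown place value: {place!r}")
    # quoteattr adds the surrounding quotes; escape protects &, < and >.
    return (
        f"<add place={quoteattr(place)} resp={quoteattr(resp)}>"
        f"{escape(text)}</add>"
    )


# Rebuild the example line: a second མ་ added below the line.
line = "ནཱ་མ་" + add_markup("མ་", "infralinear") + "ཧཱ་ཡཱ་ན་"
print(line)
```

Checking the place value in code catches misspellings such as "infralinaer" before they reach the XML conversion step.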

  • References to other Kangyur-Tengyur text catalog records: if in a note or discussion you refer to other Kangyur or Tengyur text catalog records, use the following format: THL-KT-sigla-####. Example: This is the seventh of forty-nine texts that comprise the Ratnakūṭa-sūtra (THL-KT-N-0032). (ajm)
  • References to texts: if in a note or discussion you refer to a text title, add the following markup in the Word doc (examples of both a Sanskrit and Tibetan title are given):
    <title lang="san" level="m" type="text">Ratnakūṭa-sūtra</title>
    <title lang="tib" level="m" type="text">dpal mchog dang po/</title>
  • alt+d: Runs a macro to convert Sanskrit titles that use capital letters for diacritics to Unicode diacritics
  • alt+t: Runs a macro to convert the Otani Wylie-esque transliteration into true extended Wylie transliteration. (This will later be converted to Unicode Tibetan in the conversion process.)
  • If you identify a new text not currently in the Text Catalog Records exported from the FileMaker Pro database:

  1. Note this on the Wiki page called:
    Tibetan Canons > Kangyur-Tengyur Cataloging Status Reports > Texts not in Database
  2. Send an email to Phil Stanley (philstanley@comcast.net), who will provide you with the Master Text Number, Peking Number, Eimer Number, Takasaki Number, and Text Number (i.e., new sequential text numbering for the collection) for this new text.
  • File Naming Conventions

  1. Text Catalog Records: files for text records should be named as follows: collection siglum-####-bib.doc

    • The number of digits in the filename varies according to the number of texts in the collection. If a collection has 99 or fewer texts, use a two-digit number. If a collection has more than 99 texts but fewer than 1000, use a three-digit number. If a collection has 1000 or more texts but fewer than 10,000, use a four-digit number.
    • Use leading zeroes. For instance, 001 for the first text of a collection that has more than 99 texts but fewer than 1000.
    • Example: for the Nartang edition of the Kangyur, the siglum of which is N, the text record file for the first text should be named N-0001-bib.doc; the text record file for the 357th text should be named N-0357-bib.doc

  2. Volume Catalog Records: files for volume catalog records should be named as follows: collection siglum-section of the collection-v###-bib.doc

    • Note: section of the collection is either Kg or Tg and indicates whether the volume is from the Kangyur or Tengyur.
    • Example: for the Nartang Kangyur, the siglum of which is N, the catalog record file for the first volume should be named N-Kg-v001-bib.doc
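
The naming rules above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the text count of 5000 in the usage lines is an assumed figure chosen to produce the four-digit numbers shown in the examples above, not an actual count for any edition. The zip pattern follows the Zip the Files principle earlier on this page.

```python
def number_width(text_count):
    """Number of digits for text numbers, based on collection size."""
    if text_count <= 99:
        return 2
    if text_count < 1000:
        return 3
    if text_count < 10000:
        return 4
    # The conventions above do not cover larger collections.
    raise ValueError("collections of 10,000 or more texts are not covered")


def check_section(section):
    """Section must be Kg (Kangyur) or Tg (Tengyur)."""
    if section not in ("Kg", "Tg"):
        raise ValueError(f"section must be 'Kg' or 'Tg', got {section!r}")


def text_record_name(siglum, text_number, text_count):
    """File name for a text catalog record, zero-padded as required."""
    width = number_width(text_count)
    return f"{siglum}-{text_number:0{width}d}-bib.doc"


def volume_record_name(siglum, section, volume):
    """File name for a volume catalog record (three-digit volume number)."""
    check_section(section)
    return f"{siglum}-{section}-v{volume:03d}-bib.doc"


def volume_zip_name(siglum, section, volume):
    """Name of the zip file holding all Word files for one volume."""
    check_section(section)
    return f"{siglum}-{section}-v{volume:03d}.zip"


# text_count=5000 is an assumed value, used only to select four digits.
print(text_record_name("N", 1, 5000))      # N-0001-bib.doc
print(text_record_name("N", 357, 5000))    # N-0357-bib.doc
print(volume_record_name("N", "Kg", 1))    # N-Kg-v001-bib.doc
print(volume_zip_name("N", "Kg", 9))       # N-Kg-v009.zip
```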

Provided for unrestricted use by the Tibetan and Himalayan Library