Text Markup - Metadata


Text Markup - Overview

Contributor(s): Nathaniel Garson & David Germano

Metadata

Every XML document needs metadata describing the digital text and the source from which it was made. The Tibetan Text template provided with the XMLmind customization already contains a metadata section with most of the necessary elements, and comments in the XML describe the information that belongs in each field.

Minimum Required Metadata Info

At a minimum, the metadata must contain the publishing information for the electronic text and information about the original source document. The required electronic publishing information is:

  1. Text title in English, and also in the original language if the text is a translation
  2. Person responsible for transcription
  3. Person responsible for editing
  4. Person responsible for conversion & lineation
  5. Person responsible for mark-up
  6. Information concerning location in THL: links and names for domain, portal, project, and home.
  7. Brief summary/description of the document.
  8. Information for breadcrumbs.[1]

Other persons involved in the creation of the electronic document can be added as needed. Any such entry requires three pieces of information: responsibility (what was done), name (who did it), and date (when it was done). The date should be in the form of yyyy-mm-dd.
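Because each added entry needs all three pieces of information and a correctly formatted date, a quick check can catch omissions before a file is published. The following is a minimal Python sketch, not part of the THL tools; the function name check_provenance_entry and the sample name Jane Doe are hypothetical.

```python
import re
from datetime import date

# Hypothetical checker for one provenance entry; the rules come from the paragraph above.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_provenance_entry(resp: str, name: str, when: str) -> list[str]:
    """Return a list of problems; an empty list means the entry looks complete."""
    problems = []
    if not resp.strip():
        problems.append("missing responsibility (what was done)")
    if not name.strip():
        problems.append("missing name (who did it)")
    # The date must be written as yyyy-mm-dd ...
    if not DATE_PATTERN.fullmatch(when):
        problems.append(f"date {when!r} is not in yyyy-mm-dd form")
    else:
        # ... and must also be a real calendar date (not, say, 2004-02-30).
        try:
            date.fromisoformat(when)
        except ValueError:
            problems.append(f"date {when!r} is not a valid calendar date")
    return problems


print(check_provenance_entry("Proofreading", "Jane Doe", "2004-02-15"))  # []
print(check_provenance_entry("Proofreading", "", "15/02/2004"))
```

The regular expression enforces the layout, while the calendar check rejects dates that have the right shape but do not exist.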

The required information about the original source document from which the electronic text was created is:

  1. Title
  2. Author
  3. Editor
  4. Translator
  5. Pagination
  6. Place of Publication
  7. Publisher
  8. Date of Publication

Markup for the Metadata

The markup for the metadata section is created using the Word to XML converter macro. All the information to be marked up goes in the table at the beginning of the Word document in the clearly defined fields to the right of each label. The detailed description of the metadata markup has yet to be written. To learn more about it, one is advised to create an XML document using the converter and inspect the result in Morphon or some other XML or text editor. A few helpful hints are included below.
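As one way to inspect the converter's output outside an editor, the short Python sketch below prints the nesting of the <teiHeader> elements together with their attributes. It assumes the elements carry no XML namespace, as in the examples on this page, and the function names are hypothetical. Files that depend on entities declared in the DTD may not parse with Python's standard parser.

```python
import xml.etree.ElementTree as ET


def outline(elem: ET.Element, depth: int = 0) -> list[str]:
    """Return one indented line per element, showing how the markup is nested."""
    # Include attributes so that values such as n="links" are visible at a glance.
    attrs = "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    lines = ["  " * depth + f"<{elem.tag}{attrs}>"]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


def print_header_outline(path: str) -> None:
    """Print the teiHeader outline of a converter-produced XML file."""
    root = ET.parse(path).getroot()
    # Assumption: un-namespaced elements, so a plain tag name search works.
    header = root if root.tag == "teiHeader" else root.find(".//teiHeader")
    if header is None:
        raise ValueError(f"no teiHeader element in {path}")
    print("\n".join(outline(header)))
```

Calling print_header_outline on a converted file shows at a glance where each piece of metadata from the Word table ended up.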

Adding Other Persons to the Provenance

To add more people to the list of provenance personnel, highlight the last responsibility statement (respStmt) before the publication statement. Choose the Edit tab at the bottom right and click Insert After. Choose "respStmt" and add the information: the new responsibility statement will come up with only a name element, so insert a responsibility element (resp) before it, and after the text of the name insert a date element (date). All dates in the metadata section should be in the format yyyy-mm-dd.
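The structure produced by these editing steps can be sketched with Python's standard ElementTree module. This is a hypothetical illustration, not THL tooling: the helper names and the sample entry are invented, the parent element that holds the respStmt entries (a titleStmt in the test) is an assumption, and placing <date> inside <name>, after the name text, is our reading of the instruction above, so check it against the DTD.

```python
import xml.etree.ElementTree as ET


def make_resp_stmt(resp: str, name: str, when: str) -> ET.Element:
    """Build <respStmt><resp>...</resp><name>...<date>...</date></name></respStmt>."""
    stmt = ET.Element("respStmt")
    ET.SubElement(stmt, "resp").text = resp  # what was done
    name_el = ET.SubElement(stmt, "name")
    name_el.text = name  # who did it
    # Our reading of "after the text of the name": the date follows the name text inside <name>.
    ET.SubElement(name_el, "date").text = when  # when, as yyyy-mm-dd
    return stmt


def insert_after_last_resp_stmt(parent: ET.Element, stmt: ET.Element) -> None:
    """Insert stmt right after the last respStmt child of parent (or at the end)."""
    children = list(parent)
    positions = [i for i, child in enumerate(children) if child.tag == "respStmt"]
    parent.insert(positions[-1] + 1 if positions else len(children), stmt)


stmt = make_resp_stmt("Proofreading", "Jane Doe", "2004-02-15")
print(ET.tostring(stmt, encoding="unicode"))
# <respStmt><resp>Proofreading</resp><name>Jane Doe<date>2004-02-15</date></name></respStmt>
```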

Breadcrumbs

The breadcrumbs are the links that appear in the bar underneath the JavaScript menus on a standard THL page. They begin with "THL >", followed by the Domain, and trace the hierarchical position of the present page, with each level separated by a closing angle bracket (>). Each breadcrumb links to its respective page. In a THL XML document, the information for creating these links is stored in the metadata header within a <bibl> element, as described below. For each breadcrumb, one needs the name of the page (the text that will appear as the breadcrumb) and the URL of that page. The breadcrumb information is also used to create links to parent pages in the TOC: the "Back to …" links at the bottom of the TOC highlight box are generated from it.

Note: if there is only a single breadcrumb link in a document, it MUST go in the "Domain" link.

When an XML document is created with the Word to XML converter, the breadcrumb markup is generated automatically from the information in the metadata table at the top of the Word document. For cases where the breadcrumbs must be marked up manually, the markup is described below.

The breadcrumbs are contained in the <sourceDesc> (source description) element, which is the last child of the <fileDesc> (file description) element in the <teiHeader> (metadata element). The last child of the source description element is a <bibl> (bibliography) element with its n attribute set to "links". Within that <bibl> element is a series of <xref> (external reference) elements that contain the breadcrumb links.

Each <xref> element should have its type attribute set to "url", its targType (target type) attribute set to one of the preset values described below according to the type of breadcrumb, and its n attribute set to the full URL for that breadcrumb link (in the examples on this page, a path from the site root such as /tools/). The text inside the <xref> is the text that appears as the breadcrumb. For example, the <xref> for the breadcrumb "Tools" would look like the following:[2]

<xref from="ROOT"
n="/tools/"
targType="domain" type="url">Tools</xref>
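When the breadcrumbs have to be written by hand, a small script can generate consistent <xref> markup from a list of links. The Python sketch below is hypothetical: it copies from="ROOT" from the example above, sets type="url" on every <xref>, and omits the n attribute when no URL is given (as for a self breadcrumb). It does not add the extra attributes that Morphon and the DTD supply.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_links_bibl(crumbs: list[tuple[str, Optional[str], str]]) -> ET.Element:
    """Build <bibl n="links"> from (targType, url, text) triples, Domain first."""
    bibl = ET.Element("bibl", {"n": "links"})
    for targ_type, url, text in crumbs:
        attrs = {"from": "ROOT"}  # copied from the example markup
        if url is not None:  # a "self" breadcrumb carries no n attribute
            attrs["n"] = url
        attrs["targType"] = targ_type
        attrs["type"] = "url"
        ET.SubElement(bibl, "xref", attrs).text = text
    return bibl


bibl = make_links_bibl([
    ("domain", "/tools/", "Tools"),
    ("home", "/tools/scholartools/textmarkup.html", "Creating XML Documents"),
])
ET.indent(bibl)  # Python 3.9+: pretty-print for readability
print(ET.tostring(bibl, encoding="unicode"))
```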

The styles for THL XML essays automatically insert the first breadcrumb, THL, and the last one, the name of the document, which is taken from the <title> element at the top of the metadata and is not linked. Between these, the styles allow for four other types of breadcrumbs, and a fifth value, self, can supply the text of the final breadcrumb. The <xref> for each breadcrumb should have its targType attribute set to one of these values, written in all lowercase:

  • Domain: This is the major domain within THL, such as Tools, Reference, Collections, Community, Education. (targType=domain)
  • Portal: The use of the terms portal and project here is not precise. In this instance, "Portal" refers to the next level down from the Domain, indicating a grouping page that groups several "projects". (targType=portal)
  • Project: This is one discrete unit within a portal. Again, the terminology is not precise, but in this case a project is the level below the portal. (targType=project)
  • Home: This is the page from which the document one is working on is linked. In the THL hierarchy, it is the immediate parent of the page. (targType=home)
  • Self: This is the breadcrumb version of the document's own name. The document's title from the <title> tag at the beginning of the metadata will display in the browser's title bar and as the first heading in the document. It is also generally used for the last (unlinked) breadcrumb, naming the present document. However, sometimes these titles are too long for breadcrumbs. In such a case, one may add a final <xref> with targType=self and no n attribute. The text it contains will be used for the final breadcrumb, representing the present document. (targType=self).
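The order in which the styles assemble the breadcrumbs can be approximated as follows. This Python sketch only builds the breadcrumb text and is not the THL stylesheet; it assumes un-namespaced elements, takes the document title from the first fileDesc/titleStmt/title, skips levels that are absent, and ignores targType values outside the list.

```python
import xml.etree.ElementTree as ET

# Intermediate breadcrumb levels, in the order they are listed above.
LEVELS = ["domain", "portal", "project", "home"]


def breadcrumb_trail(tei_header: ET.Element) -> str:
    """Return the breadcrumb text, e.g. "THL > Tools > ... > Document title"."""
    # Assumption: the document title is the first fileDesc/titleStmt/title.
    title = (tei_header.findtext("fileDesc/titleStmt/title") or "").strip()
    bibl = tei_header.find("fileDesc/sourceDesc/bibl[@n='links']")
    links = {}
    if bibl is not None:
        for xref in bibl.findall("xref"):
            links[xref.get("targType")] = (xref.text or "").strip()
    crumbs = ["THL"]  # always inserted first by the styles
    # Missing levels (often Portal and Project) are skipped; unknown values are ignored.
    crumbs += [links[level] for level in LEVELS if level in links]
    # A "self" breadcrumb, if present, replaces a long document title.
    crumbs.append(links.get("self") or title)
    return " > ".join(crumbs)
```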


Only the Domain and Home <xref> elements are required for the breadcrumbs to display properly; the Portal and Project breadcrumbs, or either one of them, can be left out. No additional <xref> elements representing other levels will be processed at this time (Feb. 2004). An example of one document's breadcrumb section is:

<bibl default="NO" n="links">
  <xref from="ROOT" n="/tools/" targType="domain" type="url">Tools</xref>
  <xref from="ROOT" n="/tools/scholartools/scholartools.html" targType="project" type="url">Scholar's Toolbox</xref>
  <xref from="ROOT" n="/tools/scholartools/textmarkup.html" targType="home" type="url">Creating XML Documents</xref>
</bibl>

Note:

The above example leaves out some attributes that are added automatically by Morphon and the DTD, so the <xref> tags in an actual THL document are significantly longer. It is recommended that the breadcrumb markup be done in Morphon with the DTD attached, so that these extra attributes are added automatically.
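The requirements stated on this page (a Domain link is mandatory, Home is needed for proper display, only the listed lowercase targType values are processed, and every linked breadcrumb needs an n attribute) can be checked mechanically. In this hypothetical Python sketch, treating a single Domain link as complete without a Home link is our reading of how those two rules fit together.

```python
import xml.etree.ElementTree as ET

KNOWN_TARG_TYPES = {"domain", "portal", "project", "home", "self"}


def check_links_bibl(bibl: ET.Element) -> list[str]:
    """Return problems found in a <bibl n="links"> element; empty means it looks fine."""
    xrefs = bibl.findall("xref")
    types = [x.get("targType") for x in xrefs]
    problems = []
    if "domain" not in types:
        problems.append("no domain breadcrumb")
    # Our reading: a lone link must be Domain; with more links, Home is needed too.
    linked = [t for t in types if t != "self"]
    if len(linked) > 1 and "home" not in types:
        problems.append("no home breadcrumb")
    for x in xrefs:
        t = x.get("targType")
        if t not in KNOWN_TARG_TYPES:  # values must be lowercase and from the list
            problems.append(f"targType {t!r} will not be processed")
        elif t != "self" and not x.get("n"):
            problems.append(f"{t} breadcrumb has no n (URL) attribute")
    return problems
```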

Footnotes

1. "Breadcrumbs" are the links in the bar below the menus on a THL page that show where the page is located in the hierarchy of the website. They are separated by > and are ordered from the root, THL, down to the page.

2. Line-breaks have been inserted to maintain this HTML page width; they are not necessary in the XML.

This page is provided courtesy of the Tibetan and Himalayan Library.