The XML Markup Behind the THDL Toolbox

Contributor(s): Than Grove

The XML file that maps out the structure of the Toolbox wiki is not stored in the wiki itself; it resides on the THDL website at www.thdl.org/tools/toolbox/toolbox.xml. This page briefly describes its structure; view the file itself for specific details. It is a GDMS file that uses the UVa Digital Library's DTD for GDMS.

The File Header: Doctype and Stylesheet Instructions

The XML file begins with:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/xml/styles/toolbox.xsl" type="text/xsl"?>
<!DOCTYPE gdms SYSTEM "http://text.lib.virginia.edu/dtd/gdms/gdms.dtd">

The first line is a standard XML declaration. The second line is a processing instruction that the index.php script uses to determine which stylesheet to call for the transformation; the stylesheet is thus located at http://www.thdl.org/xml/styles/toolbox.xsl. (This processing instruction also tells browsers such as IE7 and Firefox where the stylesheet is located. These browsers will automatically transform the XML according to the stylesheet, but for some reason Firefox gets hung up.) The third line is the doctype declaration, which points to the DTD at http://text.lib.virginia.edu/dtd/gdms/gdms.dtd.
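As a rough illustration of how a script might read that second line, the sketch below pulls the href out of the xml-stylesheet processing instruction and resolves it against the site root. It is not the index.php code, which is not shown on this page; the function name stylesheet_href, the empty root element in the sample, and the choice of Python are all assumptions made for the example.

```python
import re
from urllib.parse import urljoin
from xml.dom import minidom

# The three header lines quoted above, plus an empty root element so that
# the sample is well-formed. The external DTD is not fetched by minidom.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/xml/styles/toolbox.xsl" type="text/xsl"?>
<!DOCTYPE gdms SYSTEM "http://text.lib.virginia.edu/dtd/gdms/gdms.dtd">
<gdms id="thdl-toolbox"/>
"""


def stylesheet_href(xml_bytes):
    """Return the href of the first xml-stylesheet instruction, or None."""
    doc = minidom.parseString(xml_bytes)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction's data is plain text that only looks like
            # attributes, so it has to be picked apart by hand.
            match = re.search(r'href\s*=\s*"([^"]*)"', node.data)
            if match:
                return match.group(1)
    return None


href = stylesheet_href(SAMPLE)
# Resolve the site-relative path against the THDL host named above.
stylesheet_url = urljoin("http://www.thdl.org/", href)
print(stylesheet_url)
```

The DOM parser is used here because Python's ElementTree discards top-level processing instructions by default.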

A GDMS file, as defined by the DTD, is composed of two parts: a GDMS header containing metadata and a div containing content. Divs can recursively contain child divs, and it is through this nesting of divs that the structure of a project is mapped out.
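To make the nesting concrete, here is a minimal sketch that walks nested divs recursively and produces an indented outline. The sample reuses the example div quoted in the body section of this page, with the GDMS header left out for brevity; the function name outline and the three-space indent are assumptions for illustration, not part of the Toolbox code.

```python
import xml.etree.ElementTree as ET

# The example <div> from the body section of this page, wrapped in the
# root element. The <gdmshead> is omitted to keep the sample short.
SAMPLE = """
<gdms id="thdl-toolbox">
   <div id="thdl1173721775088" label="Communication" type="level1">
      <divinc type="wiki" href="/wiki/site/c06fa8cf-c49c-4ebc-007f-482de5382105/communication.html"/>
      <div id="thdl1173721822092" label="Mailing Lists" type="level2">
         <divinc type="wiki" href="/wiki/site/c06fa8cf-c49c-4ebc-007f-482de5382105/mailing%20lists.html"/>
      </div>
   </div>
</gdms>
"""


def outline(element, depth=0):
    """Return one indented label per <div> found beneath element."""
    lines = []
    for child in element.findall("div"):  # direct children only
        lines.append("   " * depth + child.get("label", "(no label)"))
        lines.extend(outline(child, depth + 1))  # recurse into child divs
    return lines


tree_lines = outline(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```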

The GDMS Header

After the root element <gdms>, whose id is "thdl-toolbox", comes the <gdmshead> element containing the metadata about the file. The first part of this is a <gdmsid> containing a URN that identifies the file. It is there solely for validation's sake and does not figure in the display of the file. Then there is a file description (<filedesc>) that contains a publication statement (<pubstmt>), which starts with the title for the document and the resulting HTML page. Next within the publication statement is a series of responsibility statements that describe the origin of the document by listing its creators.

The most important part of the header is the <biblscope> within the <pubstmt>, as this defines the breadcrumbs displayed for the document. These are marked up as follows:

<biblscope>
   <indexterm type="breadcrumbs">
      <extptr href="/tools/index.html" inline="true" targettype="domain" title="Tools"/>
   </indexterm>
   <indexterm type="breadcrumbs">
      <extptr href="/tools/scholartools/index.php" inline="true" targettype="self"/>
   </indexterm>
</biblscope>

Further <indexterm>s with <extptr>s can be added for portal, project, and home, changing only the href, targettype, and title attributes on the <extptr>.
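For illustration only, the following sketch shows one way a script could collect the breadcrumb entries from this markup. It looks for <indexterm> elements of type "breadcrumbs" wherever they occur, so it does not depend on the name of the enclosing element; in the sample the surrounding header is reduced to a bare <pubstmt>. The function name breadcrumbs and the tuple layout are assumptions, and how the real stylesheet renders these entries is not covered here.

```python
import xml.etree.ElementTree as ET

# The two breadcrumb entries quoted above, inside a stripped-down
# <pubstmt> wrapper used only for this sketch.
SAMPLE = """
<pubstmt>
   <indexterm type="breadcrumbs">
      <extptr href="/tools/index.html" inline="true" targettype="domain" title="Tools"/>
   </indexterm>
   <indexterm type="breadcrumbs">
      <extptr href="/tools/scholartools/index.php" inline="true" targettype="self"/>
   </indexterm>
</pubstmt>
"""


def breadcrumbs(root):
    """Collect (targettype, href, title) per breadcrumb, in document order."""
    crumbs = []
    for term in root.iter("indexterm"):
        if term.get("type") != "breadcrumbs":
            continue  # other index terms are not breadcrumbs
        for ptr in term.findall("extptr"):
            # title is None when the attribute is absent, as on "self"
            crumbs.append((ptr.get("targettype"), ptr.get("href"), ptr.get("title")))
    return crumbs


crumbs = breadcrumbs(ET.fromstring(SAMPLE))
for targettype, href, title in crumbs:
    print(targettype, href, title)
```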

The Body

The body of the XML file begins with a <div>. In GDMS, every <div> must have an ID that is unique within the document. In the Toolbox XML file I have used IDs consisting of "thdl" followed by the time in milliseconds since 1970, for example id="thdl1173719332656". However, one needs a script or macro to produce such a number. If none is available, "thdl" followed by the date and a unique number will suffice. Thus, the first <div> in the document has an ID of "thdl20070105001".

A provisional ID generator for XML divs is available at www.thdl.org/xml/idgenerator.html. Because it creates an ID of "THDL" plus the date in milliseconds, its IDs are unique within a single document and across THDL in general. It does not prevent two IDs from being generated in the same millisecond, though this is highly unlikely.
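The sketch below shows both ID styles described above. It is not the code behind the THDL generator page; the class name DivIdGenerator, the helper dated_id, the lowercase "thdl" prefix (taken from the example IDs), and the three-digit serial (inferred from "thdl20070105001") are assumptions. Unlike the provisional generator, this version steps past the previous value if two requests fall in the same millisecond.

```python
import time


class DivIdGenerator:
    """Hand out prefix + milliseconds-since-1970 IDs without repeats."""

    def __init__(self, prefix="thdl"):
        self.prefix = prefix
        self._last = 0  # last millisecond value handed out

    def next_id(self):
        now = int(time.time() * 1000)
        # If two requests land in the same millisecond (or the clock steps
        # back), move one past the previous value instead of repeating it.
        if now <= self._last:
            now = self._last + 1
        self._last = now
        return f"{self.prefix}{now}"


def dated_id(year, month, day, serial, prefix="thdl"):
    """Fallback form: prefix + YYYYMMDD + three-digit serial number."""
    return f"{prefix}{year:04d}{month:02d}{day:02d}{serial:03d}"


gen = DivIdGenerator()
ids = [gen.next_id() for _ in range(1000)]
first_div_id = dated_id(2007, 1, 5, 1)
print(ids[0], first_div_id)
```

The guarantee only holds within one generator object; IDs made by separate runs or separate people can still collide in principle.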

Besides an id attribute, each <div> must also have label and type attributes. The label is the text displayed for the div in the hierarchy tree, and the type records the div's nesting level, e.g., "level1", "level2", or "level3" in this document.
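Validating the file against the DTD is the proper way to enforce these rules. As a lightweight stand-in, the sketch below checks that every <div> carries the three attributes and that no ID is used twice among the divs. The function name check_divs, the message wording, and the sample (which contains two deliberate mistakes and a made-up "Meetings" div) are hypothetical.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("id", "label", "type")


def check_divs(root):
    """Return a list of problem descriptions for the <div>s under root."""
    problems = []
    seen = set()
    for div in root.iter("div"):  # every div at any depth, document order
        name = div.get("id") or div.get("label") or "(unnamed div)"
        for attr in REQUIRED:
            if not div.get(attr):
                problems.append(f"{name}: missing {attr}")
        div_id = div.get("id")
        if div_id:
            if div_id in seen:
                problems.append(f"{div_id}: duplicate id")
            seen.add(div_id)
    return problems


# Hypothetical sample: the second div repeats an ID, the third lacks a type.
SAMPLE = """
<gdms id="example">
   <div id="thdl1173721775088" label="Communication" type="level1">
      <div id="thdl1173721775088" label="Mailing Lists" type="level2"/>
      <div id="thdl1173721822092" label="Meetings"/>
   </div>
</gdms>
"""
problems = check_divs(ET.fromstring(SAMPLE))
for problem in problems:
    print(problem)
```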

An example <div> with a single child is:

<div id="thdl1173721775088" label="Communication" type="level1">
   <divinc type="wiki" href="/wiki/site/c06fa8cf-c49c-4ebc-007f-482de5382105/communication.html"/>
   <div id="thdl1173721822092" label="Mailing Lists" type="level2">
      <divinc type="wiki" href="/wiki/site/c06fa8cf-c49c-4ebc-007f-482de5382105/mailing%20lists.html"/>
   </div>
</div>

The element <divinc> stands for "div include", meaning that it points to something that should be included as a div, in this case a wiki page. Its type is always "wiki". The href attribute points to the public version of the wiki page, that is, the page you see if you follow the info link for the wiki page and choose the "public view" link. That URL should begin with "https://collab.itc.virginia.edu/access". Delete this part and use the rest for the href attribute on the <divinc>. To show that something is a child of something else, place the child's <div> within the parent's <div>, as in the above example.
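The href rule can be sketched in a few lines. The function names divinc_href and divinc_element are assumptions for illustration, and the full public-view URL in the sample is reconstructed by joining the prefix named above to the href from the example, which is how this page describes the relationship.

```python
from xml.sax.saxutils import quoteattr

ACCESS_PREFIX = "https://collab.itc.virginia.edu/access"


def divinc_href(public_url):
    """Strip the access prefix from a public-view URL."""
    if not public_url.startswith(ACCESS_PREFIX):
        raise ValueError(f"not a public-view URL: {public_url}")
    return public_url[len(ACCESS_PREFIX):]


def divinc_element(public_url):
    """Build the <divinc> tag for a wiki page's public-view URL."""
    # quoteattr adds the surrounding quotes and escapes & and < if present.
    return f"<divinc type=\"wiki\" href={quoteattr(divinc_href(public_url))}/>"


# Reconstructed public-view URL for the "Communication" page (assumed form).
public = ACCESS_PREFIX + "/wiki/site/c06fa8cf-c49c-4ebc-007f-482de5382105/communication.html"
href = divinc_href(public)
element = divinc_element(public)
print(element)
```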

Provided for unrestricted use by the Tibetan and Himalayan Digital Library