Transforming an Old XML Essay

Transforming an Old XML Page

Contributor(s): Than Grove, Bill McGrath & Steve Weinberger

Check out the entire directory /texts/cocoon/essays/

The URLs for all links in any essay, including JIATS essays, are stored in one of two global files, both of which are found in the /texts/cocoon/essays folder:

  1. external-links.dtd: contains all links to sites outside of THL.
  2. internal-links.dtd: contains all links to internal THL pages.
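
Essay files refer to these two files by relative path (see the entity declarations in step 9 of the per-file procedure below). The following is a minimal sketch of how such a relative path can be worked out. It assumes the files sit in /texts/cocoon/essays/xml, as the note in that step states; the function name and the JIATS issue folder /texts/cocoon/jiats/xml/03 are hypothetical and used only for illustration.

```python
import posixpath

# Assumption: folder holding external-links.dtd and internal-links.dtd,
# taken from the note in step 9 of the per-file procedure.
ESSAYS_XML = "/texts/cocoon/essays/xml"


def entity_path(essay_dir, filename="external-links.dtd"):
    """Return the relative path from an essay's folder to a global links file."""
    target = posixpath.join(ESSAYS_XML, filename)
    return posixpath.relpath(target, start=essay_dir)


if __name__ == "__main__":
    # An essay in a letter sub-folder of /essays/xml
    print(entity_path("/texts/cocoon/essays/xml/c"))
    # A JIATS essay in a hypothetical issue folder
    print(entity_path("/texts/cocoon/jiats/xml/03", "internal-links.dtd"))
```

The first call yields the "../external-links.dtd" form used in step 9; the second yields the three-levels-up form that the same step gives for JIATS.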

You must use the Oxygen XML editor. UVa students, faculty, and staff can download it from this page: https://www.web.virginia.edu/rescomp/SoftwareInfo.asp?ID=44. Note: the installation instructions say NOT to install it in the same folder as a previous version of Oxygen. They also say to save the file you download in a folder other than the one where Oxygen will live on your hard drive (so download it to the desktop or somewhere similar).

Converting Sera Hermitages XML essays to the new system for images:

You only have to do the following eleven steps the first time you run this transformation.

  1. Open the XML file to be transformed.
  2. Open essays/xsl/internal use/convertImgsToFigures.xsl.
  3. Click the Configure Transformation Scenario tool in the Oxygen toolbar.
  4. Create a new scenario.
  5. Name it (convert-essay).
  6. XML URL: leave the default value.
  7. XSL URL: select convertImgsToFigures.xsl.
  8. Transformer: Saxon8B.
  9. Under the Output tab, select "Save as" and "Open in editor", and set "Show as" to xml.
  10. Click OK.
  11. Click the "Transform now" button.

The following procedure needs to be performed on each XML file that you are transforming:

  1. "Save file as" filename in the appropriate folder
  2. At the top of the XML file there is a comment, <!--Links within Document-->. Any links listed there that are not breadcrumbs still need to be updated.
  3. Delete the <?xml-stylesheet href=""?> processing instruction.
  4. Copy the DTD statement and any !ENTITY declarations, etc., from the old XML file and paste them into the new file, right before <TEI.2>.
  5. Change the location of the DTD to: http://www.thdl.org/global/xml/dtds/xtib3.dtd
  6. Check that it is xtib3.dtd and not xtib2.dtd.
  7. Validate by clicking the checkmark icon labeled “Reset cache and validate”.
  8. If the file does not validate and the error is with <sourceDesc>, delete the attribute default="NO" from the <sourceDesc> element.
  9. Copy and paste:
    <!ENTITY % extlinks SYSTEM "../external-links.dtd">
    %extlinks;
    
    <!ENTITY % intlinks SYSTEM "../internal-links.dtd">
    %intlinks;
    
    <!ENTITY glossary SYSTEM "glossaries/cabezon-sera-herm-gloss.xml">
    Note: The entity files are located in the /texts/cocoon/essays/xml folder, so for JIATS their location would be "../../../essays/xml/external-links.dtd". The location of the glossaries also differs. In JIATS, all glossaries are in the folder /texts/cocoon/jiats/xml/glossaries, but in essays they are located within each letter sub-folder in a folder called glossaries (i.e., /texts/essays/xml/{letter}/glossaries/). Thus, if you add a letter folder to /xml/, such as /xml/q/, you must also create a glossaries subfolder inside it, in this case /xml/q/glossaries/.
  10. Delete from new file any declaration that begins with !NOTATION, such as <!NOTATION HTML SYSTEM "html">
  11. Move any entity declarations found at the top of the file into either the external-links.dtd file (for links outside of THDL) or the internal-links.dtd file (for THDL links). The name of each THDL link should be changed to reflect its location within THDL, e.g. places-mons-sera.
  12. Paste what you copied <!ENTITY …>
  13. Move the tag <name key="###" reg="thdl-participant"> into the <author> tag within the <titleStmt>. Change the tag name to <persName key="###" reg="thdl-participant"> and, within it, enclose the author’s last name in <surname> tags.
  14. Move the <date> that was inside the <name> tag outside the <persName> tag, but keep it within <author>.
  15. Within <publicationStmt>, delete all the empty <respStmt>s and the old <author>.
  16. Move any <respStmt>s that DO have content up to just after the <author> tag, but still within the <titleStmt>.
  17. <name key="per####" reg="thdl-participant">
  18. <publisher>: delete and replace with: &thdlpublisher;
  19. <pubPlace>: delete and replace with &thdlpubplace;
  20. <availability>: add this after <date> and before </publicationStmt>
  21. paste into the <availability> tag the following:
    <p>
    	<bibl n="thdlloc">
    		<xref doc="thdl-places" type="url">Places</xref>
    		<xref doc="thdl-places-mons" type="url">Monasteries</xref>
    		<xref doc="thdl-places-mons-sera" type="url">Sera Monastery</xref>
    		<xref doc="thdl-places-mons-sera-herm" rend="home-link" type="url">Hermitages</xref>
    	</bibl>
    </p>

    The attribute on the last <xref … rend="home-link" … > is what creates above the TOC a "home" link with the little house icon and the text within the xref. In this case, it creates the link "Hermitages" (with the home icon) above the TOC. If that rend attribute is absent, then no home link will appear above the TOC.
  22. <sourceDesc>: delete all content within it and create the following:
    <sourceDesc>
    	<p>Essay written for digital publication on THDL.</p>
    </sourceDesc>
  23. <profileDesc>: delete it entirely and replace it with &thdlprofiledesc;
  24. Add <back></back> after </body>
  25. For Sera Hermitage essays, put the glossary entity inside <back>:
    <back>
    	&glossary;
    </back>
  26. In <body>, delete <head>body</head>
  27. In <body> of essay, delete container <div1>. Change all top-level sections, currently marked as <div2>, to <div1>; change all <div3>s to <div2>s; etc. Note: THIS ONLY APPLIES TO THE BODY OF THE ESSAY.
  28. Add an id attribute to EVERY <div> within the whole document. For <front>, the first div is id="a1"; for <body>, the first <div1> has id="b1"; the first <div2> of the first <div1> is <div2 id="b11">; the second <div2> of the first <div1> is <div2 id="b12">; the first <div3> of the second <div2> of the first <div1> is <div3 id="b121">; etc.
  29. Images: Each image needs to be located in the Media Management System (MMS). If it is not there, it needs to be uploaded. The old markup for images was, e.g., <xref n="http://www.imagelocation.com" type="image">. All such references need to be fixed. Correct markup:
    <figure entity="thdl-mms-images" n="{MMS ID# for image here}">
       <figDesc>Caption text, with whatever markup is necessary, goes here.</figDesc>
    </figure>
    In cases where the image has not yet been uploaded to the MMS, add a URL that locates the original image in a comment, e.g., <!-- comment text here -->.
  30. Look at the images after the essay is posted online. Make sure they are positioned in a way that is smooth and not jarring.
  31. Images by default are floated right, that is, displayed to the right of their paragraph of text.
    • rend="left" moves image to the left of the paragraph
    • rend="center" text does NOT wrap around the image
  32. To put two images side by side, wrap them in <p rend="imgs">, which displays images side by side; for the left-hand image, add <figure rend="left">.
  33. For Sera Hermitages essays, change the URL for images to: "http://ESSAYS-HOME/c/cabezon/{filename}". For JIATS an example is: "http://JIATS-IMG/03/jiats03elverskog_img1-sm.jpg"
  34. Fix and check all links in the document. The XSLT transformation done above leaves a list of all links in the document in a comment at the top. The old markup for a link was: <xref n="http://www.tbrc.org" type="url">http://www.tbrc.org</xref>. The correct markup for a link is: <xref doc="tbrc" type="url">http://www.tbrc.org</xref>, where "tbrc" is the name of an entity declared (in this case) in the external-links.dtd file. Old link markup has to be changed to the new markup, and an entity has to be created for a link if one doesn't already exist. Use entities to declare base links that can be used with different variables, and put the variable value in the n attribute of the <xref> tag. Thus, for the two TBRC links http://www.tbrc.org/kb/tbrc-detail.xq?RID=P1583 and http://www.tbrc.org/kb/tbrc-detail.xq?RID=P1709, you would use a single entity, "tbrc-search", which is defined as:
    <!ENTITY tbrc-search SYSTEM "http://tbrc.org/kb/tbrc-detail.xq?RID=" NDATA HTML >
    The individual <xref>s would then be: <xref doc="tbrc-search" n="P1583" type="url">Dri med 'od zer</xref> and <xref doc="tbrc-search" n="P1709" type="url">Chos dbyings stobs ldan rdo rje</xref>. Before adding a link entity, however, check that the link is still valid. If it is not, ask the author for an updated link or whether the link should be removed. So, for each link, you need to:
    1. Check URL to make sure it is active and goes to the correct page.
    2. Move each entity to the external-links.dtd file, fix the name if necessary, and then fix the doc="" value in the essay itself.
  35. The URL for a Sera Hermitage essay is: directory path/#essay=/cabezon/sera/herm/drakri/
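
The renumbering in steps 27 and 28 above is easy to get wrong by hand, so a rough sketch of the logic may help as a cross-check. It is not part of the THL tooling: the function names are hypothetical, it works on plain text with regular expressions rather than a real XML parser, and it assumes the body has exactly one container <div1> and fewer than ten sibling divs at any level (the steps above do not say how ids are formed beyond that).

```python
import re

# Opening tag of a numbered TEI division, e.g. <div2> or <div2 type="x">
DIV_OPEN_RE = re.compile(r'<div(\d)((?:\s[^>]*)?)>')


def promote_divs(body_text):
    """Step 27: drop the single container <div1>, then move every div up one level."""
    start = re.search(r'<div1(?:\s[^>]*)?>', body_text)
    end = body_text.rfind('</div1>')
    if start is None or end == -1:
        raise ValueError("no container <div1> found")
    inner = (body_text[:start.start()]
             + body_text[start.end():end]
             + body_text[end + len('</div1>'):])
    # <div2> becomes <div1>, </div3> becomes </div2>, and so on
    return re.sub(r'(</?div)([2-9])',
                  lambda m: m.group(1) + str(int(m.group(2)) - 1), inner)


def assign_div_ids(section_text, prefix):
    """Step 28: give every div an id such as b1, b11, b12, b121."""
    counters = []  # counters[i] counts <div(i+1)> elements under the current parent

    def add_id(match):
        level = int(match.group(1))
        attrs = re.sub(r'\sid="[^"]*"', '', match.group(2))  # drop any old id
        del counters[level:]           # deeper levels restart under a new parent
        while len(counters) < level:   # pad if a level was skipped
            counters.append(0)
        counters[level - 1] += 1
        div_id = prefix + ''.join(str(n) for n in counters)
        return f'<div{level} id="{div_id}"{attrs}>'

    return DIV_OPEN_RE.sub(add_id, section_text)


if __name__ == "__main__":
    old_body = '<div1><div2><div3>x</div3></div2><div2>y</div2></div1>'
    print(assign_div_ids(promote_divs(old_body), "b"))
```

Call assign_div_ids with prefix "a" for the text of <front> and "b" for the text of <body>, and review the result in Oxygen before saving.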

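The link conversion in step 34 above can likewise be sketched in code. This is only an illustration under stated assumptions: the function names are hypothetical, link entities are assumed to be declared in the one-line form shown in that step, old links are assumed to be written exactly as <xref n="URL" type="url">, and the sample entities in the demo use www.tbrc.org so that they match the sample URLs. Links with no matching entity are left unchanged for manual review, and the sketch does not check whether a URL is still live.

```python
import re

# Link entity declarations of the form shown in step 34:
#   <!ENTITY tbrc-search SYSTEM "http://..." NDATA HTML >
ENTITY_RE = re.compile(
    r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]*)"\s+NDATA\s+\w+\s*>')

# Opening tag of an old-style link: <xref n="URL" type="url">
OLD_XREF_RE = re.compile(r'<xref\s+n="([^"]+)"\s+type="url"\s*>')


def load_link_entities(dtd_text):
    """Return {entity_name: base_url} for each link entity in the DTD text."""
    return dict(ENTITY_RE.findall(dtd_text))


def convert_links(essay_text, entities):
    """Rewrite old-style <xref n="URL"> tags to the doc/n form."""

    def convert(match):
        url = match.group(1)
        # Choose the entity whose base URL is the longest prefix of this URL
        best = None
        for name, base in entities.items():
            if url.startswith(base) and (
                    best is None or len(base) > len(entities[best])):
                best = name
        if best is None:
            return match.group(0)  # no entity yet: leave for manual review
        variable = url[len(entities[best]):]
        if variable:
            return f'<xref doc="{best}" n="{variable}" type="url">'
        return f'<xref doc="{best}" type="url">'

    return OLD_XREF_RE.sub(convert, essay_text)


if __name__ == "__main__":
    dtd = ('<!ENTITY tbrc SYSTEM "http://www.tbrc.org" NDATA HTML>\n'
           '<!ENTITY tbrc-search SYSTEM '
           '"http://www.tbrc.org/kb/tbrc-detail.xq?RID=" NDATA HTML >')
    old = ('<xref n="http://www.tbrc.org/kb/tbrc-detail.xq?RID=P1583" '
           'type="url">Dri med \'od zer</xref>')
    print(convert_links(old, load_link_entities(dtd)))
```

Preferring the longest matching base URL means a specific entity such as "tbrc-search" wins over a general one such as "tbrc" when both match.
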
Provided for unrestricted use by the Tibetan and Himalayan Library