Udp

UDP: How to convert legacy Tibetan documents to Tibetan Unicode using UDP

Contributor(s): Chris Walker.

Introduction

UDP is one of the best general converters for Tibetan, but it runs only on Windows. With it, you should be able to convert many of the older Tibetan documents stored on your computer to the newer Tibetan Unicode standard. If you have legacy Tibetan documents (in TCRC Youtso, Sambhota, LTibetan, Tibetan Machine Web, etc.) to convert to Tibetan Unicode, first make sure that you are either using Windows Vista or have prepared your Windows 2000 or XP system to work with Tibetan Unicode; see Using Tibetan in Windows for details. You will also need to install those legacy fonts on your computer.

Update May 2020: A process for converting Sambhota to Unicode has been developed that runs UDP in Windows under VirtualBox on a Mac, with some scripts to aid the process. While geared toward a THL context, the README and some of the scripts may be useful to anyone who wants to convert legacy Sambhota fonts to Unicode. See https://github.com/thl-texts/tibetan_text_scripts

Downloads

Download UDP

To download the application, called the Unicode Document Processor (UDP), point your browser to http://udp.leighb.com (or search Google for the keywords “UDP Tibetan” to find a mirror site). The website provides ample English descriptions of the program (figure 1).

udp01_udpHomepage_resized.png
Figure 1: Website of the UDP Converter and Editor

Versions in other languages

Conversions Full Description

To be written.

Instructions

Setting Up


Take note of the links within the red box on the left side of the screen, as browsing these categories will be especially informative. Most importantly, follow the link to the download page so that you can install the program on your local machine. There are several UDP download packages to choose from; you will get the best one by clicking on “Download the complete UDP package” (figure 2).

udp02_downloadUDP_resized.png
Figure 2: Various UDP Download Versions

As you begin the download, your browser will ask whether you are sure you want to download the program from the Internet. Confirm to proceed.

Vista may pop up a dialogue box asking whether you really want to install the program; if so, select “Allow.” The setup program for UDP will then bring up its own dialogue box asking for your preferred install destination and whether you accept the license agreement. Choose “I Accept” (figure 3).

udp03_setupUDP_resized.png
Figure 3: Default Install Location and License

The next few dialogue boxes will inform you of the installation process, and you can click quickly through them by choosing “Yes” or “OK.” At the end of the installation, you will be given the option of running the UDP program straight away. Click “Yes” to start UDP. If nothing happens, look at the taskbar at the bottom of your screen for the words “ReadMe.udp”, as you may need to click that area of the taskbar to bring UDP to the front.

Once UDP is open, the first thing you need to do is click on “Options” in the top menu bar and then choose “Font…” There you will see a button labelled “Choose Unicode”, which you should press in order to select an appropriate Tibetan Unicode font (figure 4).

udp04_optionsFont_resized.png
Figure 4: UDP Fonts Dialog Box

The font dialogue box will allow you to choose a font name and size. If you are using Vista, you will already have Microsoft Himalaya as a font option. You can use Microsoft Himalaya, or pick another Tibetan Unicode font of your choice (figure 5).

udp05_optionsFont_resized.png
Figure 5: Choosing Your Tibetan Unicode Font

Once you have chosen a Unicode font, you will be returned to the UDP Fonts dialogue box. Click the radio button to the left of the word “Unicode” (figure 6), which is followed by the name of the Tibetan Unicode font that you specified a moment ago (such as Microsoft Himalaya). You have now properly prepared UDP for processing conversions into Unicode and can press “OK.”

udp06_croppedChooseFont_resized.png
Figure 6: Clicking the Radio Button Next to "Unicode"

To preserve yig chung (small-font text) in the output by enclosing it in « and », configure UDP as follows:

  • Options > Advanced Options
  • Select the following by clicking in the box to the left of each one:
    • Automatically repair Tsheg-followed-by-Vowel errors
    • Allow non-breaking Tibetan spaces in Unicode
    • Enclose small fonts in «» during RTF import

udp11_yig-chung-config.png
Figure 11: Enclose Small Fonts in «» during RTF import
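
With this option enabled, the small-font passages can be located later in the converted text by looking for the « and » markers. The following is a minimal sketch in Python, assuming the converted text is available as a string; the helper name `find_yig_chung` is hypothetical and is not part of UDP.

```python
import re

# Hypothetical helper, not part of UDP.
# Matches text enclosed in « and », the markers UDP adds around small fonts.
YIG_CHUNG = re.compile(r"«([^«»]*)»")

def find_yig_chung(text: str) -> list[str]:
    """Return the contents of every «...» span found in the text."""
    return YIG_CHUNG.findall(text)

# Illustrative sample string, not taken from a real document.
sample = "བཀྲ་ཤིས་«བདེ་ལེགས»།"
print(find_yig_chung(sample))
```

The pattern deliberately excludes nested markers, so an unbalanced « or » is simply skipped rather than producing a misleading match.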

Converting Tibetan Text


You should now open up your old Tibetan document in Microsoft Word (figure 7).

udp07_heresWord_resized.png
Figure 7: Original Tibetan Document in Word

Be mindful that if you don’t have the legacy Tibetan fonts on your machine, you won’t be able to display the old Tibetan content, let alone convert it (the original Tibetan will probably appear as random English or Chinese characters!). Once you have your original Tibetan document open in Word and can clearly see the Tibetan in the original fonts, your next step is to open Wordpad. Wordpad is found on all Windows computers, typically under Start -> Programs -> Accessories -> Wordpad.

You’re now set up to start the conversion. First, select and copy the Tibetan text from Word, then paste it into Wordpad (figure 8).

udp08_intoWordpad_resized.png
Figure 8: Clipboard Copy from Word into Wordpad

If the Tibetan text now in Wordpad appears to have some unwanted spaces, do not fret, as those spaces will not be carried into UDP. From Wordpad, again select the Tibetan text, copy it and paste into UDP. When the text is pasted into UDP, it is automatically converted into Tibetan Unicode, and the text in UDP will show up using the Tibetan Unicode font that you specified earlier (figure 9).

udp09_finalConvertInUDP_resized.png
Figure 9: Tibetan content automatically converted to Unicode when pasted

The conversion to Unicode is now complete. All that is left is to copy and paste from UDP back into Word, and voilà! Hopefully, these steps have shown you how quick it can be to convert old Tibetan documents to the new Unicode standard (figure 10).

udp10_finalConvertInWord_resized.png
Figure 10: The Final Unicode Text Pasted Back into Word
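
As an optional check on the result, you can confirm that the converted text really consists of Tibetan Unicode characters rather than the Latin code points that legacy fonts rely on. The sketch below is written in Python and assumes you have copied the converted text into a string; the function name `tibetan_ratio` is hypothetical and the approach is only a rough heuristic, not something UDP provides.

```python
def tibetan_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Tibetan Unicode block.

    The Tibetan block spans U+0F00 to U+0FFF. Returns 0.0 for empty input.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    tibetan = sum(1 for c in chars if "\u0f00" <= c <= "\u0fff")
    return tibetan / len(chars)

# Illustrative sample, not taken from a real document.
converted = "བཀྲ་ཤིས་བདེ་ལེགས།"
print(f"{tibetan_ratio(converted):.0%} of characters are Tibetan Unicode")
```

A ratio close to 1 suggests the paste into UDP converted the text; a low ratio on a passage that should be entirely Tibetan suggests that some of it is still in a legacy encoding.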

Performance Testing

To be written.

History of Development and Releases

To be written.

Provided for unrestricted use by the Tibetan and Himalayan Digital Library