Contributor(s): Chris Walker.
UDP is one of the best general converters for Tibetan, but it only works in Windows. With it, you should be able to convert many of the older Tibetan documents stored on your computer into the newer Tibetan Unicode standard. If you have legacy Tibetan documents (containing TCRC Youtso, Sambhota, LTibetan, Tibetan Machine Web, etc.) to convert into Tibetan Unicode, first make sure that you are using the new Windows Vista operating system, or prepare your Windows 2000 or XP system to work with Tibetan Unicode; see Using Tibetan in Windows for details. You will also need to install those legacy fonts on your computer.
Update May 2020: A process for converting Sambhota to Unicode has been developed using UDP in Windows running in VirtualBox on a Mac, with some scripts to aid the process. While geared toward a THL context, the README and some of the scripts may be useful to anyone who wants to convert documents in legacy Sambhota fonts to Unicode. See https://github.com/thl-texts/tibetan_text_scripts
To download the application, called The Unicode Document Processor (UDP), point your browser to http://udp.leighb.com (or do a Google search on the keywords “UDP Tibetan” to find a mirror site). The website provides ample English descriptions of the program (figure 1).
Figure 1: Website of the UDP Converter and Editor
Take note of the links within the red box on the left side of the screen, as browsing these categories will be especially informative. Most importantly, follow the link to the download page so that you can install the program on your local machine. There are several UDP download packages to choose from; the best one is the one you get by clicking on “Download the complete UDP package” (figure 2).
Figure 2: Various UDP Download Versions
As you begin the download, you will be asked by your browser if you are sure you want to download the program from the Internet. Go ahead and confirm to move ahead.
Vista may pop up a dialogue box asking whether you really want to install the program, in which case you should select “Allow.” The setup program for UDP will then bring up its own dialogue box asking for your preferred install destination and whether you accept the license agreement. Choose “I Accept” (figure 3).
Figure 3: Default Install Location and License
The next few dialogue boxes will inform you of the installation process, and you can click quickly through them by choosing “yes” or “okay.” At the end of the installation, you will be given the option of running the UDP program straight away. Click “Yes” to start UDP. If nothing happens, look at the taskbar at the bottom of your screen for the words “ReadMe.udp”, as you may need to click that area of the taskbar to bring UDP to the front.
Once UDP is open, the first thing you need to do is click on “Options” in the top menu bar and then choose “Font…” There you will see a button labeled “Choose Unicode,” which you should press in order to select an appropriate Tibetan Unicode font (figure 4).
Figure 4: UDP Fonts Dialog Box
The font dialogue box will allow you to choose a font name and size. Since you are using Vista, you will already have Microsoft Himalaya as a font option. You can use Microsoft Himalaya, or pick another Tibetan Unicode font of your choice (figure 5).
Figure 5: Choosing Your Tibetan Unicode Font
Once you have chosen a Unicode font, you will be returned to the UDP Fonts dialogue box. Click on the radio box to the left of the word “Unicode” (figure 6), which is followed by the name of the Tibetan Unicode font that you specified a moment ago (such as Microsoft Himalaya). You have now properly prepared UDP for processing conversions into Unicode and can press “OK.”
Figure 6: Clicking the Radio Box Next to "Unicode"
To preserve yig chung (passages set in small letters) in the output RTF by enclosing them in « and », configure UDP as follows:
Figure 7: Enclose Small Fonts in «» during RTF import
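Once yig chung passages are marked with « and », they can be picked out of the converted text by a script. The following is only a minimal sketch in Python, not part of UDP: it assumes the converted text is available as a plain Unicode string, that the markers are never nested, and the function name split_yig_chung is hypothetical.

```python
import re

# Matches one «...» span; non-greedy so adjacent spans stay separate.
# Assumption: markers are balanced and never nested.
YIG_CHUNG = re.compile(r"«(.*?)»", re.DOTALL)


def split_yig_chung(text):
    """Return (main_text, notes): the text with every «...» span removed,
    and the list of span contents in order of appearance."""
    notes = YIG_CHUNG.findall(text)
    main_text = YIG_CHUNG.sub("", text)
    return main_text, notes


if __name__ == "__main__":
    main_text, notes = split_yig_chung("ཀ«ཁ»ག")
    print(main_text)
    print(notes)
```

Keeping the small-letter passages in a separate list makes it possible to restore them later as a smaller font size or as notes, depending on what the target format supports.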
You should now open up your old Tibetan document in Microsoft Word (figure 7).
Figure 7: Original Tibetan Document in Word
Be mindful that if you don’t have the legacy Tibetan fonts on your machine, you won’t be able to display the old Tibetan content, let alone convert it (the original Tibetan will probably appear as random English or Chinese characters!). Once you have your original Tibetan document open in Word, and can clearly see the Tibetan in the original fonts, your next step will be to open Wordpad. Wordpad is found on all Windows computers, typically under Start -> Programs -> Accessories -> Wordpad.
You’re now set up to start the conversion. First, select and copy the Tibetan text from Word, then paste it into Wordpad (figure 8).
Figure 8: Clipboard Copy from Word into Wordpad
If the Tibetan text now in Wordpad appears to have some unwanted spaces, do not fret, as those spaces will not be carried into UDP. From Wordpad, again select the Tibetan text, copy it and paste into UDP. When the text is pasted into UDP, it is automatically converted into Tibetan Unicode, and the text in UDP will show up using the Tibetan Unicode font that you specified earlier (figure 9).
Figure 9: Tibetan content automatically converted to Unicode when pasted
The conversion to Unicode is now complete. All that is left is to copy and paste from UDP back into Word, and voilà! Hopefully, these steps have shown you how quick it can be to convert old Tibetan documents into the new Unicode standard (figure 10).
Figure 10: The Final Unicode Text Pasted Back into Word
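If you want to check that a piece of text really is Tibetan Unicode rather than legacy-font text (which is stored as Latin or other non-Tibetan characters), one rough test is to count how many characters fall in the Unicode Tibetan block, U+0F00 to U+0FFF. The sketch below is in Python and is not part of UDP; the function name tibetan_fraction is hypothetical, and the legacy sample string is made up for illustration rather than taken from a real legacy encoding.

```python
def tibetan_fraction(text):
    """Return the share of non-whitespace characters that lie in the
    Unicode Tibetan block (U+0F00 to U+0FFF). Empty input gives 0.0."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    tibetan = sum(1 for c in chars if "\u0f00" <= c <= "\u0fff")
    return tibetan / len(chars)


if __name__ == "__main__":
    unicode_sample = "བོད་ཡིག"   # genuine Tibetan Unicode text
    legacy_sample = "dN-;Ü#"    # made-up stand-in for legacy-font text
    print(tibetan_fraction(unicode_sample))
    print(tibetan_fraction(legacy_sample))
```

A value close to 1 suggests the text was converted; a value close to 0 suggests it is still in a legacy encoding. Documents that mix Tibetan with English will naturally land somewhere in between.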