Chinese Transliteration & Transcription

The Transliteration and Transcription of Chinese

Contributors: David Germano, Ellen McGill


Chinese is traditionally written in complex characters. As a result, its orthography cannot be transliterated into other scripts, and the common practice is instead to transcribe its sounds. The only conversion resembling transliteration is the contemporary practice in the PRC of converting complex Chinese characters to a simplified form of characters, which could be said to have a relationship of “transliteration” to the traditional complex characters still used in Taiwan.

Pinyin is the international standard for the phonetic transcription of Chinese using roman script. It replaced earlier romanizations, most notably the widespread Wade-Giles system – thus “Peking” became “Beijing,” and so forth. The basic details are documented on Wikipedia’s Pinyin page. The US Library of Congress also has a good site, including a comparison to Wade-Giles, for its New Chinese Romanization Guidelines. Finally there is an interesting site called external link: which has a variety of references and other information about the use of Pinyin.

Formatting of Pinyin

Unfortunately, the formatting of pinyin – and specifically the question of where to break words – is not standardized. The ALA-LC (American Library Association - Library of Congress) standards are commonly used in US libraries, and the relevant conventions for library cataloging are described at external link: These standards run roughly eighteen pages, about half of which are devoted to rules and examples of where to put spaces, and where not to, when combining or separating syllables and words. ALA and LC have also changed the standards several times. Thus if you look in library catalogs, you will see that the same Chinese word is sometimes handled differently, since the rules were different at the time of cataloging. The famous sacred mountain Wutaishan, for example, may appear as Wutai shan, Wutai Shan, or Wutaishan.

The rough rule is that proper nouns are written without breaks, and everything else with breaks. The difficulty comes in deciding what counts as part of the proper noun, when a word is acting as an adjective, and so on. Is shan part of the name, or is Wutai the name and Shan (“mountain”) the object described by Wutai? Is it Xizang zizhiqu or zi zhi qu? As a consequence, some catalogers try to follow the standards as they understand them, and then often add another line (a 246 field, which is for alternate titles) giving the other version. In this way, they hope that people can find materials even if they are not part of the tiny percentage of the population that knows the ALA-LC rules.
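One way to cope with such spacing and capitalization differences when comparing records is to reduce each romanized form to a match key. The following is a minimal Python sketch, not an ALA-LC procedure: the function name match_key is hypothetical, and it assumes that ignoring whitespace and letter case is acceptable for the comparison at hand.

```python
def match_key(romanized):
    """Collapse whitespace and case so spacing variants compare equal."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(romanized.split()).lower()


variants = ["Wutai shan", "Wutai Shan", "Wutaishan"]
keys = {match_key(v) for v in variants}
print(keys)  # {'wutaishan'}
```

A key like this deliberately discards information, so distinct terms that differ only in syllable boundaries (xi an versus xian, for instance) would collide. It is suited to finding candidate matches, not to deciding which spelling is correct.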

Thus if the concern is to make materials findable on the web, following the ALA-LC cataloging standards is not the wisest course of action, at least not exclusively. We therefore try to give the characters and then as many forms of romanization as we have time for; for pinyin, we try to adhere to a “commonsense” approach rather than the cataloging standard. People with experience helping users (from various linguistic backgrounds) with library catalogs have found that they usually search for zizhiqu, minzu, and gewutuan rather than zi zhi qu, min zu, and ge wu tuan, respectively. If you put min zu on your page or in your metadata and someone searches for minzu, they probably won’t find it; but if you have minzu, a search for min will find it.
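The asymmetry between min zu and minzu follows from plain substring matching, and can be illustrated with a short Python sketch. It assumes a naive, case-insensitive substring search (real search engines tokenize and rank text differently), and the helper names naive_search and spacing_variants are hypothetical.

```python
def naive_search(query, text):
    """Case-insensitive substring match, standing in for a simple search box."""
    return query.lower() in text.lower()


def spacing_variants(syllables):
    """Return the joined and the fully spaced form of a syllable sequence."""
    return ["".join(syllables), " ".join(syllables)]


# A page that spells the word with a space is missed by the joined query.
print(naive_search("minzu", "min zu"))  # False
# A page that spells it joined is still found by the shorter query.
print(naive_search("min", "minzu"))  # True
# Listing both forms in metadata covers either search habit.
print(spacing_variants(["zi", "zhi", "qu"]))  # ['zizhiqu', 'zi zhi qu']
```

Generating both forms and placing them in page metadata is one possible way to serve users with either search habit, at the cost of some redundancy.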

Issues Regarding the Use of Pinyin in Relationship to Tibetan Words

In contemporary times, most roman script versions of Tibetan words – usually place names and personal names – that come from China are actually the pinyin romanization of the simplified Chinese characters, which themselves are a phonetic rendering of the original Tibetan. Thus “Lasa” is not a direct romanization of the name of the famous Tibetan capital usually referred to in English as Lhasa, but rather is the pinyin of the Chinese characters for the Tibetan word. The same is true when we see “Suolang Duojie” for the common Tibetan name “Sönam Dorjé” – the former spelling is the pinyin of the standard Chinese character phonetic rendering of the Tibetan. There is, however, a widespread separate system in China for going straight from Tibetan words into Latin script. Unfortunately, it is unsystematic and irregular in its implementation, as described in Ethnic Pinyin Of Tibetan.

The pinyin will often seem oddly spelled as a transcription of the corresponding Tibetan term. This is often assumed to be because the Chinese speakers responsible for transcribing Tibetan have a poor comprehension of the sounds of Tibetan. While this is no doubt at times the culprit, there are broader factors at work. Firstly, sounds are transcribed in roman script in accordance with the correlations between sounds and roman letters found in the pinyin system. If you are not familiar with pinyin, you will therefore find the use of certain roman letters puzzling. Secondly, the sounds being transcribed are often those of a specific dialect of Tibetan, especially for such things as place names. Someone familiar with how that word or place name would be pronounced in central/standard Tibetan will find the transcription puzzling, since pronunciation varies dramatically from dialect to dialect in Tibetan. In analyzing a given transcription, it is thus useful to consider whether the term in question has a specific regional association that would point to the transcription being a rendering of a specific dialectal pronunciation. Thirdly, the Chinese language itself consists of many different dialects which vary greatly in pronunciation. The rendering of a Tibetan term in ethnic pinyin can thus be influenced by the Chinese dialect with which the transcriber is familiar, both in how the term is heard and in how its sounds are associated with roman letters.

When Chinese speakers write Tibetan place names or personal names in Chinese, they usually render the name phonetically with similar-sounding Chinese characters, rather than translating it semantically into Chinese characters on the basis of meaning. In these cases, the Chinese characters do have meaning, but that meaning is incidental to their use in rendering the Tibetan name, since they were chosen primarily on the basis of their sounds. However, Chinese semantic equivalents are typically used if part of a person’s name is an abstract title (“president,” etc.), or part of a place name is an abstract term such as “monastery,” “city” or “county.” In those cases, the Tibetan is not phonetically rendered, but rather is translated with the equivalent Chinese term. This yields “hybrid” renderings where a Tibetan name is rendered in Chinese by a combination of phonetics and translation.


There are a number of online sites that convert from simplified characters to traditional characters and vice versa. There are also sites that convert from Chinese characters to pinyin. One of these is external link: Some sites convert to pinyin with tone marks; some convert to pinyin with tone numbers; some convert to both. Thus far we have not found any sites that convert to pinyin without either tone marks or numbers.
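Where no converter offers toneless output, the tone marks or tone numbers can be stripped afterwards. The following is a minimal Python sketch using only the standard library; the function name strip_tones is hypothetical. It assumes that tone marks are the four standard pinyin diacritics and that tone numbers are the digits 1 through 5 written directly after a letter, and it keeps the diaeresis of ü, which is part of the letter rather than a tone mark.

```python
import re
import unicodedata

# Combining macron, acute, caron, and grave: the four pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Remove tone marks and tone numbers, leaving plain pinyin."""
    # Decompose so each tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", pinyin)
    without_marks = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that u + diaeresis becomes a single ü again.
    recomposed = unicodedata.normalize("NFC", without_marks)
    # Drop a tone digit (1-5) only when it directly follows a letter.
    return re.sub(r"(?<=[^\W\d_])[1-5]", "", recomposed)


print(strip_tones("Běijīng"))    # Beijing
print(strip_tones("Bei3jing1"))  # Beijing
print(strip_tones("lǜ"))         # lü
```

Because the digit rule is purely positional, a sketch like this would also strip digits from non-pinyin text such as "page2", so it should be applied only to strings known to contain pinyin.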