Making The Audio-video Recordings

Making The Audio-video Recordings for Language Instructional Resources

Contributor(s): David Germano, Eric Woelfel.

1. Making the audio-video recordings

Creating audio-video recordings for language instruction involves three major issues: the technical quality of the audio and video, the type and authenticity of the speech, and the diversity of the speech.


The quality of the recording involves distinct issues pertaining to audio and video quality respectively.

Audio quality depends chiefly on having excellent microphones, having the right type of microphone for each situation, and knowing how to use those microphones to get the best possible audio. One question is whether to use battery-powered wireless microphones. These give the participants great mobility and keep unsightly microphones and cables out of the video, but they can cause problems when batteries die during recording, especially in the field, where electricity problems are common. Another question is whether to give each participant a microphone or to use one microphone shared by all participants. In using microphones, the main thing is optimal placement, so that all speaking participants are recorded clearly and at the same level.

The most common issue with video quality is lighting, specifically scenes where exposure problems cause dark images and/or dark faces. Another typical problem occurs when the videographer zooms in and out and/or pans the camera back and forth. The best thing an inexperienced videographer can do is NOTHING. Don't zoom in and out, and don't move the video camera back and forth. Choose a good zoom level and perspective, and just leave the camera steady. The result may not be perfect, but as long as you choose the right perspective and leave the zoom wide enough that no one's movements will take them out of the picture, it won't be bad. Once you have seen a video or two with abrupt zooms and pans, you will understand why less is sometimes better than more.

Constantly reviewing one's work after each shoot should quickly create a basic understanding of what to avoid regarding audio and video quality. Don't accumulate recordings in the field without daily review.


There are three types of speech, broadly speaking, that one can record: natural speech, improvised or "realistic" speech, and scripted speech. Natural speech is speech that is essentially "overheard": it is genuinely how people talk when they have no awareness of being recorded. While this has obvious virtues, it is quite difficult to actually capture. One either has to use hidden recording devices, which introduces ethical problems, or simply leave recording devices on for hours so that people gradually forget they are on. In the latter case, one then has to sift through hours of recordings to find useful segments. In addition, one obviously has little control over the diversity of the speech.

Scripted speech means that one prepares a script, has participants memorize the speech, and then has them perform from the memorized speech. This has multiple problems, the most important being the tendency for such constructed scripts to be artificial speech that is not true to how people actually speak. In addition, participants will often have problems memorizing the script, thus creating delays and frustrations.

We thus generally advocate a focus on improvised speech. This means that participants are given a basic scenario, and then shortly afterwards asked to act it out using improvised speech. Certainly it's true that there is a constructed element to the social setting, and that the speech is affected by the awareness of the camera and the artificiality of the situation. However, we think this strikes an excellent balance between "natural" and "scripted" speech, yielding an optimal blend of efficiency and influence on the one hand, and naturalness and authenticity on the other.


The goal in generating recordings is diversity and authenticity: diversity in types of language, types of social contexts, and types of participants, and authenticity in reflecting actual, natural social settings and uses of language. In choosing participants, we look for diversity across age, gender, social class, and linguistic background (i.e. even when all participants speak mutually comprehensible forms of the language, there can be considerable variation). In addition, in combining participants, we look for different pairings that can bring out different types of speech: two young men, an older person and a younger person, a young man and a young woman, an aristocrat and a poor person, and so forth.

An additional issue is whether to use professional actors. Professional actors can be extraordinarily pleasurable to work with because of their ability to "get" a scene and improvise at a moment's notice. However, one should be cautious, since their language may be somewhat more artificial precisely because of their professionalism. That said, in our experience, the embarrassment and awkwardness of ordinary people have an even greater impact on the quality and naturalness of speech.

In terms of social contexts, we advocate planning your shoots around diverse social settings that together cover a broad range of communicative contexts in the culture in question. Examples include quarrelling, flirting, business exchanges, teachers and students, parents scolding children, job interviews, old friends reminiscing, people talking about medical situations, discussing dreams, and so forth. In setting up the scenarios, one should also pay attention to the different types of language from a grammatical and lexical point of view: for example, the use of imperatives between friends as opposed to in quarrels, between social equals, and so forth.

Provided for unrestricted use by the Tibetan and Himalayan Library