The challenges of voice recognition in captioning

Thursday, 31 January 2013, 2:13pm

Voice recognition software is becoming a staple for captioning providers, but the technology is far from perfect, writes Television Project Manager and veteran captioner Chris Mikul.

Voice recognition software is increasingly being used to caption news programs, sporting events and other live TV programs in Australia and around the world. It’s still a relatively new technology, though, and many factors can affect the quality of the captions it produces.

As anyone who has used the automatic captions on YouTube videos will know, captions generated by voice recognition can be woeful (although if you own the video you can now edit them). When TV programs are captioned using voice recognition, a captioner repeats the dialogue as clearly and evenly as possible into a microphone as the program goes to air (so the method is sometimes called re-speaking captioning). Only about a third of people have the ability to do this.

In a recent blog post, British captioner Martin Cornwell discusses some of voice recognition’s shortcomings. “Since much of the tech is developed in the US, it’s understandable that our software is best able to interpret US English. It’s also pretty good for South-Eastern British English, since that’s also quite a ubiquitous accent. However it can struggle with others, especially distinctive regional ones like Glaswegian or Northern Irish,” wrote Cornwell.

The software also has trouble with foreign names, although if captioners know that particular names are likely to come up during a program, they can prepare for them by adding the names to the software’s vocabulary beforehand.
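To give a rough sense of how that preparation helps, here is a minimal, hypothetical sketch in Python of a prepared name list applied as a correction pass to recognised text. It is not how any particular captioning product works; the names, misrecognitions and function are invented for illustration.

    # Illustrative sketch only. Professional re-speaking software manages its
    # vocabulary internally; this simply shows the general idea of a prepared
    # name list being applied to recognised text before captions go to air.
    # All names, misrecognitions and function names here are invented.

    NAME_FIXES = {
        "jock of itch": "Djokovic",      # hypothetical misrecognitions
        "glass wee jin": "Glaswegian",
    }

    def apply_name_fixes(recognised_text):
        """Replace known misrecognitions of prepared names in recognised text."""
        fixed = recognised_text
        for heard, correct in NAME_FIXES.items():
            fixed = fixed.replace(heard, correct)
        return fixed

    print(apply_name_fixes("jock of itch wins in straight sets"))
    # prints: Djokovic wins in straight sets

In real workflows the names are typically entered into the recogniser’s own dictionary before broadcast, so the correct spelling is produced in the first place rather than patched afterwards.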

Voice recognition is often used to caption sporting events, but these present their own challenges, with commentators often speaking over the top of each other, or giving redundant information. In an interview with Media Access Australia, captioner Jennifer Wardle spoke about how she and her colleagues get around these difficulties.

As Claude Le Guyader from Deluxe Media noted during his presentation at the CSI User Experience Conference in London last December, the increased use of voice recognition around the world is due to a shortage of stenocaptioners. These are highly trained individuals who use a phonetic keyboard to create live captions. Their scarcity in Australia was one of the reasons why the Australian Caption Centre (Media Access Australia’s predecessor) began to experiment with voice recognition in 2005.

As caption levels increase, both in Australia and overseas, it’s inevitable that TV viewers will see more captioning created using voice recognition. A couple of years ago, there was some very poor re-spoken captioning going to air, but things have improved since then. Captioners are getting better at using the technique, and the software itself is improving every year. It’s vital, though, that captioners are given adequate time to prepare for each program, so they can research names and terms that are likely to come up, and enter them into the software prior to broadcast.

