Open-Captions: Using Closed Captions as Metadata for ASL

This post originally appeared on the Yahoo! Accessibility blog.

What is Open-Captions?


Research suggests that over 90% of deaf children have hearing parents who “frequently do not have fully effective means of communicating with them”. American Sign Language (ASL) is a difficult language to learn, especially as a second language.

Open-Captions makes it easy for parents and children to learn and practice American Sign Language together while watching their favorite videos on YouTube. People can find closed captioned videos on any topic with the Open-Captions search engine. The viewer is able to select individual words in the video’s caption stream and see the American Sign Language representation of the word.

[Screenshot: Open-Captions example]

Closed captions on television shows have been around for quite some time. They are useful for hearing impaired people and also for people whose first language is not English.
Captions have traditionally been a steady stream of white text on a black background, located at either the top or the bottom of the screen. Video sites like YouTube and Hulu have adopted the same style from television.

Open-Captions changes the way users watch videos by making the captions more interactive.

  • The viewer can pause the video, select a word and watch an ASL representation.
  • The viewer can review earlier captions with the “Previous Caption” button.

The American Sign Language representations come from the SmartSign dictionary maintained by Georgia Institute of Technology’s Center for Accessible Technology in Sign (CATS).

How did Open-Captions come about?

My interest in building Assistive Technologies has been one of the main motivating forces behind my pursuit of a graduate degree in Computer Science.

I built a web-based widget platform for Interactive TV as a graduate student researcher at Georgia Tech. In the process, I worked on developing functionality that would allow widget developers to access the closed captions associated with the currently playing content.

Harley Hamilton, a researcher at CATS, was interested in leveraging this ability to access closed captions, and we ended up building a rough prototype. Subsequently, I learned more about hearing impaired children from Harley and wrote a small tool that lets you select any word on a webpage and see its American Sign Language representation. This could be very useful for parents who want to learn ASL.

In September this year, I signed up for the HackdayTV weekend hackathon and ended up building the first version of Open-Captions, a mashup of the earlier ASL-related ideas I had worked on.

What happens behind the scenes in Open-Captions?

Disclaimer: Technical content ahead ;)
The entire source code for the project can be found on GitHub.

YouTube has extensive documentation on how to use their APIs for searching, embedding, and controlling videos on your site. The search and results pages are based on the PHP library of the YouTube search API. The page on which the video plays is the one that gets interesting. There are two parts to it:

  • Getting the closed captions of the video and synchronizing it to show correctly
  • Handling the mouse clicks on the individual words to show the ASL representation

I used Firebug to see what requests YouTube sends whenever you click on the “Interactive Transcript” icon under the video. I removed parameters that were not mandatory (i.e. I would still get the same result without them) and made a PHP curl request for the trimmed URL to get the captions formatted as XML. For this YouTube video, the URL for the captions would be {VIDEO_ID}&type=track&kind=&hl=en

and the result will be of the form

[Sesame Street theme music]
Elmo: You okay, Chris?
Chris: I'm good, I'm good.
Elmo: You ready?
Chris: Yeah, I'm ready. Oh, hey!
Red light's on. Hey, hi, everybody.
I'm Chris.
Elmo: Oh, and Elmo's Elmo.

Chris: Mm-hmm, he sure is.
I work at Hooper's Store right here on Sesame Street.

As far as I understood, not all YouTube videos with captions could be accessed with the above API call, because their caption files had names. Then I came across this user script, which is useful if you want to download the captions file. The author had a snippet of code for retrieving the name of the captions file: {VIDEO_ID}&type=list

This returned an XML document with the name of the captions file. I used that to extend my API call to {VIDEO_ID}&type=track&kind=&hl=en&name={CAPTION_FILE_NAME}
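As a rough sketch of this two-step lookup, the logic can be written as below; `TIMEDTEXT_BASE` is a hypothetical placeholder (the real base URL appeared as a link in the original post), and I'm assuming the list response carries the file name in a `name` attribute:

```javascript
// TIMEDTEXT_BASE is a made-up placeholder for the captions API base URL
var TIMEDTEXT_BASE = 'http://example.org/timedtext';

// Step 1: the ...&type=list call returns XML such as
//   <transcript_list><track name="Sesame" lang_code="en"/></transcript_list>
// Pull the caption file name out of the list XML (empty string if none)
function captionFileName(listXml) {
    var m = listXml.match(/name="([^"]*)"/);
    return m ? m[1] : '';
}

// Step 2: build the ...&type=track URL, appending &name= only when the
// caption file actually has a name
function trackUrl(videoId, name) {
    return TIMEDTEXT_BASE + '?v=' + videoId + '&type=track&kind=&hl=en' +
        (name ? '&name=' + encodeURIComponent(name) : '');
}
```

A fetch of `trackUrl(videoId, captionFileName(listXml))` would then return the captions XML for videos whose caption tracks are named.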

Then, using the JavaScript setTimeout function and the start time and duration of each caption, I synchronize the captions with the video.

function showAppropriateCaptions(){
    var i, currTime,
        len = global_full_captions.length,
        lastCaption = global_full_captions[len - 1];
    // my_ytPlayer is the handle of the YouTube player
    if (my_ytPlayer.getPlayerState() == 1) {
        // State = 1 => playing video - so get the time and show appropriate captions
        // this will get triggered automatically when the video starts for the very first time
        currTime = my_ytPlayer.getCurrentTime();
        // to-do: a better way to handle hide/show instead of doing it at every iteration
        $('.myCaptionSpan').show(); // the container having the captions
        $('#previous').show();      // the previous button

        if (currTime > lastCaption.startTime + lastCaption.duration) {
            // the video has ended, no more captions to show
            return;
        }
        if (lastCaption.startTime <= currTime &&
                (lastCaption.startTime + lastCaption.duration) > currTime) {
            // ugly workaround for showing the last caption, which has no successor
            showCaption(len - 1); // showCaption(i) renders caption i in the container
            return;
        }
        for (i = 0; i < len - 1; i++) {
            if (global_full_captions[i].startTime <= currTime &&
                    global_full_captions[i + 1].startTime > currTime) {
                // found the appropriate caption
                showCaption(i);
                // now call the same function before the start of the next caption
                setTimeout(showAppropriateCaptions,
                    Math.abs(global_full_captions[i + 1].startTime -
                             global_full_captions[i].startTime) * 1000);
                break;
            }
        }
    }
}

Now, for every word in that particular caption line, I break the line up into span elements and attach a click handler to each. The span elements look like this:

<p class="mycaption">
<span id="beautifulCaptions0">to</span>
<span id="beautifulCaptions1">spend</span>
<span id="beautifulCaptions2">some</span>
<span id="beautifulCaptions3">good</span>
<span id="beautifulCaptions4">time</span>
<span id="beautifulCaptions5">with</span>
<span id="beautifulCaptions6">my</span>
<span id="beautifulCaptions7">good</span>
<span id="beautifulCaptions8">buddy</span>
<span id="beautifulCaptions9">Elmo</span>
<span id="beautifulCaptions10">over</span>
<span id="beautifulCaptions11">here.</span>
</p>
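The markup above can be generated with a few lines of string handling; `buildCaptionHtml` is a name I've made up for illustration, not a function from the actual source:

```javascript
// Split a caption line into per-word <span> elements whose ids follow the
// beautifulCaptionsN pattern shown above (illustrative sketch)
function buildCaptionHtml(captionLine) {
    return captionLine.split(/\s+/).map(function (word, i) {
        return '<span id="beautifulCaptions' + i + '">' + word + '</span>';
    }).join('\n');
}

// On the page, the generated spans would then get a shared click handler,
// e.g. with jQuery:
//   $('.mycaption span').on('click', function () { showASL($(this).text()); });
```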

When a word in the captions is clicked, the showASL method gets called with the word string as a parameter. After stripping extraneous characters like !, &, ], [ and ; from the word and converting it to lowercase, the method inserts an iframe in the top right corner whose URL points to {THE SELECTED WORD}.htm
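A minimal sketch of that flow, assuming a jQuery page with an `#aslFrame` iframe and using `SMARTSIGN_BASE` as a stand-in for the SmartSign URL (both names are mine, not from the actual source):

```javascript
// SMARTSIGN_BASE is a made-up placeholder for the SmartSign dictionary URL
var SMARTSIGN_BASE = 'http://example.org/smartsign/';

// Strip extraneous characters (!, &, ], [, ; etc.) and lowercase the word
function cleanWord(word) {
    return word.replace(/[^a-z']/gi, '').toLowerCase();
}

// Point the ASL iframe at the SmartSign page for the selected word
function showASL(word) {
    var page = SMARTSIGN_BASE + cleanWord(word) + '.htm';
    $('#aslFrame').attr('src', page); // iframe styled to sit in the top right corner
}
```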

The pages showing the ASL have Flash videos embedded, so I styled the iframe so that the Flash video sits at the center of the box. Some of the pages have just images, and some have both. Hence, I added a “Full Page View” button under the ASL box, so that viewers can see the entire page if they want to.

The SmartSign website covers around 25,000 words, so there will be words in the captions for which an ASL representation does not exist. For such cases, I fetch an image from Bing to substitute for the ASL.

The source code is undergoing constant refactoring, and I would love to get more ideas on how to design a better solution.

What lies ahead for Open-Captions?

Currently, I am gathering feedback from hearing impaired users about Open-Captions and asking what additional features they would like. Harley, the researcher at CATS, is excited about this project too and is reaching out to more people for feedback.

Another idea is to build Open-Captions into the “Khan Academy” of ASL, with videos that support learning ASL at different levels.