Babel of Video
Some 200,000 videos have been subtitled in 100 languages, thanks to software that makes translating subtitles easy and relies on multilingual crowds.
On March 11, 2011, the most powerful recorded earthquake ever to hit Japan unleashed a tsunami that killed nearly 16,000 people and caused catastrophic meltdowns at the now-infamous Fukushima Daiichi Nuclear Power Plant. In the weeks that followed, Nicholas Reville, executive director of Amara, a nonprofit provider of subtitles to Internet videos, noticed something interesting: The site’s most popular video, receiving hundreds of thousands of views every week, was a documentary made 20 years earlier about the Chernobyl nuclear disaster. Originally produced by the New York Academy of Sciences, it was translated into Japanese just days after the earthquake by two people known only by their online handles, paxshalombeagle0728 and witchbabe23.
“It was an amazing example of people in a crisis looking for information. They found this video that was in English and decided that they needed to make it available to people in Japan. It was incredibly important to the community at that moment,” says Reville, who founded Amara, then known as Universal Subtitles, in 2010 with the hope of making precisely these moments of language-unbounded communication possible.
Amara’s staff of 18 doesn’t translate and subtitle the videos. Instead they provide a software platform that makes subtitling easy, and a convenient online gathering place for whoever wishes to do the job. In many ways their work resembles that of the Wikimedia Foundation, which provides the software scaffold on which an army of volunteers has written the world’s largest encyclopedia.
So far, some 200,000 videos have been subtitled in close to 100 languages, from the Chernobyl documentary to online education classes to woman-on-the-street responses at the 2012 Republican National Convention. Amara is also the subtitling choice for Udacity, Coursera, TED-Ed, and the Khan Academy, leading online education companies that promise to bring classes to millions of students over the Internet. As enormous as that effort’s potential seems, it becomes positively mind-boggling when we imagine those classes translated into dozens of languages. For that to happen, of course, someone will need to do the translation.
“As we get into a global video environment, we need subtitles in order to peek into each other’s culture and understand what’s happening,” Reville said. “We’re trying to create a Wikipedia for subtitling, a global community that can solve the problem at scale. It’s about enabling all of us.”
Reville, 32 and based in Worcester, Mass., didn’t set out to subtitle videos. He came of age as the Internet transitioned from a text-based, low-bandwidth medium into high-speed ubiquity, and as a communitarian ethos confronted commercial reality. Reville fell squarely in the communitarian camp, part of a movement that would spawn Wikimedia, the Mozilla browser, and other tools expressly intended to ensure that at least some of the Internet’s infrastructure would remain community owned.
In 2005, he helped found the Participatory Culture Foundation, a nonprofit pledged to “enable and support independent, non-corporate creativity and political engagement.” One of the foundation’s projects was Miro, free music- and video-playing software written in what’s known as open source, freely available and nonproprietary code. As Miro matured, however, Reville and his collaborators realized that with short videos becoming standard communications currency, making it easy to watch videos was only part of the challenge.
Sure, some teenage factory worker in the Pearl River Delta could upload a video watched the next day by a grandmother in Kansas—but could Grandma understand what was said? The Internet broke down many boundaries, but language was not always one of them.
Surveying the subtitling landscape, Reville found few options. Commercial services existed, but they were prohibitively expensive at large scale. Some cost tens of dollars per minute. Automated voice-recognition translation programs were also a possibility, but they were, and still are, far too error-prone. If a significant number of videos were to be translated and captioned, realized Reville, the work would need to be done by volunteers. “If you’re trying to do this at scale, it has to be collaborative,” he says. “It has to be driven by the people who care about the video and will work on it.”
Of course, many entrepreneurs and do-gooders would love to harness the power of crowds. Doing so, however, is complicated. “The world is littered with projects that don’t get off the ground because people don’t feel ownership,” says John Lilly, a venture capitalist and former CEO of the Mozilla Foundation who sits on the Participatory Culture Foundation’s board. “Amara’s sensibilities about how communities do work, how you give people recognition and ownership, are very good.”
A WIKI ETHOS
Amara started by designing its service with crowds in mind: Rather than attempting to host videos themselves and convince people to migrate from well-established services like YouTube and Vimeo, the staff would simply make it possible to subtitle videos that could be embedded on any service.
User interface was also a foundational concern. “We looked at the challenges of subtitling and said, ‘What makes it complicated?’ One, the interfaces were really tedious,” says Reville. Whereas other software required translators to stop and start their videos frequently while typing, Amara’s allowed people to type continuously, automatically pausing the video for them if the program sensed they’d fallen too far behind the audio. “We try to keep you typing as much as you can, so you can have a continuous workflow,” Reville says. “The end result is not just more efficient, but more enjoyable. That’s crucial if you’re going to expand.”
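The auto-pause behavior Reville describes can be pictured with a small sketch. It is an illustration only, not Amara’s actual code: the class name, the assumed speaking rate, and the lag threshold are all hypothetical, and a real editor would likely measure lag differently. The idea is simply to estimate how much audio the typed words cover, compare that with how much audio has played, and pause once the gap grows too large.

```python
from dataclasses import dataclass


@dataclass
class TypingPacer:
    """Decide when to pause playback so a typist can catch up.

    Hypothetical helper: the names and default values are assumptions
    made for illustration, not taken from Amara's software.
    """
    speech_rate_wps: float = 2.5  # assumed average spoken words per second
    max_lag_seconds: float = 4.0  # assumed tolerance before pausing

    def lag_seconds(self, playback_seconds: float, words_typed: int) -> float:
        # Estimate how much audio the typed words account for, then
        # compare it with how much audio has actually played.
        covered = words_typed / self.speech_rate_wps
        return max(0.0, playback_seconds - covered)

    def should_pause(self, playback_seconds: float, words_typed: int) -> bool:
        # Pause only when the typist has fallen further behind than allowed.
        return self.lag_seconds(playback_seconds, words_typed) > self.max_lag_seconds


pacer = TypingPacer()
print(pacer.should_pause(10.0, 20))  # False: about 2 seconds behind
print(pacer.should_pause(10.0, 5))   # True: about 8 seconds behind
```

A player loop could call should_pause on every keystroke or timer tick and resume playback once the lag drops back under the threshold, which would keep the typist working without manual stops and starts.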
Next came the actual business of harnessing a voluntary community. Amara could have attempted to pull together its own army of volunteers—an on-call translation force responding to subtitling demand—but the majority of this kind of translation is done by people who happen to be interested in a particular set of videos or project. “Part of the model is that we’re not just one community. Different organizations will use the platform to build communities around their content,” Reville says.
When PBS NewsHour correspondent Hari Sreenivasan decided to use Amara for subtitling videos during the 2012 presidential campaign, the network invited viewers to help. More than 800 joined, ultimately translating more than 200 videos. “There are people who are really passionate about this,” Sreenivasan says. “They look around the Web and see something that’s inaccessible to a loved one or themselves.” Sreenivasan says that paying to subtitle NewsHour videos would have been prohibitively expensive.
“This was one of those moments that jelled perfectly with public media, with trying to make all this content available to everyone,” Sreenivasan says. “There were opportunities for someone who is Vietnamese and lives in Louisiana to see a predominantly Spanish speaker in Los Angeles express answers to her questions.”
For people unaccustomed to a Wiki ethos, it might seem improbable that high-quality translations could emerge from such ad hoc arrangements. Yet PBS didn’t try to micromanage the process. Instead the network trusted Amara’s quality-control mechanisms, in which translations are reviewed by multiple editors, with the initial text corrected and revised until a consensus version emerges.
“There’s an understanding among the team that copy editing each other’s work is a part of the process,” says Josh Barajas, a PBS NewsHour production assistant who manages the NewsHour Amara community. “Part of my job is to facilitate that communication within the team.”
A similar hands-off approach has worked for online education startup Coursera, which has enrolled hundreds of thousands of students around the world. Some 500 have joined Coursera’s Amara community, subtitling more than 1,000 videos in multiple languages, often within days of a lecture’s online upload. “Because the ability to modify and contribute is so open, it’s easy for any student to go in to update a translation,” says Jiquan Ngiam, Coursera’s engineering director. “If a video doesn’t get completed in time, other students will notice it very quickly.”
Other Amara users, such as TED-Ed, the educational arm of the global conference powerhouse, have adopted more formal routines, with rounds of peer review and trusted moderators who sign off on final subtitle versions. But even that formality is driven by the basic Amara model: a few users performing the bulk of a translation, followed by small revisions from others.
“If there’s a bad set of subtitles, it’s more likely to be caught by other users,” says Reville. He described each community’s natural tendency to “iterate toward quality” in the same way that Wikipedia’s authors produce accurate scholarship. “It sounds like it would be a mess, but it works out really well.”
Like many Internet startups, Amara offers both free and fee-based services, with the latter offering users the ability to keep content private, restrict editing to selected users, order commercial-grade subtitling, and upload unlimited numbers of videos. After initially relying on grants from the Knight, MacArthur, Mozilla, and Certa foundations, Amara now gets roughly half of its revenue from its fee-based service. Reville ultimately aims to make the organization self-sufficient, perhaps splitting off a for-profit entity that will fund its free nonprofit service. Mozilla works much the same way, with its nonprofit enterprise supported through Google advertising.
Regardless of what happens, Reville insists that Amara’s core mission will always remain free and nonprofit. “It’s really important that we have some organizations that are not primarily structured around the profit motive, but that are part of the global communications infrastructure,” he says. “When you have a very open infrastructure, when anyone can participate, you see all kinds of amazing things sprout out of that.”