The Defense Advanced Research Projects Agency (DARPA) isn't known for thinking small, and it has now turned its attention (and budget) to a massive task: developing a set of software engines that can transcribe, translate, and summarize both text and speech without training or human intervention. The program, called Global Autonomous Language Exploitation (GALE), is an attempt to address the shortage of qualified linguists and analysts who know important languages like Mandarin and Arabic.
The bid solicitations that went out last year told interested parties that DARPA wanted three separate modules built. The first handles the transcription of spoken languages into text. The second is a translation module that can convert foreign text into English, and the third is a "distillation" engine that can answer questions and summarize information provided by the other two modules. While this technology would certainly be put to use by military personnel in the field, it is really designed for deployment in the US, where analysts are easily overwhelmed by the electronic information gathered by the intelligence community.
Most of this information simply goes untranslated, but if GALE is a success, the US government would have access to transcriptions of foreign broadcast news, talk shows, newspaper articles, blogs, e-mails, and telephone conversations. Even with the translation work done, though, this information would be overwhelming, which is why the distillation engine is such an important component of the product.