Meta launches a powerful single-model A.I. tool for transcription and translation


On August 22, 2023, Meta launched SeamlessM4T, a powerful A.I. model that can translate and transcribe nearly 100 languages.


According to Meta’s blog post, the model’s capability “has long been dreamed of in science fiction.” The launch reflects the company’s mission to “[bring] the world closer together with a foundational multimodal model for speech translation.”


As a foundational multilingual and multitask model, SeamlessM4T supports five main functions:


  • Automatic speech recognition for nearly 100 languages
  • Speech-to-text translation for nearly 100 input and output languages
  • Speech-to-speech translation, supporting nearly 100 input languages and 35 output languages
  • Text-to-text translation for nearly 100 languages
  • Text-to-speech translation, supporting nearly 100 input languages and 35 output languages

Beyond widely spoken languages such as English, Spanish, and German, SeamlessM4T also supports languages without a widely used writing system, such as Hokkien.


This coverage comes from the training data: the research team trained the model on tens of billions of publicly available sentences mined from the web and 4 million hours of speech. According to Meta, this is the largest open speech-to-speech and speech-to-text parallel corpus to date in both total volume and language coverage.


Meta says the development of SeamlessM4T was guided by the company’s “five pillars of Responsible A.I.,” with the aim of ensuring accuracy while avoiding mistranscribed and toxic outputs.


“This is only the latest step in our ongoing effort to build AI-powered technology that helps connect people across languages. In the future, we want to explore how this foundational model can enable new communication capabilities — ultimately bringing us closer to a world where everyone can be understood,” Meta wrote.