DeepZang, the world’s first large language model for Tibetan, debuted in Lhasa, marking a new step in bringing the Tibetan language into the AI era.
DeepZang, the world’s first large language model designed for the Tibetan language, was released in Lhasa on March 15, according to China News Service (CNS).
DeepZang is also the first Tibetan-language large language model in China to complete national registration for generative artificial intelligence services. According to the report, the project fills a global technological gap in this field.
DeepZang and its related applications were introduced at the launch event. There, DeepZang founder Tenzin Norbu described the open-source model as China’s first AI platform designed for ethnic languages, with multilingual and multimodal capabilities.
The platform supports services in more than 80 languages. It can listen, speak, translate, process images and analyse information. The DeepZang app was released at the same event, and it is expected to be deployed soon across several industries as part of a range of “smart+” solutions.
According to Tenzin Norbu, the company began planning a “Tibetan-Chinese bilingual plus AI” strategy in 2018. Over the next four years, the team built a high-quality Tibetan-Chinese parallel corpus with nearly 70 million entries.
The team also collected large-scale voice data across the three major Tibetan dialect regions. The database includes about 10,500 hours of speech from Ü-Tsang, 10,000 hours from Kham and 10,000 hours from Amdo. With roughly 30,500 hours in total, it is currently China’s largest Tibetan speech database with relatively precise annotations.
During the launch event, the model also received recognition from the World Record Certification Agency, which awarded a certificate confirming DeepZang as the world’s first Tibetan large language model.