
An analogue of ChatGPT in the Kazakh language has been developed

Publication date: 17.07.2024 10:54:00
IrbisGPT, the first open-source national language model, has appeared in Kazakhstan.

The developers published an official release on Habr, giving all users the opportunity to test the model and contribute to its training.

IrbisGPT is a public non-profit initiative developed in collaboration with MOST Holding and Gen2b.ai, a studio that specializes in applying AI in business.

The goal of the project is the preservation and dissemination of the Kazakh language, as well as its integration into modern digital technologies for the development of society, economy and science in Kazakhstan.

“This is a pioneering project for developing the Kazakh language through artificial intelligence. We hope that IrbisGPT will help protect and promote the state language,” said project founder Bakht Niyazov.

According to the developers, the current version of IrbisGPT demonstrates excellent learning potential.

Unlike other open-source models, which either answer in English or generate random words in Kazakh, IrbisGPT gives detailed and correct answers to questions asked without any context.

For example, IrbisGPT answers the question “Shop nege zhasyl?” (“Why is the grass green?”) in the state language, knows who the president of Kazakhstan is and how many days there are in a year, and can even philosophize about the meaning of life.

To train the model, the developers used 20 gigabytes of “raw” data from news and articles in the Kazakh language, which nearly tripled the vocabulary.

However, the team recognizes that this is not enough and is counting on government agencies to provide quality data to further improve IrbisGPT.

The final vocabulary of the tokenizer (the component that converts text into data the model can process) contains more than 60 thousand tokens. The team also plans to create a more advanced model architecture that will be useful in different industries.
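To show why a larger tokenizer vocabulary matters, here is a minimal sketch in Python. It is not IrbisGPT's tokenizer: the greedy longest-match rule, the `tokenize` function and both toy vocabularies are assumptions made purely for illustration, and the sample text is the transliterated question quoted above.

```python
def tokenize(text, vocab):
    """Greedy longest-match segmentation (illustrative only).

    Characters that match no vocabulary entry become single-character tokens.
    """
    tokens = []
    max_len = max(len(entry) for entry in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabularies: a small base set and one extended with Kazakh words.
base_vocab = {"a", "e", "n", "s", "sh", "zh"}
extended_vocab = base_vocab | {"shop", "nege", "zhasyl"}

question = "shop nege zhasyl"
base_tokens = tokenize(question, base_vocab)
extended_tokens = tokenize(question, extended_vocab)

print(len(base_tokens), base_tokens)          # 14 tokens, mostly single characters
print(len(extended_tokens), extended_tokens)  # 5 tokens, whole words plus spaces
```

With the extended vocabulary the same text is split into far fewer, more meaningful pieces, which is the general reason a model benefits from a vocabulary that covers its target language.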

“We look forward to closer cooperation with government agencies, civil society and the private sector,” said Gen2b.ai CEO Armen Atayan.

The development of IrbisGPT opens up new opportunities for promoting the Kazakh language in the digital environment and for using it in various spheres of life. The creators of IrbisGPT are confident that the contribution of every enthusiast and developer will help make the model even more effective.

