The AI tool ChatGPT has stunned the world with its fluent use of language. Joakim Nivre, Professor of Computational Linguistics at Uppsala University, explains how the language model became so good at Swedish. For the past two years, he has been involved in developing language models based on Swedish texts.
“The idea of building language models has been around for a long time, at least since the 1950s,” says Joakim Nivre. Claude Shannon, known as the “father of information theory”, realised that you could measure the amount of information in language by guessing the next word in a text. The more difficult it was to guess the next word, the more information there was in the text.
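In modern terms, Shannon's insight corresponds to measuring a word's information content as minus the base-2 logarithm of the probability a guesser assigns to it: the harder a word is to guess, the more bits it carries. A minimal sketch, with probabilities invented purely for illustration:

```python
import math

# Shannon's insight in modern terms: a word's information content can be
# measured as -log2 of the probability a guesser assigns to it.
# Easy-to-guess words carry few bits; surprising words carry many.
# The probabilities below are invented for the example.

easy_guess = 0.9    # a highly predictable next word
hard_guess = 0.01   # an unexpected next word

for p in (easy_guess, hard_guess):
    bits = -math.log2(p)
    print(f"P(next word) = {p}: {bits:.2f} bits of information")
```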
A computer model can be trained by having it guess the next word and giving it a feedback signal on how good its guesses are. If it becomes good enough at guessing the next word, it has also learned something about the language.
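As a sketch of that training idea, the toy program below stands in for a real system by using a simple count-based bigram model instead of a neural network; the corpus is invented for the example:

```python
from collections import Counter, defaultdict

# A minimal sketch of "learn by guessing the next word", using a
# count-based bigram model instead of a neural network with billions of
# parameters. The tiny corpus is invented; real language models adjust
# their parameters from a feedback (loss) signal over vastly more text
# rather than just counting.

corpus = "the cat sat on the mat and the cat ate".split()

# "Training": record how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def guess_next(word):
    """Return the next word most often seen after `word` in training."""
    options = follows[word]
    return options.most_common(1)[0][0] if options else None

# "Feedback signal": how often does the model guess the true next word?
pairs = list(zip(corpus, corpus[1:]))
correct = sum(guess_next(prev) == nxt for prev, nxt in pairs)
print(f"guess after 'the': {guess_next('the')}")
print(f"{correct}/{len(pairs)} next words guessed correctly on training text")
```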
“Since the 1950s, it has become possible to scale this up and make it so much more powerful. The statistical probability models have billions of different parameters. Moreover, they can be trained on an incredible range of texts, perhaps amounting to trillions of words.”
In autumn 2022, ChatGPT, a language model developed by the company OpenAI, was released with surprisingly good language capabilities. At that time, Joakim Nivre, together with researchers at AI Sweden and RISE, had already started building Swedish language models.
“Within the project, we have trained several models of different sizes, the largest of which has 40 billion parameters. That is about a quarter of what GPT-3 (the predecessor of ChatGPT) has, and about a tenth of the largest models. It is one of the largest models available for a language other than English or Chinese.”
But there is no denying it: ChatGPT and its successor GPT-4 are superior, not only in English but also in their use of Swedish, especially when it comes to providing relevant answers to questions.