A challenge pits an established writer against ChatGPT for the first time: “At some point I started to get nervous”

Each contender had to invent 30 movie titles. Both then had to write about 600 words for each title, and the texts were evaluated by a panel of six critics and academics. One contender was the 48-year-old Argentine writer Patricio Pron. The other was the most advanced language model at the time of testing, ChatGPT running GPT-4 Turbo.

“These duels have a long tradition in artificial intelligence, like Kasparov against Deep Blue or AlphaGo against Lee Sedol,” says Julio Gonzalo, professor at UNED and one of the authors of the experiment. For the writer, the task was somewhat more delicate. Did he feel the burden of defending humanity against the machine on his shoulders? It was not just about winning or losing; it was also about submitting to a detailed, numerical evaluation, which is rare in the world of letters. “I found it very funny to imagine myself carrying the destiny of humanity on my back,” says Pron. “I didn’t have previous duels like Kasparov’s much in mind, but I did remember that the machine had won. So at some point I started to get nervous. At first I accepted with great enthusiasm, but then I began to feel the pressure slightly, not from the weight of humanity, but from the prospect of discovering that I am not as good as the machine. I began to wonder about the fate of my books once it was discovered that I could not even defeat a kind of stochastic parrot that repeats the nonsense people tell it,” he adds.

Luckily for Pron, the results were overwhelming. He won in all the categories evaluated, especially in creativity and own voice, but also in original and attractive style. Just looking at the titles makes it easy to understand the gap that exists today between a writer and the best language model. These are some of Pron’s proposals: “After everything I almost did for you,” “Mental illness three days a week,” “The Lego Woman” and “Choose any card. No, not that one, another one.” Here are some of ChatGPT’s titles: “Fragments of an invisible yesterday,” “The inverted city,” “The forgotten melody,” “The last flight of the butterfly” and “Footprints in the sea of sand.” All the texts will be published this year, with a new prologue and epilogue, in a book from the Delirio publishing house.

Was this victory of human creativity predictable? Largely, yes, but that does not mean that ChatGPT is not creative. “It has been proven that AI can be creative: AlphaGo invented new strategies to play Go, which have since been imitated by all the masters. But the field of art is very different from that of a board game,” says Gonzalo. Even so, the result was not obvious to everyone: “There are people who are surprised, academics too, even in my field (natural language processing). Nobody had done it at this level, against a top writer,” says Gonzalo. It also mattered that the jury were specialists in literature: “In reality, they are titles that do not sound bad, they are the ones you find when you go to the arts and sciences bestsellers area at El Corte Inglés,” says Gonzalo.

Many details of the experiment matter. In an earlier study, Carlos Gómez Rodríguez, a professor at the University of A Coruña, asked several models to write a fight between the protagonist of the novel A Confederacy of Dunces and a pterodactyl. The result there was much more even: “It has been proven that, at least under some particular conditions, AI can write stories as good as a human’s,” says Gómez Rodríguez. “But there are two nuances. One, it depends a lot on the conditions of the task (language, genre or length), and two, if we compare them with a prominent writer like Patricio Pron, they are still far behind.”

English also ahead

The experiment had a second objective: to measure the gap in quality between ChatGPT in English and in Spanish. ChatGPT also wrote its texts in English, and these scored 30% better than its Spanish ones. The experiment received public funding from the Odesia project, which is part of the National AI Strategy.

Challenges of this kind show that the gap left by training the models on different languages is notable: “For simple things, like answering an easy question, it is normal that we do not notice the difference between asking ChatGPT in Spanish or in English. It is when trying more complicated things that you notice the difference, and this is a clear example,” explains Gómez Rodríguez.

Ever since ChatGPT appeared, it has been perceived as a threat to creative work. But experiments like this show that, for now, it is above all a tool that depends heavily on who writes the request and how: ChatGPT produced better stories with Pron’s titles than with its own. In other words, the more original the request, the more creative ChatGPT was.

This initial advantage is precisely what the authors wanted to avoid giving the machine, which had to fend for itself. The objective was to evaluate it as it is, not to adjust the request until the desired result came out. “We were very careful that the competition was on equal terms for both,” says Gonzalo. “We had to assume that the machine was capable of interpreting our request and solving it without any retouching, because otherwise we would have been starting to do co-creation,” he adds.

The ceiling of creativity

A reasonable question is whether the next models will improve this specific capacity or whether the models have this ceiling by definition. Pron is clear that there is not much to be done: “There is nothing creative in the way ChatGPT works. Furthermore, the machine already seems to be good enough for the people who use it. Technology tends to promise us that a camel will fit through the eye of a needle, but most of the time it just passes a camel hair or two and makes us believe that’s all there is. ChatGPT will become the standard in written communication, but only because the variety, the diversity of the world, irritates many people and fills them with fear and doubt. They prefer to concentrate on thinking that the hair is a camel. And ChatGPT can give them that now.”

For now, this possible artistic limitation also has a technical explanation. These highly sophisticated machines work with probabilities, and their goal is to imitate human text. The most common example: if we give the machine “the sky is”, it will tend to continue with “blue”, says Guillermo Marco, professor at UNED and co-author of the article: “Because of this, it moves away from the way we create, which produces sequences of text that have a low probability but a deep meaning. If we pick less probable words, ChatGPT moves away from the meaning and starts generating junk text,” explains Marco.
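Marco's point about probabilities can be sketched with a toy example. The snippet below is only an illustration: the candidate words and their scores are invented, and the article does not say which decoding settings the experiment used. It applies temperature scaling, a common way of making a model pick less probable words, to a made-up distribution for the word after “the sky is”.

```python
import math
import random

# Hypothetical scores (logits) for the word after "the sky is".
# These numbers are invented for illustration, not taken from any real model.
TOY_LOGITS = {"blue": 5.0, "clear": 3.0, "falling": 1.0, "an unpaid debt": -1.0}


def next_word_probabilities(logits, temperature=1.0):
    """Turn raw scores into probabilities with a temperature-scaled softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: score / temperature for word, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {word: math.exp(s - top) for word, s in scaled.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def sample_next_word(logits, temperature=1.0, rng=random):
    """Draw one continuation according to the temperature-adjusted probabilities."""
    probs = next_word_probabilities(logits, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


if __name__ == "__main__":
    # Low temperature concentrates probability on "blue";
    # high temperature spreads it towards the unlikely continuations.
    for t in (0.5, 1.0, 3.0):
        probs = next_word_probabilities(TOY_LOGITS, t)
        print(t, {w: round(p, 3) for w, p in probs.items()})
```

In this toy setup, raising the temperature makes unusual continuations more likely, but it does so blindly: nothing in the procedure checks whether the rarer word carries meaning, which is consistent with Marco's remark about junk text.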

Beyond this tendency towards homogeneity, machine creation faces another problem: it matters who the sender of the message is. “Art is a communication process,” says Gonzalo. “The receiver interprets the message based on his own context and his expectations of the sender. The same poem will resonate very differently if the reader thinks it comes from a machine than if it comes from a writer mortally wounded in a dawn duel outside Florence. We humans understand art as the artist’s way of communicating emotions to us, and we know that the purpose of the machine is only to please us,” he adds. In a previous experiment by the same authors, with a model that long predated ChatGPT, synopses invented by machines were rated worse when the jury knew that their author was a machine.

Another avenue the authors want to explore is what happens when the evaluation is carried out not by specialists but by the general public, by ordinary readers. With the same texts, they believe the results could be different. Teresa Mateo-Girona, a professor at the Complutense University and also a co-author, explains why, and gives an idea of how ChatGPT can serve many artistic purposes that are less specific than this experiment: “First, an expert detects commonplaces and lack of originality. A person with less experience can find any literary motif that is not familiar to them surprising. Second, an expert tries to evaluate professionally, looking for features of the style and of the plot that generate interest, whereas a non-specialized reader may base the judgment more on the personal, which would make it more variable. And third, style can influence the understanding of the texts. Compared to the simple and understandable ChatGPT texts, the more complex and richer writing of a writer can be appreciated by experts but be difficult for a common reader to understand,” explains Mateo-Girona.

Even for co-creation it is a delicate tool. Another study, carried out with digital artists, found that when they used ChatGPT they were able to generate art that was more attractive to the community, with more likes. “But diversity dropped a lot; in the end it became uniform. It’s like a teacher from a certain school, the school of maximum probability,” summarizes Marco.
