How Wikipedia Survives in the Age of ChatGPT | Technology

There has always been a risk of fake content appearing on Wikipedia. To give just one example: for a time, the biography of a Northern Irish radio presenter stated that he had been a rising star of break dancing and that his career in urban dance was cut short by a spinal injury. That was pure trolling; in other cases, the motive is promotion or disinformation. The portal has a long tradition of addressing these problems, and a committed community of 265,000 active volunteers has kept them in check so far. But the explosion of AI-generated text poses new challenges.

With more than 16 billion visits per month, Wikipedia’s prestige is beyond question. That makes it a coveted place to inject disinformation or to launder marketing messages from companies or individuals. And with artificial intelligence (AI), credible texts can be generated at will, with minimal effort.

Following the launch of ChatGPT, the portal expanded its team dedicated to machine learning. Wikipedia co-founder Jimmy Wales has stated that AI is both “an opportunity and a threat.” And in its latest fundraising campaign, one of the claims highlighted the platform’s role in the “age of artificial intelligence.”

Miguel Ángel García, a Wikimedia Spain partner and former board member, admits that he has already encountered texts that are suspected of having been generated with AI. “We have noticed that new editors appear who want to add content. And they add very extensive and highly developed content, which is unusual. Because when you are a volunteer starting out, you build the articles little by little. You go paragraph by paragraph.”

García knows these patterns well. He started contributing to Wikipedia in 2006, when he was in high school. He would correct the occasional spelling mistake or make obvious grammatical changes. He created his first article because he had written a paper about his parents’ village, Campaspera, near Valladolid. There was no information about this town on the site, so he uploaded his text with photos he had taken himself.

“Since artificial intelligence came along, more and more volunteers appear who hand you a giant text, apparently well structured and well developed. But then you read it and spot the redundancies that a person can often detect in AI-generated writing.” García is referring to stock phrases and a certain way of presenting information, with hackneyed introductions and conclusions.

Such texts risk getting lost in an ocean of more than 62 million articles in more than 300 languages. Chris Albon, director of Machine Learning at the Wikimedia Foundation, which oversees Wikipedia, points out that some volunteers have used automated tools since 2002, especially for repetitive tasks. The technology is no stranger to them. And the key to controlling inappropriate texts lies precisely in the community of volunteers who moderate the content: they not only write articles, they also edit them and weed out those that add no value.

“In this new era of artificial intelligence, the strength of this human-led content moderation model is more relevant than ever. Wikipedia’s model, based on debate, consensus and strict sourcing rules, has proven resilient in maintaining content quality over the past two decades,” says Albon. All text must be backed by secondary sources, cited as links to other websites.

Suspicious surge when ChatGPT was born

If an article has no sources, the community of volunteers detects this and acts. “In most cases, articles are deleted instantly, because with just two clicks you can see that the text is completely baseless. Otherwise, they are usually flagged for automatic deletion within a maximum of 30 days if the author cannot back up the content with sources,” explains García.

The Wikimedia Spain partner says that when ChatGPT emerged, there was a spike in AI-generated texts uploaded to the portal, but the trend has since stabilised thanks to the efforts of the community. For his part, Albon says we will have to learn to live with these tools. “Wikipedia’s approach to AI has always been that people edit, improve and audit the work that AI does. Volunteers create the policies for the responsible use of AI tools on Wikipedia and monitor their correct application,” he reflects. The portal does not penalise the use of artificial intelligence as such, but rather texts that fall short of the quality its policies require.

According to García, the biggest risk for Wikipedia lies outside of Wikipedia. The platform relies on secondary sources. “I see a medium-term problem with AI-generated texts that become apparently reliable sources in the real world. More and more digital newspapers are emerging that publish almost anything. Eventually, some people want to cite these pseudo-media as references,” he points out.

The solution, as with almost everything on the platform, lies with the editors. If volunteers find that a site is unreliable, the community can decide to blacklist it. This happened with an outlet as established as the Daily Mail: a few years ago, its use as a source was banned after the British tabloid repeatedly published unverified information.

Wikipedia’s dance with AI chats

There is another concern about the future of Wikipedia in this era of artificial intelligence. In a hypothetical scenario where chatbots such as ChatGPT or Google Gemini resolve user queries with a summary, who will visit Wikipedia articles? And, more to the point, who will edit them?

“If there is a disconnect between where knowledge is generated, such as on Wikipedia, and where it is consumed, such as on ChatGPT, we risk losing a generation of volunteers,” Albon reasons.

Connecting the sites where knowledge is generated with the AI chatbots that extract and replicate it is also in the general interest. “Without clear attribution and links to the original source from which the information was obtained, AI applications risk introducing an unprecedented amount of misinformation into the world. Users will not be able to easily distinguish between accurate information and hallucinations. We have thought a lot about this challenge and believe that the solution is attribution,” comments Wikimedia’s Director of Machine Learning.

The timing is ironic: as is well known, applications like ChatGPT and Google Gemini are built on systems trained on Wikipedia content. Part of the knowledge acquired by large language models (LLMs) thus comes from those millions upon millions of articles uploaded and edited by volunteers.

You can follow EL PAÍS Technology on Facebook and X or sign up here to receive our weekly newsletter.
