I really push my company owners to stop using AI for documentation, and to add a proud "not by AI" label (https://notbyai.fyi/) in our products.
AI content is overly verbose, a waste of time.
The idea is good, but the fact that it's a commercial project (charging $99 for an image, at that) is quite dubious.
https://brainmade.org is similar, but free.
AI can remove verbosity from text if asked.
I mean, this is surely a prompting problem, isn't it?
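For instance, a minimal sketch of that prompting fix using the openai Python client (the model name and prompt wording are placeholders, not a recommendation):

```python
# Sketch of "just ask for concision"; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def tighten(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you have
        messages=[
            {"role": "system",
             "content": "Rewrite the user's text as tersely as possible "
                        "without losing any technical content."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content
```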
1 in 20 is lower than I would have guessed. It's interesting that Wikipedia is heavily weighted in LLM training corpora. Eventually there will be a feedback loop not unlike the one among meat popsicles.
How many covers are better than the original? Would we expect that to be possible with AI?
Does it have citations? Real ones?
Wikipedia will surely decline like the rest of the internet, but at least any Wikipedia article can be retrieved as of any pre-vandalism timestamp.
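The MediaWiki action API makes that retrieval straightforward; a minimal sketch, where the title and timestamp are just placeholders and the `requests` package is assumed:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def article_as_of(title: str, timestamp: str) -> str:
    """Return the wikitext of `title` as of the latest revision
    at or before `timestamp` (ISO 8601, e.g. "2020-01-01T00:00:00Z")."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "older",      # enumerate backwards from `timestamp`
        "rvstart": timestamp,  # so the first hit is the revision live then
        "rvprop": "content|timestamp",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]

# e.g. an article's state before a suspected spree of bad edits:
print(article_as_of("Wikipedia", "2020-01-01T00:00:00Z")[:500])
```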
MSFT published a study at least two years ago in which they had (outsourced?) engineers build a language model trained on Wikipedia and other sources, then auto-read current Wikipedia pages and performed edits to "correct" them, with statistics on correctness and model performance.
Startled, I asked Wikimedia engineers on Libera IRC about it, and they refused to comment in any way, which was uncharacteristic. It came across as manipulative on MSFT's part, lawyer-like you might say, with an air of "what are you gonna do about it?" The writeup (by the outsourced engineers?) bragged about high "correction" scores as if it were a contest, in the style of other LLM papers and studies.
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Reliabil...
Gotta get to 1 in 4. That's where Google claims to be.
Many news outlets are already there, too.
OCR is a process that usually incorporates some form of A.I., such as pattern matching and neural networks. So the corpora of scanned text in the Internet Archive, HathiTrust, and Google Books have already been run through A.I. enhancement.
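As a concrete example, Tesseract's default engine since version 4 is an LSTM neural network. A minimal sketch of such an OCR pass, assuming the tesseract binary plus the pytesseract and Pillow packages are installed (the filename is a placeholder):

```python
# Tiny OCR pass of the kind described above: Tesseract's LSTM engine.
from PIL import Image
import pytesseract

def ocr_page(path: str) -> str:
    # --oem 1 selects the LSTM (neural-net) engine explicitly
    return pytesseract.image_to_string(Image.open(path), config="--oem 1")

print(ocr_page("scan.png"))  # "scan.png" is a placeholder filename
```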
Wikipedians use A.I. as well, especially to identify and revert vandalism (the ClueBot line of bots, for example), and similar heuristics are now incorporated into the revision logging.
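ClueBot NG's real classifier is a trained model, not anything this crude; but as a naive stand-in to show the shape of the task, something like the following pulls recent edits from the API and flags suspicious ones for review (the thresholds are arbitrary):

```python
# NOT ClueBot's method -- a naive heuristic sketch of vandalism triage.
import requests

API = "https://en.wikipedia.org/w/api.php"

def recent_changes(limit: int = 50) -> list[dict]:
    params = {
        "action": "query", "list": "recentchanges",
        "rcprop": "title|sizes|comment|user", "rctype": "edit",
        "rclimit": limit, "format": "json", "formatversion": 2,
    }
    r = requests.get(API, params=params, timeout=30)
    return r.json()["query"]["recentchanges"]

for rc in recent_changes():
    delta = rc["newlen"] - rc["oldlen"]
    # crude heuristic: large unexplained removals are worth a second look
    if delta < -2000 and not rc.get("comment"):
        print(f"review: {rc['title']} ({delta} bytes by {rc['user']})")
```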
If more of the headlines, photos, and article text published by news outlets are LLM-generated, then new additions to Wikipedia will already be summaries of LLM material.
Just before the paywall, the article mentions "starting with AI detection tools", so it's going to be wildly inaccurate from the start.
I'd like to see someone run LLM detection on Good and Featured Articles; what percentage would be flagged there?
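A sketch of how that experiment might look, sampling lead sections of Featured Articles through the API; `llm_detect_score` is a placeholder for whichever detector is under test, and the 0.5 threshold is arbitrary:

```python
# Experiment sketch: run an (unspecified) LLM detector over Featured Articles.
import requests

API = "https://en.wikipedia.org/w/api.php"

def featured_titles(limit: int = 20) -> list[str]:
    """Sample titles from Category:Featured articles."""
    params = {
        "action": "query", "list": "categorymembers",
        "cmtitle": "Category:Featured articles",
        "cmlimit": limit, "format": "json", "formatversion": 2,
    }
    r = requests.get(API, params=params, timeout=30)
    return [m["title"] for m in r.json()["query"]["categorymembers"]]

def plain_intro(title: str) -> str:
    """Fetch the plain-text lead section via the TextExtracts API."""
    params = {
        "action": "query", "prop": "extracts", "titles": title,
        "exintro": 1, "explaintext": 1, "format": "json", "formatversion": 2,
    }
    r = requests.get(API, params=params, timeout=30)
    return r.json()["query"]["pages"][0].get("extract", "")

def llm_detect_score(text: str) -> float:
    return 0.0  # placeholder: wire up the detector under test here

titles = featured_titles()
flagged = sum(1 for t in titles if llm_detect_score(plain_intro(t)) > 0.5)
print(f"{flagged}/{len(titles)} flagged")
```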