I really push my company owners to stop using AI for documentation, and to add a proud "not by AI" label (https://notbyai.fyi/) in our products.
AI content is overly verbose, a waste of time.
The idea is good, but the fact that it's a commercial project (charging $99 for an image, at that) is quite dubious.
https://brainmade.org is similar, but free.
AI can remove verbosity from text if asked.
I mean, this is surely a prompting problem, isn't it?
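For instance, a minimal sketch of that prompting fix using the openai Python client (the model name and prompt wording are placeholders, not a recommendation):

```python
# Sketch of "just ask for concision"; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def tighten(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you have
        messages=[
            {"role": "system",
             "content": "Rewrite the user's text as tersely as possible "
                        "without losing any technical content."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content
```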
1 in 20 is lower than I would have guessed. It's interesting that Wikipedia is heavily weighted in LLM training corpora. Eventually there will be a feedback loop not unlike the one among meat popsicles.
How many covers are better than the original? Would we expect that to be possible with AI?
Does it have citations? Real ones?
Wikipedia will surely decline like the rest of the internet, but at least any Wikipedia article can be retrieved as of any pre-vandalism timestamp.
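The MediaWiki action API makes that retrieval straightforward; a minimal sketch, where the title and timestamp are just placeholders and the `requests` package is assumed:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def article_as_of(title: str, timestamp: str) -> str:
    """Return the wikitext of `title` as of the latest revision
    at or before `timestamp` (ISO 8601, e.g. "2020-01-01T00:00:00Z")."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "older",      # enumerate backwards from `timestamp`
        "rvstart": timestamp,  # so the first hit is the revision live then
        "rvprop": "content|timestamp",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]

# e.g. an article's state before a suspected spree of bad edits:
print(article_as_of("Wikipedia", "2020-01-01T00:00:00Z")[:500])
```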
MSFT published a study at least two years ago in which they had (outsourced?) engineers build a language model trained on Wikipedia and other sources, then auto-read current Wikipedia pages and performed edits to "correct" them, with statistics on correctness and model performance.
Startled, I asked Wikimedia engineers on Libera IRC about it, and they refused to comment in any way, which was uncharacteristic. It came across as manipulative on MSFT's part, lawyer-like you might say, with an air of "what are you gonna do about it?" The writeup (by the outsourced engineers?) bragged about high "correction" scores as if it were a contest, in the style of other LLM papers and studies.
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Reliabil...
Gotta get to 1 in 4. That's where Google claims to be.
Many news outlets are already there, too.
OCR is a process that usually incorporates some form of A.I., such as pattern matching and neural networks. So the corpora of scanned text in the Internet Archive, HathiTrust, and Google Books have already been run through A.I. enhancement.
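As a concrete example, Tesseract's default engine since version 4 is an LSTM neural network. A minimal sketch of such an OCR pass, assuming the tesseract binary plus the pytesseract and Pillow packages are installed (the filename is a placeholder):

```python
# Tiny OCR pass of the kind described above: Tesseract's LSTM engine.
from PIL import Image
import pytesseract

def ocr_page(path: str) -> str:
    # --oem 1 selects the LSTM (neural-net) engine explicitly
    return pytesseract.image_to_string(Image.open(path), config="--oem 1")

print(ocr_page("scan.png"))  # "scan.png" is a placeholder filename
```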
Wikipedians use A.I. as well, especially to identify and revert vandalism (the ClueBot line of bots, for example), and similar heuristics are now incorporated into the revision logging.
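ClueBot NG's real classifier is a trained model, not anything this crude; but as a naive stand-in to show the shape of the task, something like the following pulls recent edits from the API and flags suspicious ones for review (the thresholds are arbitrary):

```python
# NOT ClueBot's method -- a naive heuristic sketch of vandalism triage.
import requests

API = "https://en.wikipedia.org/w/api.php"

def recent_changes(limit: int = 50) -> list[dict]:
    params = {
        "action": "query", "list": "recentchanges",
        "rcprop": "title|sizes|comment|user", "rctype": "edit",
        "rclimit": limit, "format": "json", "formatversion": 2,
    }
    r = requests.get(API, params=params, timeout=30)
    return r.json()["query"]["recentchanges"]

for rc in recent_changes():
    delta = rc["newlen"] - rc["oldlen"]
    # crude heuristic: large unexplained removals are worth a second look
    if delta < -2000 and not rc.get("comment"):
        print(f"review: {rc['title']} ({delta} bytes by {rc['user']})")
```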
If more of the headlines, photos, and article text published by news outlets are LLM-generated, then new additions to Wikipedia will already be summaries of LLM material.
Just before the paywall, the article mentions "starting with AI detection tools", so it's going to be wildly inaccurate from the start.
I'd like to see someone run LLM detection on Good and Featured Articles; what percentage would be flagged there?
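A sketch of how that experiment might look, sampling lead sections of Featured Articles through the API; `llm_detect_score` is a placeholder for whichever detector is under test, and the 0.5 threshold is arbitrary:

```python
# Experiment sketch: run an (unspecified) LLM detector over Featured Articles.
import requests

API = "https://en.wikipedia.org/w/api.php"

def featured_titles(limit: int = 20) -> list[str]:
    """Sample titles from Category:Featured articles."""
    params = {
        "action": "query", "list": "categorymembers",
        "cmtitle": "Category:Featured articles",
        "cmlimit": limit, "format": "json", "formatversion": 2,
    }
    r = requests.get(API, params=params, timeout=30)
    return [m["title"] for m in r.json()["query"]["categorymembers"]]

def plain_intro(title: str) -> str:
    """Fetch the plain-text lead section via the TextExtracts API."""
    params = {
        "action": "query", "prop": "extracts", "titles": title,
        "exintro": 1, "explaintext": 1, "format": "json", "formatversion": 2,
    }
    r = requests.get(API, params=params, timeout=30)
    return r.json()["query"]["pages"][0].get("extract", "")

def llm_detect_score(text: str) -> float:
    return 0.0  # placeholder: wire up the detector under test here

titles = featured_titles()
flagged = sum(1 for t in titles if llm_detect_score(plain_intro(t)) > 0.5)
print(f"{flagged}/{len(titles)} flagged")
```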