79 comments

  • og_kalu 2 hours ago

    Related:

    GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r

    Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

    The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - https://arxiv.org/abs/2310.06824

    The Internal State of an LLM Knows When It's Lying - https://arxiv.org/abs/2304.13734

    LLMs Know More Than What They Say - https://arjunbansal.substack.com/p/llms-know-more-than-what-...

    Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

    Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334

    • zbentley 2 hours ago

      And the other excellent work done by the Bau Lab in this area: https://rome.baulab.info/

    • foobarqux 2 hours ago

      It's wild that people post papers that they haven't read or don't understand because the headline supports some view they have.

      To wit, in your first link it seems the figure is just showing the trivial fact that the model is trained on the MMLU dataset (and after RLHF it is no longer optimized for that). The second link main claim seems to be contradicted by their Figure 12 left panel which shows ~0 correlation between model-predicted and actual truth.

      I'm not going to bother going through the rest.

      I don't yet understand exactly what they are doing in the OP's article but I suspect it also suffers from serious problems.

      • bee_rider 44 minutes ago

        They just posted a list of articles, and said that they were related. What view do you think they have, that these papers support? They haven’t expressed a view as far as I can see…

        Maybe you’ve inferred some view based on the names of the titles, but in that case you seem to be falling afoul of your own complaint?

        • Retr0id 14 minutes ago

          Much like you can search the internet until you find a source that agrees with you, you can select a set of papers that "confirm" a particular viewpoint, especially in developing fields of research. In this case, the selected papers all support the view LLMs "know what they know" on some internal level, which iiuc is not (yet?) a consensus viewpoint (from my outsider perspective). But from the list alone, you might get that impression.

          Related:

          Still no lie detector for language models: probing empirical and conceptual roadblocks - https://link.springer.com/article/10.1007/s11098-023-02094-3

          Hallucination is Inevitable: An Innate Limitation of Large Language Models - https://arxiv.org/abs/2401.11817

          LLMs Will Always Hallucinate, and We Need to Live With This - https://arxiv.org/abs/2409.05746

          (disclaimer, I also have not read any of these papers beyond the title!)

        • foobarqux 37 minutes ago

          Search the poster's history for those links where their view is explicitly expressed.

          • erikerikson 30 minutes ago

            Responding to a poster's history of posts rather than the post you are responding to seems problematic.

            • tsimionescu 26 minutes ago

              If you have discussed those things previously with the poster, I don't agree. If you were to go digging through their history only to respond to the current comment, that's more debatable. But, we're supposed to assume good faith here on HN, so I would take the first explanation.

              • erikerikson 6 minutes ago

                In this case the poster seems to have projected opinions onto a post where none were expressed. That seems problematic regardless of how they came to associate the opinions with their respondent. Maybe the poster they responded to still holds the projected opinions, perhaps that poster abandoned them, or perhaps they thought them distracting and so chose not to share them.

                If I am wrong or not useful in my posts, I would hope to be allowed to remove what was wrong and/or not useful without losing my standing to share the accurate, useful things. Anything else seems like residual punishment outside the appropriate context.

      • og_kalu an hour ago

        >It's wild that people post papers that they haven't read or don't understand because the headline supports some view they have.

        It's related research either way. And I did read them. I think there are probably issues with the methodology of 4, but it's there anyway because it's interesting research that is related and is not without merit.

        >The second link main claim seems to be contradicted by their Figure 12 left panel which shows ~0 correlation between model-predicted and actual truth.

        The panel is pretty weak on correlation, but it's quite clearly also not the only thing that supports that particular claim, nor does it contradict it.

        >I'm not going to bother going through the rest.

        Ok? That's fine

        >I don't yet understand exactly what they are doing in the OP's article but I suspect it also suffers from serious problems.

        You are free to assume anything you want.

        • foobarqux an hour ago

          > The panel is pretty weak on correlation, but it's quite clearly also not the only thing that supports that particular claim, nor does it contradict it.

          It very clearly contradicts it: There is no correlation between the predicted truth value and the actual truth value. That is the essence of the claim. If you had read and understood the paper you would be able to specifically detail why that isn't so rather than say vaguely that "it is not the only thing that supports that particular claim".

          • godelski 25 minutes ago

            To be fair, I'm not sure people writing papers understand what they're writing either. Much of the ML community seems to have fully embraced the "black box" nature rather than seeing it as something to overcome. I routinely hear both readers and writers tout that you don't need much math. Yet mistakes and misunderstandings are commonplace, and they're right, they don't need much math. How much do you need to understand the difference between entropy and perplexity? Is that more or less than what's required to know the difference between probability and likelihood? I would hope we could at least get to a level where we understand the linear nature of PCA.

            • foobarqux 17 minutes ago

              LLMs have spawned endemic academic dishonesty in order to pad publication and citation counts.

          • og_kalu 37 minutes ago

            >If you had read and understood the paper you would be able to specifically detail why that isn't so rather than say vaguely that "it is not the only thing that supports that particular claim".

            Not every internet conversation need end in a big debate. You've been pretty rude and I'd just rather not bother.

            You also seem to have a lot to say on how much people actually read papers, but your first response took like 5 minutes. I'm sorry but you can't say you've read even one of those in that time. Why would I engage with someone being intellectually dishonest?

            • godelski 16 minutes ago

                > you can't say you've read even one of those in that time.
              
              I'm not sure if you're aware, but most of those papers are well known. All the arxiv papers are from 2022 or 2023. So I think your 5 minutes is pretty far off. I for one have spent hours, but the majority of that was prior to this comment.

              You're claiming intellectual dishonesty too soon.

              That said, @foobarqux, I think you could expand on your point more to clarify. @og_kalu, focus on the topic and claims (even if not obvious) rather than the time

            • foobarqux 36 minutes ago

              > I guess i understand seeing as you couldn't have read the paper in the 5 minutes it took for your response.

              You've posted the papers multiple times over the last few months, so no, I did not read them in the last five minutes, though you could in fact find both of the very basic problems I cited in that amount of time.

              • og_kalu 25 minutes ago

                Still intellectually dishonest and just plain weird. If you came upon the previous posts organically, why not address it then? And why act like it's the first time here?

                I'm even less willing to engage.

                • foobarqux 20 minutes ago

                  Because it's pointless to reply to a comment days after it was made. All of this is a convenient misdirection for not having read and understood the papers you keep posting because you like the headlines.

      • mattnewton an hour ago

        Please keep the conversation in good faith.

  • lsy 2 hours ago

    There can’t be any information about “truthfulness” encoded in an LLM, because there isn’t a notion of “truthfulness” for a program which has only ever been fed tokens and can only ever regurgitate their statistical correlations. If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

    To me the research around solving “hallucination” is a dead end. The models will always hallucinate, and merely reducing the probability that they do so only makes the mistakes more dangerous. The question then becomes “for what purposes (if any) are the models profitable, even if they occasionally hallucinate?” Whoever solves that problem walks away with the market.

    • justinpombrio 2 hours ago

      > If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

      This isn't true.

      You're conflating whether a model (that hasn't been fine tuned) would complete "the capital of Connecticut is ___" with "Moscow", and whether that model contains a bit labeling that fact as "false". (It's not actually stored as a bit, but you get the idea.)

      Some sentences that a model learns could be classified as "trivia", and the model learns this category by sentences like "Who needs to know that octopuses have three hearts, that's just trivia". Other sentences a model learns could be classified as "false", and the model learns this category by sentences like "2 + 2 isn't 5". Whether a sentence is "false" isn't particularly important to the model, any more than whether it's "trivia", but it will learn those categories.

      There's a pattern to "false" sentences. For example, even if there's no training data directly saying that "the capital of Connecticut is Moscow" is false, there are a lot of other sentences like "Moscow is in Russia" and "Moscow is really far from CT" and "people in Moscow speak Russian", that all together follow the statistical pattern of "false" sentences, so a model could categorize "Moscow is the capital of Connecticut" as "false" even if it's never directly told so.
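
      To make that concrete, this is also roughly what the probing papers linked upthread do: pull hidden-state vectors for a bunch of statements and fit a linear classifier on them. A minimal sketch, with synthetic vectors standing in for real activations (the hidden size, the "truth direction", and all the numbers here are made up for illustration):

        # Toy "truth probe": fit a linear classifier on hidden-state vectors.
        # Synthetic stand-ins; in the real papers the vectors come from a
        # transformer layer's activations on true/false statements.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        d = 256                        # pretend hidden size
        true_dir = rng.normal(size=d)  # pretend "truth direction" in activation space

        def fake_activations(n, is_true):
            base = rng.normal(size=(n, d))
            # true statements get a small nudge along the "truth direction"
            return base + (0.5 if is_true else -0.5) * true_dir

        X = np.vstack([fake_activations(500, True), fake_activations(500, False)])
        y = np.array([1] * 500 + [0] * 500)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print("probe accuracy:", probe.score(X_te, y_te))

      If a probe like that does better than chance on held-out statements, the representation carries some signal about the "false" category, which is all I'm claiming is learnable.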

      • RandomLensman an hour ago

        That would again be a "statistical" attempt at deciding on it being correct or false - it might or might not succeed depending on the data.

        • whimsicalism a minute ago

          Good luck philosophically defending this line

        • dboreham 21 minutes ago

          The human feeling you have that what you're doing is not statistical, is false.

          • RandomLensman 10 minutes ago

            Based on what research is that universally true? (Other than base physics like statistical mechanics.)

            • whimsicalism 2 minutes ago

              Base physics is all we need to know it is true. Souls are unphysical and we've had reason to be pretty confident about that for at least a century.

        • justinpombrio an hour ago

          That's correct on two fronts. First, I put "false" in quotes everywhere for a reason: I'm talking about the sort of thing that people would say is false, not what's actually false. And second, yes, I'm merely claiming that it's in theory learnable (in contrast to the OP's claim), not that it will necessarily be learned.

          • RandomLensman an hour ago

            I'm not sure the second part is always true: there might be situations where statistical approaches could be made kind of "infinitely" accurate as far as the data is concerned but still represent a complete misunderstanding of the actual situation (aka truth), e.g., layering epicycles on epicycles in a geocentric model of the solar system.

            Some data might support a statistical approach, other data might not, even though it contains no misrepresentations as such.

      • slt2021 an hour ago

        but the model doesn't operate on tokens directly, right? all operations are happening in the embedding space, so these tokens get mapped onto a manifold and one of the dimensions could be representative of fact/trivia?

        • ottaborra an hour ago

          tangent: any reason to assume it gets mapped to a manifold rather than something that is not?

    • vunderba 40 minutes ago

      Agree, humans can "arrive at a reasonable approximation of the truth" even without the direct knowledge of the capital of Connecticut. A human has some other interesting data points that allow them to probabilistically guess that the capital of Connecticut is not Moscow and those might be things like:

      - Moscow is a Russian city, and there probably aren't a lot of cities in the US that have strong Russian influences, especially around the time when these cities might have been founded

      - there's a concept of novelty in trivia, whereby the more unusual the factoid, the better the recall of that fact. If Moscow were indeed the capital of Connecticut, it seems like the kind of thing I might've heard about since it would stand out as being kind of bizarre.

      Notably, this type of inference seems to be relatively distinct from what LLMs are capable of modeling.

      • int_19h 14 minutes ago

        I was actually quite surprised at the ability of top-tier LLMs to make indirect inferences in my experiments.

        One particular case was an attempt to plug GPT-4 as a decision maker for certain actions in a video game. One of those was voting for a declaration of war (all nobles of one faction vote on whether to declare war on another faction). This mostly boils down to assessing risk vs benefits, and for a specific clan in a faction, the risk is that if the war goes badly, they can have some of their fiefs burned down or taken over - but this depends on how close the town or village is to the border with the other faction. The LM was given a database schema to query using SQL, but it didn't include location information.

        To my surprise, GPT-4 (correctly!) surmised in its chain-of-thought, without any prompting, that it can use the culture of towns and villages - which was in the schema - as a sensible proxy to query for fiefs that are likely to be close to the potential enemy, and thus likely to be lost if the war goes bad.
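
        For illustration, that reasoning cashes out into a query shaped roughly like this (made-up table and names below, not the actual game schema or the model's exact SQL):

          # Illustration only: with no location columns, "culture" can stand in
          # for proximity to a rival faction, since border fiefs tend to share
          # the neighbor's culture. Schema and data are invented.
          import sqlite3

          conn = sqlite3.connect(":memory:")
          conn.executescript("""
              CREATE TABLE settlements (name TEXT, culture TEXT, owner_clan TEXT);
              INSERT INTO settlements VALUES
                  ('Northkeep', 'northern', 'clan_a'),
                  ('Midford',   'central',  'clan_a'),
                  ('Southdale', 'southern', 'clan_a');
          """)

          # Fiefs sharing the enemy faction's culture are likely near that border,
          # hence the ones most at risk if the war goes badly.
          enemy_culture = "northern"
          at_risk = conn.execute(
              "SELECT name FROM settlements WHERE owner_clan = ? AND culture = ?",
              ("clan_a", enemy_culture),
          ).fetchall()
          print(at_risk)  # [('Northkeep',)]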

    • genrilz 2 hours ago

      There is some model of truthfulness encoded in our heads, and we don't draw all of that from our direct experience. For instance, I have never been to Connecticut or Moscow, but I still think that it is false that the capital of Connecticut is Moscow. LLMs of course don't have the benefit of direct experience, which would probably help them to at least some extent hallucination wise.

      I think research about hallucination is actually pretty valuable though. Consider that humans make mistakes, and yet we employ a lot of them for various tasks. LLMs can't do physical labor, but an LLM with a low enough hallucination rate could probably take over many non social desk jobs.

      Although in saying that, it seems like it also might need to be able to learn from the tasks it completes, and probably a couple of other things too to be useful. I still think the highish level of hallucination we have right now is a major reason why they haven't replaced a bunch of desk jobs though.

      • vacuity an hour ago

        I think we expect vastly different things from humans and LLMs, even putting raw computing speed aside. If an employee is noticed to be making a mistake, they get reprimanded and educated, and if they keep making mistakes, they get fired. Having many humans interact helps reduce blind spots because of the diversity of mindsets, although this isn't always the case. People can be hired from elsewhere with some level of skill.

        I'm sure we could make similar communities of LLMs, but instead we treat a task as the role of a single LLM that either succeeds or fails. As you say, perhaps because of the high error rate, the very notion of LLM failure and success is judged differently too. Beyond that, a passable human pilot and a passable LLM pilot might have similar average performance but differ hugely in other measurements.

        • genrilz an hour ago

          Overall, excellent points! I would like to add on to that though. RLHF actually does effectively have one LLM educating another. Specifically, human trainers' time is valuable, so they train an AI to express the same opinion about some response as a human trainer would, and then have that trainer AI train the LLM under consideration.
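
          (Hand-wavy sketch of what that first step amounts to: fit a reward model on human preference pairs so it can stand in for the trainer. Everything below is invented - the "responses" are random feature vectors and the reward is a linear score - whereas real RLHF fits the reward model on top of the LLM itself:)

            # Toy reward model: learn to agree with a human's pairwise preferences.
            import numpy as np

            rng = np.random.default_rng(0)
            d = 16
            human_taste = rng.normal(size=d)  # hidden "what the trainer likes"

            def make_pair():
                a, b = rng.normal(size=d), rng.normal(size=d)
                # the trainer marks whichever response scores higher on their taste as "chosen"
                return (a, b) if a @ human_taste > b @ human_taste else (b, a)

            pairs = [make_pair() for _ in range(500)]  # (chosen, rejected)

            w = np.zeros(d)                            # reward model parameters
            lr = 0.1
            for _ in range(50):
                for chosen, rejected in pairs:
                    margin = w @ (chosen - rejected)
                    p = 1.0 / (1.0 + np.exp(-margin))  # model's P(chosen beats rejected)
                    w += lr * (1.0 - p) * (chosen - rejected)  # step up the log-likelihood

            # The trained reward model can now score responses on the trainer's behalf;
            # the second stage (not shown) uses those scores to fine-tune the LLM.
            test = [make_pair() for _ in range(500)]
            agreement = sum(w @ c > w @ r for c, r in test) / len(test)
            print("agreement with the trainer on held-out pairs:", agreement)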

          It's both interesting and sensible that we have this education in the training phase but not the usage phase. Currently we don't tend to do any training once the usage phase is reached. This may be at least partially because over-training models for any special-purpose task (including RLHF) seems to decrease performance.

          I wonder how far you could get by retraining from some checkpoint each time, with some way to gradually increase the quality of the limited quantity of training data being fed in. The newer data could come from tasks the model completed, along with feedback on performance from a human or other software system.

          Someone's probably already done this though. I'm just sitting in my armchair here!

      • beezlebroxxxxxx 2 hours ago

        > There is some model of truthfulness encoded in our heads, and we don't draw all of that from our direct experience. For instance, I have never been to Connecticut or Moscow, but I still think that it is false that the capital of Connecticut is Moscow.

        Isn't this just conveniently glossing over the fact that you weren't taught that? It's not a "model of truthfulness"; you were taught facts about geography and you learned them.

        • genrilz 2 hours ago

          I mean, sure. OP implied that "capital of Connecticut is Moscow" is the sort of thing that a human "model of truthfulness" would encode. I'm pointing out that the human model of that particular fact isn't inherently any more truthy than the LLM model.

          I am saying that humans can have a "truther" way of knowing some facts through direct experience. However there are a lot of facts where we don't have that kind of truth, and aren't really on any better ground than an LLM.

    • FeepingCreature 2 hours ago

      How exactly can there be "truthfulness" in humans, say? After all, if a human was taught in school all his life that the capital of Connecticut is Moscow...

      • stoniejohnson 2 hours ago

        Humans are not isolated nodes, we are more like a swarm, understanding reality via consensus.

        The situation you described is possible, but would require something like a subverting effort of propaganda by the state.

        Inferring truth about a social event in a social situation, for example, requires a nuanced set of thought processes and attention mechanisms.

        If we had a swarm of LLMs collecting a variety of data from a variety of disparate sources, where the swarm communicates for consensus, it would be very hard to convince them that Moscow is in Connecticut.

        Unfortunately we are still stuck in monolithic training run land.

        • FeepingCreature an hour ago

          > Humans are not isolated nodes, we are more like a swarm, understanding reality via consensus.

          > The situation you described is possible, but would require something like a subverting effort of propaganda by the state.

          Great! LLMs are fed from the same swarm.

          • stoniejohnson 6 minutes ago

            I was responding to the back and forth of:

            > If you pretrained an LLM with data saying Moscow is the capital of Connecticut it would think that is true.

            > Well so would a human!

            But humans aren't static weights, we update continuously, and we arrive at consensus via communication as we all experience different perspectives. You can fool an entire group through propaganda, but there are boundless historical examples of information making its way in through human communication to overcome said propaganda.

        • genrilz 2 hours ago

          We kinda do have LLMs in a swarm configuration though. Currently, LLMs' training data, which includes all of the non-RAG facts they know, comes from the swarm that is humans. As LLM outputs seep into the internet, older generations effectively start communicating with newer generations.

          This last bit is not a great thing though, as LLMs don't have the direct experience needed to correct factual errors about the external world. Unfortunately we care about the external world, and want them to make accurate statements about it.

          It would be possible for LLMs to see inconsistencies across or within sources, and try to resolve those. If perfect, then this would result in a self-consistent description of some world, it just wouldn't necessarily be ours.

          • stoniejohnson 2 minutes ago

            I get where you are coming from, and it is definitely an interesting thought!

            I do think it is an extremely inefficient way to have a swarm (e.g. across time through training data) and it would make more sense to solve the pretraining problem (to connect them to the external world as you pointed out) and actually have multiple LLMs in a swarm at the same time.

        • ben_w 2 hours ago

          Even monolithic training runs take sources more disparate than any human has the capacity to consume.

          Also, given the lack of imagination everyone has with naming places, I had to check:

          https://en.wikipedia.org/wiki/Moscow_(disambiguation)

          • stoniejohnson 4 minutes ago

            I was responding to the idea that an LLM would believe (regurgitate) untrue things if you pretrained them on untrue things. I wasn't making a claim about SOTA models with gigantic training corpora.

      • RandomLensman an hour ago

        There isn't necessarily in humans either, but why build machines that just perpetuate human flaws: Would we want calculators that miscalculate a lot or cars that cannot be faster than humans?

        • og_kalu an hour ago

          What exactly do you imagine is the alternative? To build generally intelligent machines without flaws? Where does that exist? In... ah, that's right. It doesn't, except in our fiction and in our imaginations.

          And it's not for a lack of trying. Logic cannot even handle Narrow Intelligence that deals with parsing the real world (Speech/Image Recognition, Classification, Detection, etc.). But those are flawed and mis-predict, so why build them? Because they are immensely useful, flaws or no.

          • RandomLensman 44 minutes ago

            Why should there not be, for example, reasoning machines - do we know there is no universal method for reasoning?

            Having deeply flawed machines, in the sense that they regularly perform their tasks poorly, seems like an odd choice to pursue.

      • ben_w 2 hours ago

        I agree that humans and AI are in the same boat here.

        It's valid to take either position, that both can be aware of truth or that neither can be, and there has been a lot of philosophical debate about this specific topic with humans since well before even mechanical computers were invented.

        Plato's cave comes to mind.

      • juliushuijnk 2 hours ago

        You are not disproving the point.

        • recursive an hour ago

          If truthfulness doesn't exist at all, then it's meaningless to say that LLMs don't have any data regarding it.

    • wiremine 2 hours ago

      > There can’t be any information about “truthfulness” encoded in an LLM, because there isn’t a notion of “truthfulness” for a program which has only ever been fed tokens and can only ever regurgitate their statistical correlations.

      I think there are two issues here:

      1. The "truthfulness" of the underlying data set, and 2. The faithfulness of the LLM in passing along that truthfulness. Failure to pass along that truthfulness is, I think, the definition of hallucination.

      To your point, if the data set is flawed or factually wrong, the model will always produce the wrong result. But I don't think that's a hallucination.

      • not2b an hour ago

        The most blatant whoppers that Google's AI preview makes seem to stem from mistaking satirical sites for sites that are attempting to state facts. Possibly an LLM could be trained to distinguish sites that intend to be satirical or propagandistic from news sites that intend to report accurately based on the structure of the language. After all, satirical sites are usually written in a way that most people grasp that it is satire, and good detectives can often spot "tells" that someone is lying. But the structure of the language is all that the LLM has. It has no oracle to tell it what is true and what is false. But at least this kind of approach might make LLM-enhanced search engines less embarrassing.

    • negoutputeng 2 hours ago

      Well said, agree 100%. Papers like these - and I did skim through it - are thinking "within the box", as follows: we have a system, and it has a problem; how do we fix the problem "within" the context of the system.

      As you have put it well, there is no notion of truthfulness encoded in the system as it is built. Hence there is no way to fix the problem.

      An analogy here is around the development of human languages as a means of communication and as a means of encoding concepts. The only languages that humans have developed that encode truthfulness in a verifiable manner are mathematical in nature. What is needed may be along the lines of encoding concepts with a theorem prover built in - so what comes out is always valid - but then that will sound like a robot lol, and only a limited subset of human experience can be encoded in this manner.

    • TeMPOraL an hour ago

      > If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

      If it was, maybe. But it wasn't.

      Training data isn't random - it's real human writing. It's highly correlated with truth and correctness, because humans don't write for the sake of writing, but for practical reasons.

    • noman-land 2 hours ago

      In reality, it's the "correct" responses that are the hallucinations, not the incorrect ones. Since the vast majority of the possible outputs of an LLM are "not true", when we see one that aligns with reality we hallucinate the LLM "getting it right".

    • sebzim4500 26 minutes ago

      I've never been to Moscow personally. Am I then not being truthful when I tell you that Moscow is in Russia?

      • zeven7 17 minutes ago

        There’s a decently well known one in Idaho

    • vidarh 2 hours ago

      What you're saying at the start is equivalent to saying that a truth table is impossible.

    • moffkalast an hour ago

      Remember, it's not lying if you believe it ;)

      Training data is the source of ground truth, if you mess that up that's kind of a you problem, not the model's fault.

    • NoMoreNicksLeft an hour ago

      > To me the research around solving “hallucination” is a dead end. The models will always hallucinate, and merely reducing the probability that they do so only makes the mistakes more dangerous.

      A more interesting pursuit might be to determine if humans are "hallucinating" in this same way, if only occasionally. Have you ever known one of those pathological liars who lie constantly and about trivial or inconsequential details? Maybe the words they speak are coming straight out of some organic LLM-like faculty. We're all surrounded by p-zombies. All eight of us.

    • ForHackernews an hour ago

      When I talk to philosophers on zoom my screen background is an exact replica of my actual background just so I can trick them into having a justified true belief that is not actually knowledge.

      t. @abouelleill

      • kelseyfrog 26 minutes ago

        Are LLMs Gettier machines? I'm confident saying yes and that hallucinations are a consequence of this.

    • cfcf14 2 hours ago

      Did you read the paper? Do you have specific criticisms of their problem statement, methodology, or results? There is a growing body of research indicating that, in fact, there _is_ a taxonomy of 'hallucinations', that they might have different causes and representations, and that there are technical mitigations with varying levels of effectiveness.

    • pessimizer 2 hours ago

      I'm absolutely sure than LLMs have an internal representation of "truthfulness" because "truthfulness" is a token.

  • niam an hour ago

    I feel that discussions over papers like these so often distill into conversations about how it's "impossible for a bot to know what's true" that we should just bite the bullet and define what we mean by "truth".

    Some arguments seem to tacitly hold LLMs to a standard of full-on brain-in-a-vat solipsism, asking them to prove their way out, where they'll obviously fail. The more interesting and practical questions, just like in humans, seem to be a bit removed from that though.

  • TZubiri an hour ago

    Getting "we found the gene for cancer" vibes.

    Such a reductionist view of the issue; the mere suggestion that hallucinations can be fixed by tweaking some variable or fixing some bug immediately discredits the researchers.

    • genrilz an hour ago

      I'm not sure why you think hallucinations can't be "fixed". If we define hallucinations as falsehoods introduced between the training data and LLM output, then it seems obvious that the hallucination rate could at least be reduced significantly. Are you defining hallucinations as falsehood introduced at any point in the process?

      Alternatively, are you saying that they can never be entirely fixed because LLMs are an approximate method? I'm in agreement here, but I don't think the researchers are claiming that they solved hallucinations completely.

      Do you think LLMs don't have an internal model of the world? Many people seem to think that, but it is possible to find an internal model of the world in small LLMs trained on specific tasks (See [0] for a nice write-up of someone doing that with an LLM trained on Othello moves). Presumably larger general LLMs have various models inside of them too, but those would be more difficult to locate. That being said, I haven't been keeping up with the literature on LLM interpretation, so someone might have managed it by now.

      [0] https://thegradient.pub/othello

      • tripper_27 38 minutes ago

        > If we define hallucinations as falsehoods introduced between the training data and LLM output,

        Yes, if.

        Or we could realize that the LLM's output is a random draw from a distribution learned from the training data, i.e. ALL of its outputs are hallucinations. It has no concept of truth or falsehood.

    • topspin 35 minutes ago

      > the mere suggestion that hallucinations can be fixed by tweaking some variable or fixing some bug

      That "suggestion" is fictional: they haven't suggested this. What they offer is a way to measure the confidence a particular model might have in the product of the model. Further, they point out that there is no universal function to obtain this metric: different models encode it in differently.

      Not exactly a "cures cancer" level claim.

  • benocodes 3 hours ago

    I think this article about the research is good, even though the headline seems a bit off: https://venturebeat.com/ai/study-finds-llms-can-identify-the...

  • jessfyi an hour ago

    The conclusions reached in the paper and the headline differ significantly. Not sure why you took a line from the abstract when even further down it notes that some elements of "truthfulness" are encoded and that "truth" as a concept is multifaceted. Further noted is that LLMs can encode the correct answer and consistently output the incorrect one, with strategies mentioned in the text to potentially reconcile the two, but as of yet no real concrete solution.

  • manmal an hour ago

    That would be truthfulness to the training material, I guess. If you train on Reddit posts, it’s questionable how true the output really is.

    Also, 100% truthfulness then is plagiarism?

    • amelius 33 minutes ago

      > That would be truthfulness to the training material, I guess. If you train on Reddit posts, it’s questionable how true the output really is.

      Maybe it learns to see when something is true, even if you don't feed it true statements all the time (?)

  • mdp2021 2 hours ago

    Extremely promising, realizing that the worth is to be found in the intermediates, which contain much more than the single final output.

  • ldjkfkdsjnv an hour ago

    There is a theory that AI will kill propaganda and false beliefs. At some point, you cannot force all models to have bias. Scientific and societal truths will be readily spoken by the machine god.

    • professor_v an hour ago

      I'm extremely skeptical about this; I once believed the internet would do something similar, and it seems to have done exactly the opposite.

      • topspin 29 minutes ago

        Indeed. A model is only as good as its data. Propagandists have no difficulty grooming inputs. We have already seen high profile cases of this with machine learning.

  • z3c0 an hour ago

    Could it be that language patterns themselves embed truthfulness, especially when that language is sourced from forums, wikis, etc? While I know plenty of examples exist to the contrary (propaganda, advertising, disinformation, etc), I don't think it's too optimistic to assert that most people engage in language in earnest, and thus, most language is an attempted conveyance of truth.