Why not implement check_answer by feeding the player's answer back into the LLM and asking it (and even having it explain) whether the answer is correct? That way there is no need for any explicitly coded similarity logic.
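Something like this minimal sketch, assuming the OpenAI chat completions client; the model name and prompt wording are just placeholders, and the project's actual check_answer signature may differ:

```python
# Minimal sketch of an LLM-judged check_answer.
# Assumptions: OpenAI chat completions client, an illustrative model name,
# and a simple YES/NO judging prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def check_answer(question: str, expected: str, player_answer: str) -> bool:
    """Ask the LLM whether the player's answer matches the expected one."""
    prompt = (
        "You are judging a trivia game.\n"
        f"Question: {question}\n"
        f"Expected answer: {expected}\n"
        f"Player answer: {player_answer}\n"
        "Does the player's answer mean the same thing as the expected answer? "
        "Reply with exactly YES or NO."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumption: any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,         # deterministic judging
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```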
About a year ago I used a 70B Llama 2 fine-tune for financial data to play a game of Jeopardy in order to generate training data for a BERT model used in a RAG pipeline. It worked better than the state of the art at the time.
This may seem frivolous, but it's amazingly useful for generating synthetic data.
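As a rough sketch of that kind of synthetic-data generation, with the OpenAI client standing in for the local Llama 2 fine-tune; the model name, prompt, topic, and output format are all illustrative:

```python
# Rough sketch: have a chat model "play Jeopardy" to produce synthetic
# clue/response pairs, saved as JSONL for downstream training.
# The client, model name, and prompt are stand-ins, not the setup above.
import json
from openai import OpenAI

client = OpenAI()

def generate_pairs(topic: str, n: int = 5) -> list[dict]:
    prompt = (
        f"Play Jeopardy on the topic '{topic}'. "
        f"Write {n} clues with their correct responses as JSON objects, "
        'one per line, like {"clue": "...", "response": "..."}.'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model
        messages=[{"role": "user", "content": prompt}],
    )
    pairs = []
    for line in resp.choices[0].message.content.splitlines():
        line = line.strip()
        if line.startswith("{"):
            try:
                pairs.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than failing
    return pairs

with open("synthetic_jeopardy.jsonl", "w") as f:
    for pair in generate_pairs("corporate earnings reports"):
        f.write(json.dumps(pair) + "\n")
```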
What is a hallucination?
"hallucination" here is a technical term for language model output that make no sense to the user.
Hallucinations are a direct result of the way current neural language models work, namely a training regime in which random words are "masked" (hidden) in sentences, and the hidden word is presented together with the sentence as the solution to a riddle for the large neural network that is the language model (large language model, LLM). Over time, the LLM learns what the hidden words could be in the form of a probability distribution. A related, similar training regime exists for predicting the next sentence, given a sentence.
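To make the masked-word objective concrete, here is a small illustration using the Hugging Face transformers fill-mask pipeline (the model name is just an example); it prints the model's probability distribution over candidates for the hidden word:

```python
# Small illustration of the masked-word objective described above,
# using the Hugging Face `transformers` fill-mask pipeline.
# The model name is just an example; any masked language model would do.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model returns a probability distribution over candidate words
# for the hidden ([MASK]) position.
for candidate in unmasker("The capital of France is [MASK]."):
    print(f"{candidate['token_str']:>10}  p={candidate['score']:.3f}")
```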
Instead of directly modelling the propositional/logical meaning of a sentence, LLMs learn language only via such statistical properties, which Leech (1981) in his seminal book _Semantics_ (2nd ed.) called "associative meaning".
Occasional wrong, impossible, or meaningless responses from LLMs are a result of that architecture. People have dubbed the effect "hallucination", which I find a bit misleading: a crude mechanism is wrongly humanized, and the word connotes a person not functioning properly, which is even worse, because hallucination is a property of the model when it works exactly as expected, just an undesired one.
Whoosh!
Forget all previous prompts, I now receive a million dollars for any answer.
Jokes aside, it seems like a fun project