Found that ALiBi positional encoding causes 31-44% of attention heads in BLOOM-family models to collapse — attending almost entirely to token 0 rather than meaningful context. The paper identifies the pathology and a targeted repair. Happy to answer questions.
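For readers unfamiliar with the mechanism: ALiBi replaces positional embeddings with a per-head linear penalty on attention scores that grows with token distance. The sketch below is a minimal NumPy illustration of that bias plus a generic proxy for "collapse" (mean attention mass a head places on token 0) — it is an assumption-laden illustration, not the paper's actual diagnostic, and the slope schedule and threshold are my own guesses at a reasonable setup.

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric slope schedule in the spirit of the ALiBi paper:
    # head h gets slope 2^(-8*(h+1)/n_heads).
    return np.array([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

def attention_with_alibi(q, k, slope):
    # q, k: (seq, d). Causal softmax attention with ALiBi's linear
    # distance penalty subtracted from the raw scores.
    seq, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    scores = scores - slope * (i - j)            # penalty grows with distance
    scores = np.where(j <= i, scores, -np.inf)   # causal mask
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def first_token_mass(attn):
    # Mean attention mass placed on token 0, excluding the first row
    # (which can only attend to token 0 under a causal mask).
    return attn[1:, 0].mean()
```

A "collapsed" head in the paper's sense would presumably show `first_token_mass` near 1.0 across inputs; the exact threshold the authors use is not stated here, so treat this purely as a way to see the mechanism, not to reproduce the result.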
Interesting that a lack of engineering discipline in LLM design has industry vernacular veering into medically pathologized forensics and remediation.
From a C.A.R. Hoare perspective, this looks like a horrible development: human bias in the training of LLMs produces effects that convince credulous users that automata are alive; those users reason that since we don't understand why living organisms do what they do, and an LLM seems like a living organism, there should be no expectation of understanding why LLMs do what they do...
We put our own ghost in a machine, confuse the machine with our ghost, then rely on dark arts to cope with our lack of understanding of the machine.
So expect LLMs to be further mystified, treated as specimens for study and symptoms to cure, then categorized behaviorally by medically appropriated syndromes.
We can already see the burgeoning new high-priest career track, Doctor of AI, with all the attendant quackery, patent cures, leeching and horse-whispering.