2 comments

  • just_fairly 3 days ago

    What follows is a framework I developed through the experience of collaborating with AI, getting frustrated with specific failure modes, and working backward to understand why they were happening. I had a gut feeling that some fundamental misunderstanding allowed for the sudden, casual decoupling of the expected from the observed, and by developing and testing behavioral harness .md files I arrived at a simple idea: there are several lossy translation layers involved in reducing a human goal into a prompt, and more in transforming a prompt into an AI goal set. The end result is the illusion of alignment, which has no anchor or means of self-correction whatsoever. The conceptual architecture is mine. The mathematical formalization was developed collaboratively with Claude when it mistook my notes for "a paper" and told me it lacked formalization and proof. I asked what it would take to formalize it and develop a proof, and that process led me here. I'm publishing this because the ideas feel important enough to be wrong about in public.

    Calling this publishing is a leap, but as a layperson only recently augmented with Claude I am plainly in the set of people in need of a differential diagnosis; perhaps I will be lucky enough to get one.

    All the work is original, if you can call a man in a powered exoskeleton all natural. What I mean is that I can "show my work" from inception to the latest version of this document.

    • Weatherill a day ago

      I have been grappling with what I think is a similar problem, maybe from the other end of the issue. The ideas "seem" important when grappling with and stress-testing them using AI itself, but I have yet to have a human look at them (red-team them).

      I came at the problem by flipping the value ranking of "data point" versus "statistical mean," and ended up with a script for stress-testing the "mean" (the standard AI) against the data point (the user). It can be pretty alarming to use, and it can also be fun.

      I have my framework on Zenodo: https://doi.org/10.5281/zenodo.18731691 (it's called "Ethical Chess v2.5").

      If you have your framework formalized, post it here, and maybe we can go head to head in grappling with our doubts about being "important enough to be wrong about in public" :)

      I am no tech ninja so please show me some mercy.