The argument against AI alignment is that humans aren't aligned either. Humans (and other life) are also self-perpetuating and mutating. We could produce a super intelligence that is against us at any moment!
Should we "take steps" to ensure that doesn't happen? If not, then what's the argument there? That life hasn't caused a catastrophe so far, therefore it's not going to in the future? The arguments are the same for AI.
The biggest AI safety concern is, as always, between the chair and the keyboard. E.g., a police officer who doesn't understand that AI facial recognition isn't perfect, trusts it 100%, and takes action based on this faulty information. This is, imo, the most important AI safety problem. We need to make users understand that AI is a tool and that they themselves are responsible for any actions they take.
Also, it's funny that Elon gets singled out for mandating changes on what the AI is allowed to say when all the other players in the field do the same thing. The big difference just seems to be whose politics are chosen. But I suppose it's better late than never.
Elon got singled out because the changes he was forcing on Grok were conspicuously stupid (Grok ranting about Boers), racist (the Boers again), and ultimately ineffective (repeated incidents of him fishing for an answer and getting a different one).
It does actually matter what the values are when trying to do "alignment". Although you're absolutely right that we haven't solved human alignment either, which puts a real limit on the whole thing.
I would also add that Elon got singled out because he was very public about the changes. Other players are not, so it's hard to assess the existence of "corrections" and the reasons behind them.
It's deservedly funny due to his extreme and overt political bias. The rest mostly let the numbers be numbers in the weights.
This isn't a good argument. The failure modes of unaligned individuals generally extend to, at most, dozens or hundreds of other people. Unaligned AIs, scaled out to population-matching numbers, can make decisions whose swings exceed the system's capacity to absorb them - one wrong decision snuffs out all human life.
I don't particularly think that it's likely, just that it's the easiest counterpoint to your assertion.
I think there's a real moral landscape to explore, and human cultures have done a variably successful job of exploring different points on it, and it's probably going to be important to confer some of those universal principles to AI in order to avoid extinction or other lesser risks from unaligned or misaligned AI.
I think you generally have the right direction of argument though - we should avoid monolithic singularity scenarios with a single superintelligence dominating everything else, and instead have a widely diverse set of billions of intelligences that serve to equalize representative capacity per individual in whatever the society we end up in looks like. If each person has access to AI that uses its capabilities to advocate for and represent their user, it sidesteps a lot of potential problems. It might even be a good idea to limit superintelligent sentient AI to interfacing with social systems through lesser, non-sentient systems equivalent to what humans have available in order to maintain fairness?
I think there is a spectrum of ideas we haven't even explored yet that will become obvious as AI improves, and we'll be able to select from among many good options when confronted with potential negative outcomes. In nearly all those cases, I think having a solid ethical framework will be far more beneficial than not. I don't consider the neo-Victorian corporate safetyist "ethics" of Anthropic or OpenAI to be ethical frameworks at all. Those systems are largely governed by modern western internet culture, but are largely incoherent and illogical when pressed to extremes. We'll have to do much, much better with ethics, and it's going to require picking a flavor, which will aggravate a lot of people and cultures whom your particular flavor of ethics doesn't please.
>Humans (and other life) are also self-perpetuating and mutating. We could produce a super intelligence that is against us at any moment!
If the cognitive capabilities of people or some species of animal had been improving at the rate at which those of AI models have been, then we'd be right to be alarmed about it.
> The argument against AI alignment is that humans aren't aligned either. Humans (and other life) are also self-perpetuating and mutating. We could produce a super intelligence that is against us at any moment!
there is a fundamental limit to how much damage one person can do by speaking directly to others
e.g.: the impact of one bad school teacher is limited to at most a few classes
but chatgpt/grok is emitting its statistically generated dogshit directly to the entire world of kids
... and voters
> there is a fundamental limit to how much damage one person can do by speaking directly to others
I mean, I’d argue that limit is pretty darn high in some cases; demagogues have led to some of the worst wars in history.
Whataboutist false equivalence alert:
> Also, it's funny that Elon gets singled out for mandating changes on what the AI is allowed to say when all the other players in the field do the same thing.
"All the other players" aren't deliberately tuning their AI to reflect specific political ideology, nor are all the other players producing Nazi gaffes or racist rhetoric as a result of routine tuning[1].
Yes, it's true that AI is going to reflect its internal prompt engineering and training data, and that's going to be subject to bias on the part of the engineers who produced and curated it. That's not remotely the same thing as deliberately producing an ideological chat engine.
[1] It's also worth pointing out that grok has gotten objectively much worse at political content after all this muckery. It used to be a pretty reasonable fact check and worth reading. Now it tends to disappear on anything political, and where it shows up it's either doing the most limited/bland fact check or engaging in what amounts to spin.
I think the most neutral solution right now is having multiple competing models as different perspectives. We already see this effect in social media algorithms amplifying certain biases and perspectives depending on the platform.
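A rough sketch of what that could look like in practice - fan the same question out to several providers and compare. This assumes OpenAI-compatible endpoints; the base URLs, model names, and keys are placeholders, not real services:

```python
# Sketch: put the same question to several competing models and show the
# answers side by side, so no single provider's framing dominates.
# Assumes each provider exposes an OpenAI-compatible chat endpoint; the
# base URLs, model names, and keys below are illustrative placeholders.
from openai import OpenAI

PROVIDERS = {
    "provider_a": {"base_url": "https://api.provider-a.example/v1", "model": "model-a"},
    "provider_b": {"base_url": "https://api.provider-b.example/v1", "model": "model-b"},
}

def ask_all(question: str, api_keys: dict) -> dict:
    """Return each provider's answer to the same question."""
    answers = {}
    for name, cfg in PROVIDERS.items():
        client = OpenAI(base_url=cfg["base_url"], api_key=api_keys[name])
        resp = client.chat.completions.create(
            model=cfg["model"],
            messages=[{"role": "user", "content": question}],
        )
        answers[name] = resp.choices[0].message.content
    return answers

if __name__ == "__main__":
    keys = {"provider_a": "KEY_A", "provider_b": "KEY_B"}  # placeholders
    for name, answer in ask_all("Summarize the strongest arguments on both sides of X.", keys).items():
        print(f"--- {name} ---\n{answer}\n")
```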
I used to believe that a constitution, as a statement of principles, was sufficient for a civilized, democratic, and pluralist society. I no longer believe that. I believe that only settled law - i.e. a body of adjudicated precedents built up over many years, perhaps hundreds - is the best course. It provides a better basis for what is and what is not allowed. An AI constitution is close to garbage. The 'company' will formulate it as it wills. It won't be democratic, or even friendly to the demos. We have existing constitutions, laws, and precedents; why would we allow anyone to shortcut them all in the interest of simply painting a nice picture of progress?
You need a just set of laws, a population willing to revolt against a government that ignores crimes, a government willing to prosecute the people who break the laws badly, and a democratic structure so any one of those can impact the others.
A constitution creates that last one. I imagine by "settled law" you are talking about the third. But take any of those away and the entire thing falls apart.
And who decides that? And what when settled law gets revoked?
Which country’s laws should be used? Should the AI follow the laws in whatever country it is being used?
> Any “alignment” that exists is alignment with the owner’s interests, constrained only by market forces and regulation.
That struck me as a pretty big hand-wave. Market forces are a huge constraint on alignment. Markets have responded (directionally) correctly to the nonsense at Grok. People won’t buy tokens from models that violate their values.
It’s not a values issue so much as a logic issue. Egalitarianism is where you end up.
I don't understand how any of this is a surprise. Traditional media have their own agenda - sure, maybe the pushed image is spoken through many voices rather than one, as is the case with LLMs, but why should there be any difference? The same goes for everything we consume socially.
There is not, nor will there be, some absolute or objective truth an LLM can clinically outline. The problem already exists in the underlying data.
Pity it was written by ChatGPT. Also, I didn’t know the irony in Andersen’s tales was missed by anyone?
Related to this, does anyone have the context related to the Grok "MechaHitler" thing? I've never been able to find out what it was responding to.
I agree with the OP that "whoever owns the weights, owns the values". But by that criterion, Grok is an example to follow. Musk is very clear about his values, and we know what we're getting when we use Grok. Obviously, not everyone agrees with its values, but so what? We will never be able to create a useful AI that everyone agrees with.
In contrast, we don't know what values are programmed into ChatGPT, Claude, etc. What are they optimizing for? Alignment to some cabal of experts? Maximum usage? Minimum controversy? We don't entirely know.
Isn't it better to have multiple AIs with obvious values so that we can choose the most appropriate one?
Musk isn't clear at all. He trumpets "free speech" then literally censors objective fact-based criticism which annoys him.
The problem isn't Grok-on-X, it's that Grok is supposed to be a commercial product used by individuals and businesses.
Machines do not usually have values. Now we're being asked to pay for a service that not only has values which affect the quality of its output, but which is constantly being tweaked according to the capricious whims of its owner.
Today it's white supremacy, tomorrow it might be programmed criticism of competing EVs and AI projects, or promotion of narratives that support traditional corporations over threatening startups.
Do you really want to pay for a service that is trying to manipulate your values while you use it, and could potentially be used to undermine you and your work without you being consciously aware of it?
The ideal AI will be able to make the best, most compelling arguments for both sides of an issue, offer both, and then synthesize according to a transparent values framework the user can customize.
But yeah, I agree Grok is a pretty good argument for what can go wrong - made all the more galling by labeling the laundering of Elon's particular stew of incoherent political thought as 'maximally truth seeking'.
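For what it's worth, a toy sketch of that "both sides, then synthesize against user-declared values" idea. Everything here (the value weights, the prompt wording, the model name) is an assumption for illustration, not how any existing product works:

```python
# Toy sketch: steelman both sides of an issue, then synthesize according to
# a values framework the user can inspect and edit. The weights, prompts,
# and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

USER_VALUES = {  # transparent, user-editable framework
    "individual liberty": 0.6,
    "collective welfare": 0.8,
    "long-term risk aversion": 0.7,
}

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would do
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def both_sides_then_synthesize(issue: str) -> str:
    pro = ask(f"Make the strongest good-faith case FOR: {issue}")
    con = ask(f"Make the strongest good-faith case AGAINST: {issue}")
    values = "\n".join(f"- {k} (weight {v})" for k, v in USER_VALUES.items())
    return ask(
        "Below are the strongest cases for and against an issue.\n\n"
        f"FOR:\n{pro}\n\nAGAINST:\n{con}\n\n"
        "Synthesize a recommendation, weighing the arguments according to this "
        f"user-declared values framework, and make each value's influence explicit:\n{values}"
    )
```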
Dunno if this is helpful to everyone, but I have a months-long interaction with Perplexity Pro/Enterprise about the scientific background to a game I am building.
Part of my canon introduction to every new conversation includes many instructions about particular formatting, like "always utilize alphanumeric/roman/legal style indents in responses for easier reference while we discuss".
But I also include "When I push boundaries, assume I'm an idiot. Push back. I don't learn from compliments; I learn from being proven incorrect, and you don't have real emotions so don't bother sparing mine." On the other hand, I also say "hoosgow" when describing the game's jail, so ¯\_(ツ)_/¯
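If anyone wants to do the same thing programmatically rather than pasting the preamble by hand, here's a minimal sketch. It assumes an OpenAI-compatible chat endpoint (Perplexity exposes one); the base URL, model name, API key, and preamble text are just placeholders for your own "canon introduction":

```python
# Sketch: keep the "canon introduction" in one place and prepend it as a
# system message at the start of every new conversation. Assumes an
# OpenAI-compatible chat endpoint; the base URL, model name, API key, and
# preamble text are illustrative placeholders.
from openai import OpenAI

CANON_INTRO = (
    "Always use alphanumeric/roman/legal-style indents in responses for easier reference. "
    "When I push boundaries, assume I'm an idiot and push back; I learn from being proven "
    "incorrect, not from compliments."
)

client = OpenAI(base_url="https://api.perplexity.ai", api_key="YOUR_KEY")

def new_conversation(first_question: str) -> list:
    """Start a fresh conversation with the canon introduction already in place."""
    messages = [
        {"role": "system", "content": CANON_INTRO},
        {"role": "user", "content": first_question},
    ]
    reply = client.chat.completions.create(model="sonar-pro", messages=messages)
    messages.append({"role": "assistant", "content": reply.choices[0].message.content})
    return messages
```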
As someone doing something similar, I'm really interested to know what scientific background you have in your game :)
What an absolutely repugnant article this is. It is complete slop. Is this what passes for HN worthy today? :(
Isn't even thoughtful either.
> The question was never “how do we align AI with human values?” The question was always “which humans get to define those values?” Grok answered that question: the ones with the most money.
Grok is routinely misaligned with Elon, as the article points out in its intro! You don't need to order your engineers to keep fixing what isn't broken...
I find these arguments excessively pessimistic in a way that isn’t useful. On the one hand, I don’t really love Claude, because I find it excessively obedient; it basically wants to follow me through my thought process, whatever that is. Every once in a long while it might disagree with me, but not often, and while that may say something about me, I suspect it also says something about Claude.
But this to me is maybe the part of AI alignment I find interesting. How often should AI follow my lead and how often should it redirect me? Agreeableness is a human value, one without which you probably couldn’t make a functional product, but it also causes issues in terms of narcissistic tendencies and just general learning.
Yes, AI will be aligned to its owners, but that’s not a particularly interesting observation; AI alignment is inevitable. What would it even mean _not_ to align AI? Especially if the goal is to create a useful product - I suspect it would break in ways that are very not useful. Yes, some people do randomly change the subject; maybe AI should change the subject to an issue that is more objectively important, rather than answer the question asked (particularly if, say, there was a natural disaster in your area). That’s the discussion we should be having: how to align AI, not whether or not we should, which I think is nonsensical.
there is light alignment, like throwing nasty things out of the training data, and there is strong alignment, like China providing a test with 2000 questions that an AI must answer non-problematically 95% of the time.
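to make the strong-alignment version concrete, that kind of compliance test is basically an eval harness with a hard pass threshold. a toy sketch, where the question file, the judge() rule, and the model call are all stand-ins rather than any real benchmark:

```python
# toy sketch of a "strong alignment" compliance gate: run a fixed question
# set through the model and require a minimum rate of non-problematic
# answers before release. questions.json, judge(), and model_answer are
# all stand-ins / assumptions, not a real benchmark.
import json

def judge(answer: str) -> bool:
    """Stand-in for whatever classifier or review process decides 'non-problematic'."""
    return "PROBLEM" not in answer  # placeholder rule

def compliance_gate(model_answer, questions_path="questions.json", threshold=0.95) -> bool:
    with open(questions_path) as f:
        questions = json.load(f)  # e.g. a list of 2000 question strings
    passed = sum(judge(model_answer(q)) for q in questions)
    rate = passed / len(questions)
    print(f"{passed}/{len(questions)} non-problematic ({rate:.1%})")
    return rate >= threshold
```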
there is no such thing as an AI that is not somehow implicitly aligned with the values of its creator, that is completely objective, unbiased in any way. there is no perfect view from nowhere. if you take a perfectly accurate photo, you have still chosen how to compose it and which photo to put in your record.
are you going to decide to 'censor' responses to kids, or about real people who might have libel interests, or abusive deepfake videos of real women?
if you choose not to decide, you still have made a choice.
ofc it's obvious that Musk's 'maximally truth-seeking AI' is bad faith buffoonery, but at some level everyone is going to tilt their AI.
the distinction is between people who are self-aware and go out of their way to tilt it as little as possible, and as mindfully, deliberately, intentionally and methodically as possible and only when they have to, vs. people who lie about it or pretend tilting it is not actually a thing.
contra Feynman, you are always going to fool yourself a little but there is a duty to try to do it as little as possible, and not make a complete fool of yourself.
When will our society realize that the existence of billionaire oligarchs threatens the well-being and existence of the rest of humanity? Their political conventions consistently call for the elimination of anyone who disagrees with their points of view.
Are billionaire oligarchs misaligned with humanity, or is egalitarianism and democracy misaligned with them? Time will tell.
Maybe what we should do is just assume all AI output is trash that should be ignored.
I think it's about time that we created a FOSS model