<< The Bcrypt algorithm was used to generate the cache key where we hash a combined string of userId + username + password. During specific conditions, this could allow users to authenticate by only providing the username with the stored cache key of a previous successful authentication. >>
https://man.openbsd.org/crypt
<< The maximum password length is 72. >>
So if the userid is 18 digits, the username is 52 characters, and the delimiters are 1 character each, then the total length of the non-secret prefix is 72, and bcrypt will drop the secret suffix.
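For illustration, here is a minimal Python sketch of that truncation, assuming the pyca/bcrypt package and a binding that, like the original OpenBSD implementation, silently ignores input beyond 72 bytes (behavior varies; some newer wrappers raise instead). The userId, username, and passwords are made up:

```python
import bcrypt

# Hypothetical values: an 18-digit userId, a 52-character username,
# and two "|" delimiters -> exactly 72 bytes of non-secret prefix.
user_id = "123456789012345678"              # 18 bytes
username = "a" * 52                         # 52 bytes
prefix = f"{user_id}|{username}|".encode()  # 18 + 1 + 52 + 1 = 72 bytes
assert len(prefix) == 72

salt = bcrypt.gensalt()
cache_key = bcrypt.hashpw(prefix + b"correct-password", salt)

# Any other password verifies against the same hash, because bcrypt
# only ever looked at the first 72 bytes of the input.
print(bcrypt.checkpw(prefix + b"totally-wrong-password", cache_key))  # True
```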
You aren’t supposed to put more than the salt and the password into trad unix password hashes.
Here's how I see it:
Core issue (Okta's approach):
* They concatenated userId + username + password for a cache key
* Used BCrypt (which has a 72-byte limit)
* The concatenation could exceed 72 bytes, causing the password portion to be truncated
Why this is problematic:
* BCrypt is designed for password hashing, not cache key generation
* Mixing identifiers (userId, username) with secrets (password) in the same hash
* Truncation risk due to BCrypt's limits
Password storage should be separate from cache key generation. Use a random salt + an appropriate hash function; for cache keys, use an HMAC or KDF with appropriate inputs.
That should also mean that ca. 50-52 character usernames are likely easily bruteforcable. Which makes the preconditions wider than those stated in the publication.
50 letters is 235 bits which is not at all bruteforceable.
They’re saying that if a username is 50 characters, only 2 characters of the password are used in the cache key. And a 2-character password is very bruteforceable.
Potentially ignorant question, why would they go for bcrypt over say HKDF[1], especially since they mix in public data like the username and potentially userid?
[1]: https://datatracker.ietf.org/doc/html/rfc5869
Why do we need a KDF for a cache key? Won't a normal cryptographic hash function (or its HMAC variant) suffice?
If the cache gets leaked, you don’t want any miscreants to be able to bruteforce passwords from the cache keys.
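As a rough sketch of that idea (my own illustration, not Okta's actual scheme): derive the cache key with an HMAC keyed by a server-side secret, so a leaked cache is useless for offline password grinding without that secret. The field names and length-prefix framing here are assumptions, and whether the password belongs in the key at all is debated further down the thread.

```python
import hmac
import hashlib
import struct

def cache_key(server_secret: bytes, user_id: str, username: str, password: str) -> str:
    """Illustrative only: length-prefix each field so the concatenation is
    unambiguous, then MAC with a secret the attacker does not have."""
    msg = b""
    for field in (user_id, username, password):
        data = field.encode("utf-8")
        msg += struct.pack(">I", len(data)) + data  # 4-byte big-endian length prefix
    return hmac.new(server_secret, msg, hashlib.sha256).hexdigest()

# Without server_secret, an attacker holding the cache cannot grind
# candidate passwords against the stored keys offline.
secret = b"loaded-from-a-KMS-or-config-in-practice"
print(cache_key(secret, "123456789012345678", "alice@example.com", "hunter2"))
```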
Do we need to put the password in the cache key?
Can't believe the answers you're getting. The answer's a big fat NO. If you find yourself in that situation, there's something very incorrect with your design.
So how would you design it instead?
Some insight into why this is good and why including the password as input in the derivation of the cache key is terrible would be appreciated.
With no password in key: mildly cleaner to drop entries on password change, even if the cache didn't get the command to drop the key, the next login would override the old key's value anyhow, instead of potentially a key per password that was valid in the short period around a password change
Of course, if you have any validness of old sessions / passwords around a password change, you are doing something wrong.
My personal wondering is, considering KDF is meant to be expensive, why is IO more expensive to the point it needs a cache?
Thanks, good points.
> why is IO more expensive to the point it needs a cache
The advisory mentions it's only exploitable if the upstream auth server is unresponsive. So it seems to be mainly for resilience.
If you want to validate a username/password authn attempt against a cache, then yes, the username and password have to be somewhere in the mix.
If the user changes password it invalidates the cache entries automatically, so you avoid stale credentials exploiting the cache
At least that's my immediate thought, could be wrong.
Isn't the whole point of using bcrypt that you can't bruteforce the password?
why would someone in 2024 reach for bcrypt for building a secure hash key?
Because bcrypt is still viable. Its cost factor is easily scaled to commodity performance, keeping the attack cost high.
The main attack vector these days is GPU-based compute. There, SHA* algorithms are particularly weak because they can be so efficiently computed. Unlike SHA algorithms, bcrypt generates high memory contention, even on modern GPUs.
Add in the constraints of broad support, low "honest use" cost, and maturity (extensive hostile cryptanalysis), and bcrypt stays one of the better choices even 25 years later.
That said, bcrypt's main limitation is it has a low memory-size cost. There are some newer algorithms that improve on bcrypt by increasing the memory-size cost to more than is practical even for FPGA attacks.
More importantly, bcrypt didn't actually fail here. The vulnerability happened because okta didn't use it correctly. All crypto is insecure if you use it wrong enough.
after all of the headaches in the late 90s/early 2000s with truncating password hash functions, i'm just a little surprised that this sort of thing would still be an issue.
i understand performance concerns and design trade offs, but i would expect a secure hashing function in 2024 to do proper message scheduling and compression or return errors when truncations are happening.
i suppose 90s culture is hip again these days, so maybe this does make sense?
It seemed the best option I could find for an RP2040 microcontroller when I went looking recently? Perhaps not for Okta...
okta has been around longer than a year and momentum keeps a lot of companies from changing anything until catastrophe strikes
per <https://trust.okta.com/security-advisories/okta-ad-ldap-dele...>
2024-07-23 - Vulnerability introduced as part of a standard Okta release
This issue is not an "okta is old" issue. This was new code written in 2024 that used a password hashing function from 1999 as a cache key.
Bcrypt is still perfectly usable for its original purpose. They just picked/wrote a bad implementation that silently truncated inputs longer than the maximum input length. Would you also ask why they picked AES (a cipher from 1998) when the error was with the user (e.g. picking fixed/too short key)?
> You aren’t supposed to put more than the salt and the password into trad unix password hashes.
To be fair, they're basically salting with the userid and username. Still unorthodox to be sure.
The salt is a separate input to the algorithm that is used differently and usually more restricted than the password.
The goal of a salt is to prevent lookup attacks. Since the user id is unique to each user, it prevents the use of pre-computed lookup tables like a salt would.
The security of salting is twofold. Yes, it defeats the common rainbow table. But if the salt is known, a rainbow table for that salt can be computed. The security of salting depends on the salt being unknown.
If the salt is externally known, which the username and userID necessarily are, then the rainbow table for that account can be computed entirely offline, defeating the point of salting.
You're both right, but are coming at this from different directions. In the past a rainbow table was intended to reveal as many passwords on a system as possible once you got a copy of the passwords. If one of them happened to be a high-value account, great. But maybe access to an ordinary account is good enough for their (nefarious) purposes.
It's also possible to build a rainbow table when you already know an account is high-value and have the salt. You can't go download that rainbow table - you'll have to compute it yourself, so the cost to the attacker is higher. But if the account is valuable enough to justify targeting specifically, you'll do it.
The primary purpose of salting is to prevent precomputation being used to attack all users (e.g. rainbow tables). Even when specific salts are known they have already done this job.
Salts are not intended to be secrets.
If you want to treat a salt as if it was a private key, that would only provide additional protection for the very specific circumstance where the user hash is compromised, but the corresponding salt was not.
> then the rainbow table for that account can be computed entirely offline
So you basically bruteforce the password for a specific account before you get the actual hash but after you know the hashing scheme? I don’t see how this helps with any sort of attack though.
No. Such a rainbow table is per-username and non-reusable. Which is the point of salting.
No, rainbow tables are hash-input specific. They're user-specific only if the salt is user-unique. Usernames aren't normally part of the hash input because they're assumed-public knowledge.
You can test this for yourself by creating a user account, then editing the master password database and manually changing the username without recalculating the password hash. The password will still work. If the username was part of the hash input, the password would fail.
You complained about them salting using the public username as salt. You asserted that this makes the rainbow table computable offline. I asserted that this didn't matter much for security since the table for H(username || secret) is username specific and not reusable for other usernames. Since precomputed rainbow tables consume quite a bit of space it is rather unlikely that anyone would have such a table stored for any random username.
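To make the "username-specific" point concrete, a toy Python example (plain SHA-256 here purely for illustration, not a scheme anyone should use):

```python
import hashlib

def toy_hash(username: str, password: str) -> str:
    # Illustration only: a plain hash over username || password,
    # roughly the H(username || secret) discussed above.
    return hashlib.sha256(f"{username}||{password}".encode()).hexdigest()

# The same password hashes differently under different usernames,
# so a table precomputed for "alice" says nothing about "bob".
print(toy_hash("alice", "hunter2"))
print(toy_hash("bob", "hunter2"))
```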
You can still build a rainbow table for a specific username if you want to do a targeted attack.
I didn't complain about anyone. The Okta vulnerability isn't because of public salts.
That's fair.
Though you can salt a hash using a function that does not take a distinct salt input by just concatenating the salt with the value. This is a relatively common practice, but of course only works if there is no truncation of the salted input.
I mean yes overall, but why would you put delimiters into the hash? You just smash bytes together.
Username: x@example.xyz Password: .com/!@#$% Concatenated: x@example.xyz.com/!@#$%
Username: x@example.xyz.com Password: /!@#$% Concatenated: x@example.xyz.com/!@#$%
A delimiter fixes the problem if you're sure the delimiter character can never be inside the username and password. Better would be to prefix the length of each field. Better still would be separately hashing each field and concatenating the results.
Personally, I like a fixed uint8 or uint16 representing the length of each segment. Then, there are no forbidden characters or quoting required. Maybe I want to have \0 in my password.
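A minimal sketch of that length-prefix idea (the uint16 framing is just one possible choice): the two colliding username/password pairs above now encode to different byte strings.

```python
import struct

def frame(*fields: bytes) -> bytes:
    """Prefix each field with a fixed-size uint16 length, so no delimiter
    characters are forbidden and a NUL byte is fine inside a password."""
    out = b""
    for f in fields:
        out += struct.pack(">H", len(f)) + f
    return out

enc1 = frame(b"x@example.xyz", b".com/!@#$%")
enc2 = frame(b"x@example.xyz.com", b"/!@#$%")
print(enc1 == enc2)  # False: plain concatenation of these pairs would be identical
```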
I tend to use '\0' as a delimiter for this reason.
You still need to make sure nulls can't show up, and you need to consider possible truncation scenarios caused by those nulls and make sure they won't cause silent failures at any point.
> You still need to make sure nulls can't show up
Which is very easy to do without losing any desired functionality as opposed to delimiters in the ASCII character range.
> and you need to consider possible truncation scenarios
In particular hashing libraries worth using never have this problem.
> and make sure they won't cause silent failures at any point.
They literally only need to exist in the data passed to one function call. Afterwards they are not needed or significant.
Re the second quote and response:
One pattern I bump up against from time to time is the delta between using a perfectly defensible technique for a given use-case (safe delimiters when constructing an input for a specific function) versus a desire to have each decision be driven by some universal law (e.g. "if you're streaming data between services, using null bytes as delimiters might not be safe if consuming services may truncate at null bytes, so NEVER use null bytes as delimiters because they can be unsafe")
It's not even a matter of one "side" being right or wrong. You can simultaneously be right that this is perfectly safe in this use-case, while someone else can be right to be concerned ("need to consider possible") because the code will forever be one refactor or copy/paste away from this concatenated string being re-used somewhere else.
> In particular hashing libraries worth using never have this problem.
I'll note that the reason we're here in the first place is that they were using a password hash library with a completely unacceptable API.
But why does it matter? It's a one-way hash, isn't it?
Also, we're talking about user_id, not user_email, so it should always be the same length. Well, unless you're silly and using database sequences for IDs.
This is obviously a huge mistake by Okta (for the love of God understand how your crypto functions work before you apply them) but at the same time, a crypto function with a maximum input length that also auto-truncates the data sounds like bad API design. You are basically asking for someone to goof up and make a mistake. It's much better to implement these things defensively so that the caller doesn't inadvertently make a mistake. Especially with a hashing algorithm, because there is no way to verify that the result is correct.
Agree with the spirit of the argument, but I disagree about the bad design. BCrypt has its trade-offs; you are expected to know how to use it when using it, especially if by choice.
It's like complaining about how dangerous an axe is because it's super sharp. You don't complain, you just don't grab the blade section, you grab it by the handle.
If passing more than 72 bytes to a function makes it silently fail, it IS bad design, especially for a sensitive, security-related function. The first condition in the function should be `if len(input) > 72 then explicitly fail`
Not letting people use your API incorrectly is API design 101.
To be clear, this is not the fault of the bcrypt algorithm; all algorithms have their limitations. This is the fault of bcrypt libs everywhere: when they implement bcrypt, they should add this check, and maybe offer an unsafe_ alternative without the check.
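Something like this hypothetical wrapper is what's being suggested (the names are mine, not from any real library, and it assumes the pyca/bcrypt package): fail loudly at the boundary instead of letting bcrypt truncate.

```python
import bcrypt

BCRYPT_MAX_INPUT = 72  # bytes actually consumed by the algorithm

def safe_hashpw(password: bytes, salt: bytes) -> bytes:
    """Refuse inputs bcrypt would silently truncate."""
    if len(password) > BCRYPT_MAX_INPUT:
        raise ValueError(
            f"bcrypt input is {len(password)} bytes; only the first "
            f"{BCRYPT_MAX_INPUT} would be used -- refusing to truncate silently"
        )
    return bcrypt.hashpw(password, salt)

# An unsafe_hashpw (no length check) could be offered separately for callers
# who genuinely want the legacy truncating behaviour.
```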
There is no other answer than this. Silent failures are never acceptable, even if documented. Because despite what we want to believe about the world, people don’t read the docs, or read them and forget, or read them and misunderstand.
If your crypto library works like an axe and the methods aren’t prefixed with “unsafe_”, the library is bad. I would expect an exception when a hashing function gets an argument that’s too long, not just dropping the excess input. Who thinks that’s the best choice?
Passing something that isn’t a password + salt into bcrypt is the mistake here.
Even that sounds potentially dangerous to me now, since it effectively means that some extra-long "correct horse battery stapler"-style passwords could be left effectively unsalted. I mean yeah, 78 chars is an awfully long password but for some famous book or movie quotes maybe not outside the realm of possibility. Or if languages using double-byte characters effectively halve that cutoff then it really becomes an issue...
Concise write up; not surprising that cache played a part.
Can't tell if it's an issue with BCrypt, or with the state data going into the key, or the combo cache lookup, though.
I think it's more of a logic problem. I suspect the engineers made a false assumption that bcrypt can hash an arbitrary amount of data like some other hashing algos.
They had literally 1 job: secure authentication. This isn't the first time Okta has had a gaffe on a level that should cause any customer to reconsider. What's that saying, "Fool me once, shame on thee, fool me twice, shame on me". Don't get fooled by Okta a second, third, or fourth time.
I'm really sick of companies disclosing this shit late Friday afternoon.
Go fuck yourselves.
Sincerely, Everyone in the industry
It's the second time they've done that in a few weeks too, _and_ it's not on their security page [1], which promises transparency.
[1] https://trust.okta.com/
It is listed on their security advisories page, which you can navigate to from that link:
https://trust.okta.com/security-advisories/
Thank you for your feedback. Next time, we'll disclose it on Saturday evening.
-- With Love, Okta.
Also, are there any repercussions for this kind of stuff? I don't know, fines from the organizations they get compliance certifications from or something.
No repercussions, sadly.
Those compliance companies are (mostly) all just checking a box. It's (mostly) security theater from people who wouldn't know security if it bit them in the nether regions.
Even if that wasn't true, there's probably no box in any compliance regime that says "Yes, we loudly promulgate our security failures from the nearest rooftop at 10am on a weekday" (and it's always five o'clock somewhere, right?)
If it helps (I know it doesn't), the Executive Branch likes to do this with poor job number revisions, too, lol
geee, nobody is targeting you
Why is anyone actually using Okta for anything these days?
IMO, better to choose point solutions and combine them.
Wasn’t there a project posted here that can spot these things automatically?
It was a fuzzer of some sort.
This is written in C, right?
Your rewrite of bcrypt in not C is unlikely to support longer passwords.
NotC, the language that entire operating systems haven't been written in.
Probably Java. This is not a memory vulnerability but a protocol vulnerability.
> This is written in C, right?
What's your point? That rewriting `bcrypt` in something else magically fixes this?
AIUI, the issue is that `bcrypt` only uses the first 72 bytes of the input to create a hash.
The issue is with the user mistaking bcrypt for a general-purpose digest hashing tool.
It's like using a flat-head screwdriver as a hardwood chisel and then the handle breaks off after the third strike.