Sometimes I wonder if package names, bucket names, github account names and so on should use a naming scheme like discord. Eg, @sometag-xxxx where xxxx is a random 4 digit code. Its sort of a middleground between UUID account names and completely human generated names.
This approach goes a long way toward democratizing the name space, since nobody can "own" the tag prefix. (10000 people can all share it). This can also be used to prevent squatting and reuse attacks - just burn the full account name if the corresponding user account is ever shut down. And it prevents early users from being able to snap up all the good names.
Notably Discord stopped using that format two years ago, moving to globally unique usernames.
Their stated reason[1] for doing so being:
> This lets you have the same username as someone else as long as you have different discriminators or different case letters. However, this also means you have to remember a set of 4-digit numbers and account for case sensitivity to connect with your friends.
I like it for buckets, but adding a four digit code won't help with the package hijacking side of things - in fact might just introduce more typo/hijack potential. It'll just be four more characters for people to typo.
> For Azure Blob Storage, storage accounts are scoped with an account name and container name, so this is far less of a concern.
The author probably misunderstood what "account name" is in Azure Storage's context, as it's pretty much the equivalent of S3's bucket name, and is definitely still a large concern.
A single pool of unique names for storage accounts across all customers has been a very large source of frustration, especially with the really short name limit of only 24 characters.
I hope Microsoft follows suit and introduces a unique namespace per customer as well.
I recall being shocked the first time I used Azure and realizing so many resources aren’t namespaced to account level. Bizarre to me this wasn’t a v1 concern.
Thank you author Ian Mckay! This is one of those good hygiene conventions that save time by not having to think/worry each time buckets are named. As pointed out in the article, AWS seems to have made this part of their official naming conventions [1].
I'm excited for IaC code libraries like Terraform to incorporate this as their default behavior soon! The default behavior of Terraform and co is already to add a random hash suffix to the end of the bucket name to prevent such errors. This becoming standard practice in itself has saved me days in not having to convince others to use such strategies prior to automation.
I started treating long random bucketnames as secrets years ago. Ever since I noticed hackers were discovering buckets online with secrets and healthcare info.
This is all good and we'll on the IaC side,yes. But at the end of the day, buckets are also user facing resources, and nobody likes random directory / bucket names.
It would be nice if the other end of this could be addressed: a configurable policy to limit resolution of bucket names within an account namespace. Ideally, if someone doesn’t have permission to resolve a bucket name, they shouldn’t even be able to detect whether it exists.
I just started using hashes for names. The deployment tooling knows the "real" name. The actual deployment hash registers a salt+hash of that name to produce a pseudo-random string name.
It is not hygienic, but with only the account-id you are fine. In the IAM rules the attacker can always just use a * on their end, so it does not make a difference. You have to be conscious to set proper rules for your (owner) end tho.
That would be a huge breaking change. Any workload that relies on re-using a bucket name would be broken, and at the scale of S3 that would have a non-trivial customer impact.
Not to mention the ergonomics would suck - suddenly your terraform destroy/apply loop breaks if there’s a bucket involved
Any workload that relies on re-using a bucket name is broken by design. If someone else can get it, then it's Undefined Behaviour. So it's in keeping with the contract for AWS to prevent re-use. Surely?
Potential reasons I can think of for why they don't disallow name reuse:
a) AWS will need to maintain a database of all historical bucket names to know what to disallow. This is hard per region and even harder globally. Its easier to know what is currently in use rather know what has been used historically.
b) Even if they maintained a database of all historically used bucket names, then the latency to query if something exists in it may be large enough to be annoying during bucket creation process. Knowing AWS, they'll charge you for every 1000 requests for "checking if bucket name exists" :p
c) AWS builds many of its own services on S3 (as indicated in the article) and I can imagine there may be many of their internal services that just rely on existing behaviour i.e. allowing for re-creating the same bucket name.
I can't accept a) or b). They already need to keep a database of all existing bucket names globally, and they already need to check this on bucket creation. Adding a flag on deleted doesn't seem like a big loss.
> If you wish to protect your existing buckets, you’ll need to create new buckets with the namespace pattern and migrate your data to those buckets.
My pet conspiracy theory: this article was written by bucket squatters who want to claim old bucket names after AI agents read this and blindly follow.
The entire article talks about “guessing” the bucket name as being the attack enabler, not the leaking of it. What does the landscape look like once you start doing the basics like hashing your bucket names? Is this still a problem worth engineering for?
Sometimes I wonder if package names, bucket names, github account names and so on should use a naming scheme like discord. Eg, @sometag-xxxx where xxxx is a random 4 digit code. Its sort of a middleground between UUID account names and completely human generated names.
This approach goes a long way toward democratizing the name space, since nobody can "own" the tag prefix. (10000 people can all share it). This can also be used to prevent squatting and reuse attacks - just burn the full account name if the corresponding user account is ever shut down. And it prevents early users from being able to snap up all the good names.
Notably Discord stopped using that format two years ago, moving to globally unique usernames.
Their stated reason[1] for doing so being:
> This lets you have the same username as someone else as long as you have different discriminators or different case letters. However, this also means you have to remember a set of 4-digit numbers and account for case sensitivity to connect with your friends.
[1]: https://support.discord.com/hc/en-us/articles/12620128861463...
I like it for buckets, but adding a four digit code won't help with the package hijacking side of things - in fact might just introduce more typo/hijack potential. It'll just be four more characters for people to typo.
I just want to be able to use a verified domain; @example.com everywhere.
That still has "squatting" risks as described in the original article though, domains expire and / or can be taken over.
> For Azure Blob Storage, storage accounts are scoped with an account name and container name, so this is far less of a concern.
The author probably misunderstood what "account name" is in Azure Storage's context, as it's pretty much the equivalent of S3's bucket name, and is definitely still a large concern.
A single pool of unique names for storage accounts across all customers has been a very large source of frustration, especially with the really short name limit of only 24 characters.
I hope Microsoft follows suit and introduces a unique namespace per customer as well.
Author here. Thanks for the call out! I've updated the article with attribution.
I recall being shocked the first time I used Azure and realizing so many resources aren’t namespaced to account level. Bizarre to me this wasn’t a v1 concern.
Thank you author Ian Mckay! This is one of those good hygiene conventions that save time by not having to think/worry each time buckets are named. As pointed out in the article, AWS seems to have made this part of their official naming conventions [1].
I'm excited for IaC code libraries like Terraform to incorporate this as their default behavior soon! The default behavior of Terraform and co is already to add a random hash suffix to the end of the bucket name to prevent such errors. This becoming standard practice in itself has saved me days in not having to convince others to use such strategies prior to automation.
[1] https://aws.amazon.com/blogs/aws/introducing-account-regiona...
That took a decade to resolve? Surprising, but hindsight is 20/20 I guess.
I started treating long random bucketnames as secrets years ago. Ever since I noticed hackers were discovering buckets online with secrets and healthcare info.
This is where IaC shines.
This is all good and we'll on the IaC side,yes. But at the end of the day, buckets are also user facing resources, and nobody likes random directory / bucket names.
It would be nice if the other end of this could be addressed: a configurable policy to limit resolution of bucket names within an account namespace. Ideally, if someone doesn’t have permission to resolve a bucket name, they shouldn’t even be able to detect whether it exists.
I just started using hashes for names. The deployment tooling knows the "real" name. The actual deployment hash registers a salt+hash of that name to produce a pseudo-random string name.
I take it advertising your account id isn't a security risk?
Armchair opinion, but shouldn't be too bad - it's identification, not authentication, just like your e-mail address is.
But probably best to not advertise it too much.
It is not hygienic, but with only the account-id you are fine. In the IAM rules the attacker can always just use a * on their end, so it does not make a difference. You have to be conscious to set proper rules for your (owner) end tho.
Why all that stuff with namespaces when they could just not allow name reuse?
That would be a huge breaking change. Any workload that relies on re-using a bucket name would be broken, and at the scale of S3 that would have a non-trivial customer impact.
Not to mention the ergonomics would suck - suddenly your terraform destroy/apply loop breaks if there’s a bucket involved
Any workload that relies on re-using a bucket name is broken by design. If someone else can get it, then it's Undefined Behaviour. So it's in keeping with the contract for AWS to prevent re-use. Surely?
Think terraform tests, temporary environments, etc. Or anything else: it’s Hyrum's Law.
Potential reasons I can think of for why they don't disallow name reuse:
a) AWS will need to maintain a database of all historical bucket names to know what to disallow. This is hard per region and even harder globally. Its easier to know what is currently in use rather know what has been used historically.
b) Even if they maintained a database of all historically used bucket names, then the latency to query if something exists in it may be large enough to be annoying during bucket creation process. Knowing AWS, they'll charge you for every 1000 requests for "checking if bucket name exists" :p
c) AWS builds many of its own services on S3 (as indicated in the article) and I can imagine there may be many of their internal services that just rely on existing behaviour i.e. allowing for re-creating the same bucket name.
I can't accept a) or b). They already need to keep a database of all existing bucket names globally, and they already need to check this on bucket creation. Adding a flag on deleted doesn't seem like a big loss.
As for c), I assume it's not just AWS relying on this behaviour. https://xkcd.com/1172/
I'd allow re-use, but only by the original account. Not being able to re-create a bucket after deleting it would be annoying.
I think that's an important defense that AWS should implement for existing buckets, to complement account scoped bucket.
> If you wish to protect your existing buckets, you’ll need to create new buckets with the namespace pattern and migrate your data to those buckets.
My pet conspiracy theory: this article was written by bucket squatters who want to claim old bucket names after AI agents read this and blindly follow.
Huh? Hash your bucket names
if your bucket name is ever exposed and you later delete it, then this doesn't help you.
The entire article talks about “guessing” the bucket name as being the attack enabler, not the leaking of it. What does the landscape look like once you start doing the basics like hashing your bucket names? Is this still a problem worth engineering for?
I don't think that'd prevent this attack vector.
Ok; salt, and then hash your bucket names