I used the same approach based on Rclone for a long time. I wondered what makes Regatta Storage different than Rclone. Here is the answer: "When performing mutating operations on the file system (including writes, renames, and directory changes), Regatta first stages this data on its high-speed caching layer to provide strong consistency to other file clients." [0].
Rclone, on the contrary, has no layer that would guarantee consistency among parallel clients.
This is exactly right, and something that we think is particularly important for applications that care about data consistency. Oftentimes, we see that customers want to be able to quickly hand off tasks from one instance to another, which can be incredibly complex if you don't have guarantees that your new operations will be seen by the second instance!
This is honestly the coolest thing I've seen coming out of YC in years. I have a bunch of questions which are basically related to "how does it work" and please pardon me if my questions are silly or naive!
1. If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally)? Would I immediately see degradation, or thrashing, at the 10 GB mark?
2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?
3. I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?
4. Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?
5. I have to ask - how do you think about open source here?
6. Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)
I haven't played with the product yet, so maybe doing so would help answer my questions. But I'm really excited about this! I have tried using EFS for small projects in the past but - and maybe I was holding it wrong - I could not for the life of me figure out what I needed to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.
Wow, thanks for the nice note! No questions are silly, and I'll also note that we now have a docs site (https://docs.regattastorage.com) and feel free to email me (hleath [at] regattastorage.com) if I don't fully address your questions.
> If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally)? Would I immediately see degradation, or thrashing, at the 10 GB mark?
We don't actually do caching on your instance's disk. Instead, data is cached in the Linux page cache (in memory) like a regular hard drive, and Regatta provides a durable, shared cache that automatically expands with the working set size of your application. For example, if you were trying to work with data in the 50 GiB range, Regatta would automatically cache all 50 GiB -- allowing you to access it with sub-millisecond latency.
> Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?
For now, yes -- the speed is highly dependent on latency -- which is highly dependent on distance between your instance and Regatta. Today, we are only in AWS, but we are looking to launch in other clouds by the end of the year. Shoot me an email if there's somewhere specifically that you're interested in.
> I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?
There are a couple of different questions bundled together in this. Today, Regatta exposes an NFSv3 file system that you can mount. We are working on a new protocol which will be mounted via FUSE. However, in Docker environments, we also provide a CSI driver (for use with K8s) and a Docker volume plugin (for use with just Docker) that handles the mounting for you. We haven't released these publicly yet, so shoot me an email if you want early access.
> Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?
Yes, you should be able to run a database on Regatta.
> I have to ask - how do you think about open source here?
We are in the process of open sourcing all of the client code (CSI driver, mount helper, FUSE), but we don't have plans currently to open source the server code. We see the value of Regatta in managing the infrastructure so you don't have to, and if we release it via open-source, it would be difficult to run on your own.
> Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)
Yes, you can mount on multiple servers simultaneously! We haven't specifically stress-tested the number of clients we support, but we should be good for O(100s) of mounts. Unfortunately, AWS locks down Lambda so we can't mount arbitrary file systems in that environment specifically.
> efs performance
Yes, the challenge here is specifically around the semantics of NFS itself and the latency of the EFS service. We think we have a path to solving both of these in the next month or two.
Thank you for the detailed answers! Honestly, this project inspires me to work on infrastructure problems.
So you are saying that Regatta's own SaaS infrastructure provides the disk caching layer. So you all make sure the pipe between my AWS instance and your servers is very fast and "infinitely scalable", and then the sync to S3 happens after the fact.
Do I understand correctly that the data gets decrypted at your Regatta AWS instances, before the data ends up in the customer's S3 bucket? It sounds like the SSL pipe used for NFS is terminated at Regatta servers. Can customers run the Regatta service on their own hardware?
Or does Regatta only have access to filesystem metadata -- enough to do POSIX stuffs like locks, mv, rm -- but the file contents themselves remain encrypted end-to-end?
This is correct, we encrypt data in-transit to the Regatta servers (using TLS), and we encrypt any data that the Regatta servers are storing. Of course, when Regatta communicates with S3, that's also encrypted with TLS (just like using the AWS SDK). However, we don't pass the encrypted data to S3, otherwise you wouldn't be able to read it from the bucket directly and use it in other applications!
Pretty sure we're in your target market. We [0] currently use GCP Filestore to host DuckDB. Here's the pricing and performance at 10 TiB. Can you give me an idea on the pricing and performance for Regatta?
Yes, you should be in our target market. I don't think that I can give a cost estimate without having a good sense of what percentage of your data you're actively using at any given time, but we should absolutely support the performance numbers that you're talking about. I'd love to chat more in detail, feel free to send me a note at hleath [at] regattastorage.com.
You pay a datacenter to put it in a rack and connect power and uplinks, then treat it like a big ec2 instance (minus the built-in firewall). Now you just need someone who knows how to secure an ec2 instance and run your preferred software there (with failover and stuff).
If you run a single-digit number of servers and replace them every 5 years you will probably never get a hardware failure. If you're unlucky and it still happens get someone to diagnose what's wrong, ship replacement parts to the data center and pay their tech to install them in your server.
Bare metal at scale is difficult. A small number of bare metal servers is easy. If your needs are average enough you can even just rent them so you don't have capital costs and aren't responsible for fixing hardware issues.
If this product is successful, what prevents AWS from cloning it at a lower price (perhaps by leveraging access to their infrastructure) and putting you out of business?
I’m very interested in this as a backing disk for SQLite/DuckDB/parquet, but I really want my cached reads to come straight from instance-local NVMe storage, and to have a way to “pin” and “unpin” some subdirectories from local cache.
Why local storage? We’re going to have multiple processes reading & writing to the files and need locking & shared memory semantics you can’t get w/ NFS. I could implement pin/unpin myself in user space by copying stuff between /mnt/magic-nfs and /mnt/instance-nvme but at that point I’d just use S3 myself.
Any thoughts about providing a custom file system or how to assemble this out of parts on top of the NFS mount?
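For reference, the user-space pin/unpin described above could look roughly like the sketch below. It is only an illustration of the copy-based workaround, not anything Regatta provides. The `pin` and `unpin` helpers are hypothetical names, and the sketch does no conflict detection and does not propagate deletions.

```python
import os
import shutil
import tempfile


def pin(shared_root: str, local_root: str, subdir: str) -> str:
    """Copy `subdir` from the shared mount to local storage; return the local path."""
    source = os.path.join(shared_root, subdir)
    target = os.path.join(local_root, subdir)
    shutil.copytree(source, target, dirs_exist_ok=True)  # requires Python 3.8+
    return target


def unpin(shared_root: str, local_root: str, subdir: str) -> None:
    """Copy local files back to the shared mount, then drop the local copy.

    Assumption: nobody else changed the shared copy in the meantime.
    """
    source = os.path.join(local_root, subdir)
    target = os.path.join(shared_root, subdir)
    shutil.copytree(source, target, dirs_exist_ok=True)
    shutil.rmtree(source)


if __name__ == "__main__":
    # Temporary directories stand in for /mnt/magic-nfs and /mnt/instance-nvme.
    with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as local:
        os.makedirs(os.path.join(shared, "db"))
        with open(os.path.join(shared, "db", "data.txt"), "w") as f:
            f.write("v1")
        local_copy = pin(shared, local, "db")
        print("pinned at", local_copy)
        unpin(shared, local, "db")
```

The weak points are the ones raised in the comment: the copy-back step is where locking and shared-memory semantics would have to be reinvented.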
Hey -- I think this is something that's in-scope for our custom protocol that we're working on. I'd love to chat more about your needs to make sure that we build something that will work great for you. Would you mind shooting an email to hleath [at] regattastorage.com and we can chat more?
Wow, looks like a great product! That's a great idea to use NFS as the protocol. I honestly hadn't thought of that.
Perfect.
For IBM, I wrote a crypto filesystem that works similarly in concept, except it was a kernel filesystem. We crypto split the blocks up into 4 parts, stored into cache. A background daemon listened to events and sync'ed blocks to S3 orchestrated with a shared journal.
It's pure magic when you mount a filesystem on clean machine and all your data is "just there."
> It's pure magic when you mount a filesystem on clean machine and all your data is "just there."
I totally agree! I am hoping that Regatta can power a future where teams don't need more than ~8 GiB of local storage for their operating system, and can store the rest on something like Regatta to get rid of the waste of overprovisioned block volumes.
Let's hope so, I'd love to help teams take storage infrastructure management off of their plate! If you're in the public sector and interested in trying out Regatta, please shoot me an email at hleath [at] regattastorage.com.
In (March?) 2007 (correction 2008) myself and two other engineers in front of Bruce Chizen - Adobe's CEO in a small conference room in Bucharest demoed a photo taken with an iPhone automagically showing as a file on a Mac. I implemented the local FUSE talking to Ozzy - Adobe's distributed object store back then, using an equivalent of a Linux inode structure. It worked like a charm and if I remember correctly it took us a few days to build it. It was a success just as much as Adobe's later choices around http://Photoshop.com were a huge failure. A few months later Dropbox launched.
That kickstarted about a decade in (actual) research and development led by my team which positioned the Bucharest center as one of the most prolific centers in distributed systems within Adobe and of Adobe within Romania.
But I didn't come up with the concept, it was Richard Jones that inspired us with the Gmail drive that used FUSE with gmail attachments back in 2004 when I got my first while still in college https://en.wikipedia.org/wiki/GMail_Drive. I guess I'm old, but I find it funny to see Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS
The funny thing about storage is that all of the problems are the same! Ultimately, there is no problem that cannot be solved with caching, journaling, write-ahead logging, etc. I think what makes the problem space so interesting is how a million different products can make a million different trade offs with these tools to deliver on their customer needs. File systems are awesome.
> The funny thing about storage is that all of the problems are the same!
They are all the same, and they are all more than what would, on the surface, seem to be "just files". The whole OS, especially Linux/UNIX, is "just files", and if you look deeper at databases you can see how it boils down to the file formats (something that was visible with LevelDB but maybe less so with RocksDB, I guess).
Yes! This is my expectation. Lots of the big companies have already done this with in-house architecture. With Regatta, we want to democratize building stateless applications that can take advantage of the low-cost storage of S3.
One of my hopes for Regatta is that we're able to power the next generation of these data platforms. These things work because the designers had specialized storage knowledge that allowed them to carefully build serverless data products. I hope that Regatta is generic enough to allow anyone to build a serverless data product moving forward, without having to think about their storage infrastructure.
But it's not clear how it handles file update conflicts.
For example: if User A updates File X on one computer, and User B updates File X on another computer, what does the final file look like in S3?
Hey there, our file system is strongly consistent for all connected file system clients. For example, if User A and User B are both connected via Regatta, then this works like any other NFS file system (in that they can use file locks, atomic renames or other techniques to ensure that one write wins). However, if User A and User B are accessing the data through different protocols (for example User A is using Regatta and User B is accessing the data through S3), then it's possible to get undefined behavior by attempting to simultaneously update the same piece of data from both places. We think that these applications are rare, and (almost by definition) likely don't exist right now. For the most part, customers use file storage as a "stage" in a broader workflow (for example, customers may ingest data through S3 and then process it on a file system), and that is totally consistent.
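As an illustration of the "file locks, atomic renames" techniques mentioned above, here is a minimal Python sketch that uses only standard POSIX calls. The helper names `atomic_write` and `locked_append` are made up for this example. Whether the guarantees hold on a particular mount depends on the file system and its lock support, so treat this as a sketch, not a statement about Regatta's behavior.

```python
import fcntl
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that readers see the old or the new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory: a rename is only
    # atomic within a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic swap; the last rename wins
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def locked_append(path: str, line: str) -> None:
    """Append one line while holding an exclusive advisory (POSIX) lock."""
    with open(path, "a", encoding="utf-8") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

With `atomic_write`, two writers racing on File X leave one complete version in place. With `locked_append`, writers that all take the lock are serialized. Neither helps if one party bypasses the file system and writes the object through the S3 API.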
This is accurate! A lot of people have spent a lot of time trying to build a good file system abstraction on cheap, S3 storage. However, Regatta differs from these solutions in two important ways. First, Regatta is a shared, durable caching layer that sits between your instances and S3. This means that Regatta is able to efficiently perform operations (like directory renames) and provide strong consistency to other file system clients (whereas s3fs and other FUSE file systems would need to actually perform those operations in S3 for other clients to see the output). Secondly, Regatta is designed to support all file system operations. This means that you can do file locking, random writes, appends, and renames -- even when they aren't efficient to perform on S3.
Super interesting product. I have a couple of questions:
In terms of storing in s3 - is that in your buckets? Sound like the plan is to run the caching on your infrastructure, are there plans to allow customers to run those instances themselves?
Presumably the format within s3 is your own bespoke format? What does the migration strategy look like for people looking to move into or out of your infrastructure? They effectively pull everything down from their s3 to the local “filesystem”?
I love this because it allows me to highlight the parts of the system that I'm most excited about. The Regatta caching runs on our infrastructure, but it connects to buckets that our customers control. We read and write data into the customer's bucket in a regular, native (not bespoke) format -- so you can connect a Regatta file system directly to a bucket that already exists, with data in it, and use that data from a file system without any data migration!
Sure can, full disclosure, copied from a comment below:
Thanks for the question! Mountpoint for Amazon S3 is a FUSE layer that doesn't support full POSIX semantics. For example, you can't use Mountpoint for Amazon S3 for random writes to existing files, appends, or renames. This means that you have to carefully instrument your application to understand whether or not it's compatible with Mountpoint, which can be error-prone. Regatta, on the other hand, provides full POSIX compatibility for the file interface, which means that it works out-of-the-box with all file based applications.
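One way to check a given mount without instrumenting a whole application is a small probe that tries the operations listed above (append, random write to an existing file, rename) and reports which ones fail. This is a hedged sketch; `probe_file_operations` is a hypothetical helper and the result depends entirely on the directory you point it at.

```python
import os
import tempfile


def probe_file_operations(directory: str) -> dict:
    """Try an append, a random write, and a rename inside `directory`.

    Returns a mapping of operation name to True (worked) or the error message.
    """
    results = {}
    path = os.path.join(directory, "probe.bin")
    renamed = os.path.join(directory, "probe-renamed.bin")

    with open(path, "wb") as f:  # plain sequential write of a new file
        f.write(b"0123456789")

    def attempt(name, operation):
        try:
            operation()
            results[name] = True
        except OSError as exc:
            results[name] = str(exc)

    def append():
        with open(path, "ab") as f:
            f.write(b"ABC")

    def random_write():
        with open(path, "r+b") as f:  # overwrite bytes inside an existing file
            f.seek(2)
            f.write(b"xx")

    def rename():
        os.rename(path, renamed)

    attempt("append", append)
    attempt("random_write", random_write)
    attempt("rename", rename)

    for leftover in (path, renamed):
        if os.path.exists(leftover):
            os.unlink(leftover)
    return results


if __name__ == "__main__":
    # A local temporary directory is used here; pass a mount point instead.
    with tempfile.TemporaryDirectory() as d:
        print(probe_file_operations(d))
```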
Yeah, I like to think of it in a similar vein. We want to empower people to create stateless workflows where they may have previously needed to think about state management. Today, Regatta is an NFS file system where the cache lives on our shared infrastructure. However, when we complete the work on our custom protocol, that will be a FUSE file system which offers additional caching on your instances to enable truly local-like performance.
Neat stuff. I think everybody with an interest in NFS has toyed with this idea at some point.
> Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.
How do you handle the cache server crashing before syncing to S3? Do the cache servers have local disk as well?
Ditto for how to handle intermittent S3 availability issues?
What are the fsync guarantees for file append operations and directories?
> How do you handle the cache server crashing before syncing to S3? Do the cache servers have local disk as well?
Our caching layer is highly durable, which is (in my opinion) the key for doing this kind of staging. This means that once a write is complete to Regatta, we guarantee that it will eventually complete on S3.
For this reason, server crashes and intermittent S3 availability issues are not a problem because we have the writes stored safely.
> What are the fsync guarantees for file append operations and directories?
We have strong, read-after-write consistency for all connected file system clients -- including for operations which aren't possible to perform on S3 efficiently (such as renames, appends, etc). We asynchronously push those writes to S3, so there may be a few minutes before you can access them directly from the bucket. But, during this time, the file system interface will always reflect the up-to-date view.
So, I assume you use a journal in the cache server.
A few related questions:
* Do you use a single leader for a specific file system, or do you have a cluster solution with consensus to enable scaling/redundancy?
* How do you guarantee read-after-write consistency? Do you stream the journal to all clients and wait for them to ack before the write finishes? Or at least wait for everyone to ack the latest revisions for files, while the content is streamed out separately/requested on demand?
* If the above is true, I assume this is strictly viable for single-DC usage due to latency? Do you support different mount options for different consistency guarantees?
These are questions that are super specific to our implementation, that I'm hesitant to share publicly because they could change at any time. I can share that we're designed to horizontally scale the performance of each file system, and our custom protocol will enable Lustre-like scale out performance. As for single- vs. multi-DC, I think that you'd be surprised at how much latency budget there is (a cross-DC round trip in AWS can be anywhere from 200us-700us, and EBS gp3 latencies are around 1000us).
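To make the latency budget concrete, here is the arithmetic with the figures quoted above. The numbers are the approximate ones from the comment, and the assumption that an operation costs a whole number of cross-DC round trips is a simplification for illustration only.

```python
CROSS_DC_ROUND_TRIP_US = (200, 700)  # range quoted in the comment above
EBS_GP3_LATENCY_US = 1000            # approximate figure quoted in the comment above


def remaining_budget_us(round_trips: int = 1) -> tuple:
    """Microseconds left under the gp3 figure after `round_trips` cross-DC hops.

    Returns (best case, worst case); a negative value means the budget is exceeded.
    """
    best = EBS_GP3_LATENCY_US - round_trips * CROSS_DC_ROUND_TRIP_US[0]
    worst = EBS_GP3_LATENCY_US - round_trips * CROSS_DC_ROUND_TRIP_US[1]
    return best, worst


if __name__ == "__main__":
    for hops in (1, 2):
        print(hops, remaining_budget_us(hops))
```

Under these figures a single cross-DC round trip leaves 300 to 800 microseconds of headroom, while a second round trip only fits in the best case.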
Is it fair to say this is best suited for small files that will be written infrequently?
There’s no partial write for s3 so editing a small range of a 1 GiB file would repeatedly upload the full file to the backing s3 right?
Or is the s3 representation not the same hierarchy as the presented mount point? (ie something opaque like a log structured / append only chunked list)
It's hard to define "best", and in many cases, the answers to these questions depend heavily on the workload and the caching parameters (how long do we wait before flushing to S3, etc). We are designed to provide good file system performance, even if customers are repeatedly writing small pieces of data to a 1 GiB file, so "best" in this case is a question of whether or not it's cost efficient.
Without getting too much into the details of the system, our durable cache is designed for 5 9s of durability (and we're working on a version that will provide 11 9s of durability soon). You can't achieve those durability numbers on a single attached NVMe device without some kind of replication.
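A toy calculation shows why replication is needed to reach those numbers. It assumes replicas fail independently and are never repaired within the year, and the 1% single-device annual loss rate is a made-up input. None of this describes how Regatta's cache is actually built.

```python
import math


def nines(annual_loss_probability: float) -> float:
    """Express a loss probability as a number of nines (1e-5 gives 5 nines)."""
    return -math.log10(annual_loss_probability)


def replicated_loss_probability(single_copy_loss: float, replicas: int) -> float:
    """Toy model: data is lost only if every independent replica is lost."""
    return single_copy_loss ** replicas


if __name__ == "__main__":
    single = 0.01  # hypothetical annual loss probability for one device
    for n in (1, 3, 6):
        print(n, round(nines(replicated_loss_probability(single, n)), 2))
```

In this model one device at 1% gives 2 nines, so 5 nines needs at least three independent copies and 11 nines needs six. Real systems get there differently (repair, erasure coding), but a single device cannot do it alone.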
definitely the thing I want to hear more about. Also, I can't help shake the "what's the catch, how is no one else doing this, or are they doing it quietly?" feeling.
Trust me, I feel the same way. The problem with these things is that you end up building a company because you get so much conviction that what you're doing is the right thing for customers, and you end up shocked that this isn't the default for everyone.
I am not your target audience but I have been thinking of building a very minified version of this using [0] Pooch and [1] S3FS.
Right now we spend a lot of time downloading various stuff from HTTP or S3 links and then figuring out folder structures to keep them in our S3 buckets. Pooch really simplifies the caching for this by having a deterministic path on your local storage for downloaded files, but has no S3 backend.
So a combination of 2 would be to just have 1 call to a link that would embed the caching both locally and on our S3 buckets deterministically.
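A minimal sketch of that "deterministic path" idea, using only the standard library. It does not use Pooch's or S3FS's actual APIs. `cache_key` and `fetch` are hypothetical names, and `download` is a callable you would supply (an HTTP or S3 client, for instance).

```python
import hashlib
import os
import tempfile
from urllib.parse import urlparse


def cache_key(url: str) -> str:
    """Derive a stable relative path from a URL: <sha256 prefix>/<file name>."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    name = os.path.basename(urlparse(url).path) or "download"
    return f"{digest}/{name}"


def fetch(url: str, local_root: str, download) -> str:
    """Return the local path for `url`; call `download(url, path)` only on a miss."""
    path = os.path.join(local_root, cache_key(url))
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        download(url, path)
    return path


if __name__ == "__main__":
    def fake_download(url, path):  # stand-in for a real HTTP/S3 download
        with open(path, "w") as f:
            f.write("payload for " + url)

    with tempfile.TemporaryDirectory() as root:
        print(fetch("https://example.com/files/data.csv", root, fake_download))
```

Because the key depends only on the URL, the same string could double as the object key in the S3 bucket, so both cache tiers agree on where a file lives.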
I think this is a great insight, and something that I think about often. The challenge that I see is that the scientist archetype (whether it's data science, AI researcher, or anything else) isn't really interested in doing software development for these kinds of things. They just want the data to be there, and it's super nice to be able to click through the S3 console to be able to see and share the data they're using. I think that what you're doing is a great idea for folks who are accessing their data primarily through Python programs!
Love this idea! The biggest hurdle, though, has been getting predictable auth and I/O across multiple Python/Scala versions and everything else (Spark, orchestrators, CLIs for teams on varying OSes, etc.), plus access logs on top of that.
S3Fs/boto/botocore versions x Scala/Spark x parquet x iceberg x k8s etc. readers' own assumptions make reading from S3 alone a maintenance and compatibility nightmare.
Will the mounted system _really_ be accessible as local fs and seen as such to all running processes? No surprises? No need for python specific filesystem like S3Fs?
If so then you will win 100% I wouldn't even care about speed/cost if it's up to par with s3
Yeah, that's exactly right. I had some... experiences with Spark recently, that convinced me that this is something that could really help. I also really like the idea that organizations can continue to use S3 as the source of truth for their data (as you mention, it means that you can continue to use Access Logs, which would capture all usage of your S3 bucket across your applications).
> Will the mounted system _really_ be accessible as local fs and seen as such to all running processes? No surprises? No need for python specific filesystem like S3Fs?
Ha, well it depends on what you mean by surprises. We won't have a Python-specific file system. Our client is going to come in two flavors. Today, you can mount Regatta over NFSv3 (which we wrap in TLS to make it secure). This works for some workloads, but doesn't provide like-for-like performance with EBS. Over the next month, we plan to release the "custom protocol" that I wrote about above, that we expect to send to customers in the form of a FUSE file system.
Either way, it should be one package, you shouldn't need to worry about versioning, and it will appear as a real, local file system. :D
You are correct in that NFS is not strictly-speaking POSIX compliant to the letter of the law, due to the caching behavior. This is an NFSv3 file system, so it shares those semantics. The point that I'm trying to emphasize is that the file system supports standard file operations which aren't possible through other FUSE adapters, or possible to perform efficiently on S3 (such as append, rename, and symbolic links) -- which provides broad compatibility with file-based applications.
Which is nice and useful of course, but there is a ton of things that can't reliably be done with that (like running any database that comes to mind), which makes it important to be precise here.
Is there something specific that you worry about when running a database on a networked file system? I would imagine that any database which is correctly fsync'ing the data to the write-ahead-log should work just fine.
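For readers unfamiliar with the pattern being referred to, this is roughly what "fsync'ing the write-ahead log" means. It is a simplified sketch with a made-up length-prefixed record format, not any real database's on-disk format. Whether `fsync` on a network mount gives the durability a database expects is exactly the point under debate here.

```python
import os
import tempfile


def append_to_wal(wal_path: str, record: bytes) -> None:
    """Append a length-prefixed record; return only after fsync succeeds."""
    data = memoryview(len(record).to_bytes(4, "big") + record)
    fd = os.open(wal_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        while len(data) > 0:  # os.write may write fewer bytes than requested
            written = os.write(fd, data)
            data = data[written:]
        os.fsync(fd)  # ask the file system to make the record durable
    finally:
        os.close(fd)


def read_wal(wal_path: str) -> list:
    """Replay the log, ignoring a torn record at the tail."""
    records = []
    with open(wal_path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            size = int.from_bytes(header, "big")
            body = f.read(size)
            if len(body) < size:  # incomplete final record, e.g. after a crash
                break
            records.append(body)
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wal = os.path.join(d, "wal.log")
        append_to_wal(wal, b"BEGIN")
        append_to_wal(wal, b"COMMIT")
        print(read_wal(wal))
```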
It's similar to JuiceFS, but JuiceFS writes and reads data from S3 in a proprietary block format. This means that you cannot connect JuiceFS to existing data sets in S3, and you cannot use data written through JuiceFS from the S3 API directly. On the other hand, Regatta reads and writes data to S3 using its native format -- so you can do these things!
I don't see any other question about it, so maybe I just missed the obvious answer, but how do you handle POSIX ACLs? If the data is stored as an object in S3, but exposed via filesystem, where are you keeping (if at all?) the filesystem ACLs and metadata?
Great call out. Some kinds of data, like ACLs and specific kinds of metadata, don't live in S3. Full disclosure, we don't support ACLs today (but plan to soon). We keep file system metadata in the durable cache. For some files (where users haven't changed permissions, etc), we are able to release that cached metadata when the file is no longer in use. For other files (where permissions have been changed by the user), that metadata must live in the cache long-term.
We selected NFSv3 due to its broad compatibility with different compute environments. For example, Windows has an NFSv3 client in it, but doesn't have an NFSv4 client. There are lots of enterprise workloads which need simultaneous access to file data from both Windows and Linux, and supporting NFSv3 was the easiest path to support those workloads.
Do you pay for metadata accesses? Does running a `find` across the filesystem cost anything? What about system calls that don't transfer data? Can I move or rename a file without paying to copy and then delete the associated S3 object?
Today, we only charge for cache usage (storage) and data transfer between Regatta and S3. If your metadata access doesn't require transfer to S3, then it doesn't cost anything! However, renames do require transfer to S3 (because we have to move the object on the backend).
I'd love to hear more about what you're excited to do when the magic arrives. :D
We are running it as a managed SaaS, so our customers connect to the caching layer that runs in the Regatta VPC. This allows us to manage the infrastructure for them and keep costs low.
Storage Gateway is an interesting product, and I worked closely with that team for several years -- so mad respect for them. It was designed to be an appliance that you run on servers in your own data center (of course, many customers now deploy it to EC2). Because of this, it's designed to operate in an environment with "finite storage" -- for example, different workload patterns can thrash the cache, which results in poor performance to clients, and it's not designed to run in a high-availability cluster in the cloud. Regatta solves these problems with durable cache storage that's safe to store data in long-term, and is designed for high-availability.
Super interesting project. But I cannot understand why you support only EC2 instances as clients. For what it is worth, it looks strange and limiting. By default I expect to be able to use Regatta Storage from everywhere: from my local machine, from my Docker containers running elsewhere, etc.
This isn't a technical limitation, per se, but a time limitation in terms of getting to the place where we feel comfortable supporting those environments for the public. I still wouldn't recommend mounting it from a local environment (because NFS behaves pretty poorly when it can't connect to the server), but we do have a CSI driver for containers running in K8s. We expect that customers will get the best experience if their instances are very close (latency-wise) to our instances, which is why we only support access from us-east-1 in AWS. We expect to launch in more regions and clouds in the coming months.
If you want early access to other clouds or the CSI driver, feel free to email hleath [at] regattastorage.com.
It depends on what you're doing with EFS! For the most part, I would expect to be lower cost than EFS. If you're doing work where individual files are primarily written or accessed from an individual instance, I would expect a significant improvement in performance. If you have some time, I'd love to chat more deeply about what you're doing. Feel free to grab some time on my calendar from the Demo link on the Regatta home page or shoot me an email at hleath [at] regattastorage.com.
That's exactly right, I've spoken with a ton of folks who have had a good experience with Lucid Link. I think that we are in a slightly different part of the market (in that we aren't targeting video editors, and more of data-intensive applications which may use thousands of IOPS), but I appreciate that the technology is likely similar.
How does this compare to Amazon's own offering in this space, the "AWS Storage Gateway"? It can also back various storage protocols with S3, using SSDs for cache, etc. (https://aws.amazon.com/storagegateway/features/)
Great question! We fill the same role as AWS Storage Gateway (and I used to work closely with that team when I was at AWS, lots of respect for what they do). AWS Storage Gateway is built primarily as an appliance to be installed on instances in your own data center to ease migration to the cloud. Many customers do deploy Storage Gateway on EC2 because they want these features in the cloud itself. However, the "appliance" design of Storage Gateway makes it unsuitable for this purpose. For example, Storage Gateway is not designed to run in a cluster for high-availability and doesn't have access to durable, long-term storage to stage and cache writes.
On the other hand, Regatta is designed as a cloud-native gateway product. Regatta's elastic, durable caching layer allows us to efficiently cache large data sets without thrashing, and always efficiently perform writes. Because Regatta is designed to be highly-available, customers don't have to worry about downtime for patching or deployments.
Also true! If you look at their site, they're really targeting folks to deploy it into their data centers to provide on-premises caching of resources in AWS, rather than providing a high-speed cache within AWS for file-based applications.
Great question! Full disclosure, answer copied from another comment:
It's similar to JuiceFS, but JuiceFS writes and reads data from S3 in a proprietary block format. This means that you cannot connect JuiceFS to existing data sets in S3, and you cannot use data written through JuiceFS from the S3 API directly. On the other hand, Regatta reads and writes data to S3 using its native format -- so you can do these things!
:sunglasses: We think it's important to be where our customers are, and we're looking to prioritize the next regions that we launch in based on customer demand. We expect to be in more regions by the end of the year.
Time! We don't have a lot of people right now, so every minute that we spend launching infrastructure (especially in non-AWS clouds) is a minute that we can't spend on performance improvements for our customers.
1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
... I'm kidding, this is quite useful.
I really wish that NFSv3 and Linux had built-in file hashing ioctls that could delegate some of this expensive work to the backend as it would make it much easier to use something like this as a backup accelerator.
Ha, thank you for the FTP comment, I was hoping someone would make it.
> I really wish that NFSv3 and Linux had built-in file hashing ioctls that could delegate some of this expensive work to the backend as it would make it much easier to use something like this as a backup accelerator.
Tell me a bit more about what you mean here. We're interested in really pushing the limits of what a storage system can do, so I'd be potentially interested.
I rejected EFS as a common caching and shared files layer, despite being technologically an excellent fit for my stack, because it is astronomically expensive. The value created didn’t match the cost.
When I got in touch about that, I was confronted with a wall of TCO papers, which tells me the product managers evidently believe their target segment to be Gartner-following corporate drones. This was a further deterrent.
We threw that idea away and used memcached instead, with common static files in a package in S3.
I guess I’m suggesting, don’t be like EFS when it comes to pricing or reaching customers.
It's certainly my hope to be cost effective, but I understand the worry and I'm sorry that you had that experience with the PMs of that time. At the end of the day, I see my target customers as those who aren't interested in running their own infrastructure and having to manage availability and durability (in memcached's case, things like needing to pre-warm the cache). I understand that it still may be possible to be more cost effective if you're willing to trade off ease of use for dealing with those other concerns.
oh interesting, I'd love to mount this to Finder on Mac, and load a bunch of massive bioinformatics databases on there and treat it like another folder
I'm also using Cloudflare R2 (S3 compatible) and would love for that to work out of the box
I know a lot of folks have asked me for local support, and while I can share that this would work from OS X -- it's not something that I would recommend doing outside of a data center because the semantics of a networked file system on a sporadic internet connection (when compared to a data center) aren't great -- unless you're doing something higher level like Dropbox. However, it's something we're considering for next year.
This reminds me of using rclone mount on terabytes of data, where I mostly wanted some "smaller" files between 200 KB and 1.5 MB in a single directory. I made rclone mount significantly faster by having it cache into a RAM disk (there is a free tool to make RAM disks on macOS too).
There isn't any reason that it shouldn't be a supported use case, depending on your exact performance needs and workflow. It's very similar to ObjectiveFS except that it operates on the data in your S3 bucket in its native format, so you can point it at existing data sets, and use the newly written data directly from S3.
We need to update the home page with these details, but $0.05 is only charged on transfer between Regatta and S3. We calculate your cache usage minutely and tally it into a monthly usage amount that we then bill for.
> You don't actually directly charge for storage itself, so I assume this a "bring your own s3 bucket" type of deal, correct?
That's correct -- we store data in the customer S3 bucket.
> How long does data, that is no longer being accessed sit in the cache and count towards billing?
We keep data in the cache for up to 1 hour after you've stopped accessing it.
> As for availability, are you in the process or do you have plans to also support Google Cloud?
We have plans to support Google Cloud. If you're interested in using us from GCP, I'd recommend setting up some time to chat (either use the website or email me at hleath [at] regattastorage.com). We are prioritizing where we launch our infrastructure next based on customer demand.
Huh, that's interesting. I wouldn't imagine that there were security problems specific to FUSE compared to any other software that you would run on your servers. Regardless, I see FUSE as the fastest path to getting our protocol in the hands of our customers. In the fullness of time, I hope that we can deliver it as either a kernel-module or in-tree.
Just want to say this is super cool. I'm excited to see what people build on top of it.. seems like it could enable a new category of hosted data platforms-as-a-service (platform-as-a-services?).
This is more or less exactly what I'm hoping for. I think that people are excited to build stateless applications, but often that requires really specialized application and storage knowledge to pull off. My hope is that people can use this generic storage layer to build the next generation of stateless applications (including things like databases) without having to become storage experts themselves. I'm also excited to see what they build.
One of the fun parts about working on storage and file systems in particular, is that these techniques are old as time. Log-structured writes, journals, caching, etc -- are all non-novel. However, the benefit to our customers is in how easy we make it for them to use something like this without having to deploy or build it themselves.
Interesting. Reminds me of FlexFS (https://flexfs.io/). I spoke to a very knowledgeable person there when investigating what to use but we ended up using EFS instead.
An annoying feature of EFS is how it scales with amount of storage, so when it's empty it's very slow. We also started hitting its limits so could not scale our compute workers. Both can be solved by paying for the elastic iops but that is VERY expensive.
FlexFS kicks ass. I benchmarked it for our data storage and processing layers in value.space (satellite data processing and analysis) and we will most likely migrate to FlexFS in the near future.
Out of curiosity, why did you choose EFS, it's insanely expensive at even modest scales?
Yes, I think it's a similar product, but we're looking to provide high performance on all dimensions (latency, throughput, and IOPS). I totally agree with you that Elastic Throughput solves this problem, but it can be expensive for many workloads!
Just the theme that we ended up using for the marketing site. We will likely build something less janky post-batch, but right now -- just trying to get the information out there.
This feels, intuitively, like it would be very hard to make crash consistent (given the durable caching layer in between the client and S3). How are you approaching that?
It depends on what you mean by crash-consistent. I would expect that we handle crash-consistency at the client fine (since it is the same crash-consistency of NFSv3) and crash-consistency at the server also fine (since we are able to detect using etags what version of an object is in the backing data storage). Tell me a bit more about what you're thinking.
For sure! Upon reflection, maybe I’m less curious about crash consistency (corruption or whatever) per-se, and more about what kinds of durability guarantees I can expect in the presence of a crash.
I’m specifically interested in how you’re handling synchronization between the NFS layer and S3 wrt fsync. The description says that data is “asynchronously” written back out to S3. That implies to me that it’s possible for something like this to happen:
1. I write to a file and fsync it
2. Your NFS layer makes the file durable and returns
3. Your NFS layer crashes (oh no, the intern merged some bad terraform!) before it writes back to S3
4. I go to read the file from S3… and it’s not there!
Is that possible? IE is the only way to get a consistent view of the data by reading “through” the nfs layer, even if I fsync?
So, the step that differs from your concern is Step 3. Let's say that we have a catastrophic availability scenario (as you said, intern comes in and tears down something) -- our job is to make sure that the data in our durable cache remains there (and to put safeguards in place to prevent the intern from hitting that data). If we do that, then any crash of our system will get the data back and be able to apply it to S3. I know that's kind of hand-wavy, but this is how things like AWS S3 work -- just having a super high bar for processes around operations to keep data safe.
For some reason, I don't see a "reply" button to your later comment (maybe there's an HN threading limit), but the answer is yes -- fsync guarantees durability in the Regatta durable cache, not in S3.
Gotcha! Thanks for the answer; so the tl;dr is, if I’m understanding:
“All fsync-ed writes will eventually make it to S3, but fsync successfully returning only guarantees that writes are durable in our NFS caching layer, not in the S3 layer”?
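To make the durability question in this exchange concrete, here is a toy model of the semantics as described in the thread: fsync returns once the write is in the durable cache, and replication to S3 happens later. The class and method names are hypothetical, and plain dictionaries stand in for the cache and the bucket. It is a sketch of the described behavior, not of how Regatta is implemented.

```python
from typing import Dict, List, Optional


class StagedFileSystem:
    """Toy model: writes are staged in a durable cache, then replicated to S3."""

    def __init__(self) -> None:
        # Assumption: both the staged data and the list of pending paths live
        # in storage that survives a crash of the serving layer.
        self.durable_cache: Dict[str, bytes] = {}
        self.pending: List[str] = []
        self.s3: Dict[str, bytes] = {}  # stand-in for the customer's bucket

    def write_and_fsync(self, path: str, data: bytes) -> None:
        # Returning from here models a successful fsync: the data is durable
        # in the cache, but nothing has been sent to S3 yet.
        self.durable_cache[path] = data
        if path not in self.pending:
            self.pending.append(path)

    def read_via_file_interface(self, path: str) -> Optional[bytes]:
        # File clients read through the cache, so they see the latest write.
        if path in self.durable_cache:
            return self.durable_cache[path]
        return self.s3.get(path)

    def read_via_s3(self, path: str) -> Optional[bytes]:
        # Direct S3 readers only see what has already been replicated.
        return self.s3.get(path)

    def replicate(self) -> None:
        # Asynchronous in the real system; can be re-run after a crash because
        # the pending list is durable in this model.
        for path in self.pending:
            self.s3[path] = self.durable_cache[path]
        self.pending.clear()


if __name__ == "__main__":
    fs = StagedFileSystem()
    fs.write_and_fsync("report.txt", b"hello")
    print(fs.read_via_file_interface("report.txt"))
    print(fs.read_via_s3("report.txt"))
    fs.replicate()
    print(fs.read_via_s3("report.txt"))
```

In this model, the window between `write_and_fsync` and `replicate` is exactly the period in which a direct S3 read can miss a write that the file interface already shows.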
Yes, you can absolutely get similar functionality with rclone. However, what we are solving for our customers is the ability to do this without thinking about infrastructure or deployments. Customers don't need to worry about data durability, replication, recovering off of failed drives, or availability through deployments or patches.
It can offer an advantage over the built-in caching, but it depends on your exact access patterns. For example, if you are running ClickHouse on multiple servers and accessing the same reference data, it's more efficient to cache that data in a centralized location (like Regatta) instead of on the disk of each individual instance.
Philosophically, our goal is to build a standard that can be used in these kinds of applications moving forward, so that application developers don't need to build streaming over and over again and users don't need to learn how to configure each individual systems' caching.
Thanks for the question! Mountpoint for Amazon S3 is a FUSE layer that doesn't support full POSIX semantics. For example, you can't use Mountpoint for Amazon S3 for random writes to existing files, appends, or renames. This means that you have to carefully instrument your application to understand whether or not it's compatible with Mountpoint, which can be error-prone. Regatta, on the other hand, provides full POSIX compatibility for the file interface, which means that it works out-of-the-box with all file based applications.
> For example, you can't use Mountpoint for Amazon S3 for random writes to existing files, appends, or renames.
Can you support these operations with the expected semantics and performance?
If the application makes a one-byte change to a giant file and calls fdatasync, what happens? Do you re-upload the entire file to S3?
How do you handle a rename? Applications commonly do this for atomic replacement on POSIX and expect three properties from this operation:
* fast.
* destination always points to either the original or new afterward (on success or failure); no scenario at which it's lost/truncated.
* no extra storage used (on success or failure).
Do you guarantee any of those? How? I don't see an obvious way from the S3 HTTP API.
Given that POSIX API doesn't support things like arbitrary per-operation deadlines/timeouts, do you think it's suitable as a distributed filesystem API at all? Why?
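For readers less familiar with the atomic-replacement pattern this comment refers to, here is a minimal sketch of how applications typically do it on a POSIX file system: write a temporary file in the same directory, fsync it, then rename it over the destination. The function name `atomic_replace` is made up for illustration, and nothing here is specific to Regatta.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace the contents of path so readers see either the old or new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same file system) so that
    # the final rename is a single atomic operation.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # make the new contents durable first
        os.replace(tmp_path, path)  # atomic rename over the destination
    except BaseException:
        # On failure the destination is untouched; remove the leftover temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "config.json")
        atomic_replace(target, b'{"version": 1}')
        atomic_replace(target, b'{"version": 2}')
        with open(target, "rb") as f:
            print(f.read())
```

The three properties listed in the comment map onto this directly: the rename is a metadata-only operation, the destination name never points at a partial file, and the temporary file is the only extra storage used.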
The tl;dr of this is -- yes. We have a durable caching layer that we use to stage writes before we asynchronously replicate them to S3. This means that we are able to quickly (<1ms) perform operations like single-byte updates and renames and provide strong read-after-write consistency to other file system clients.
Once the operation is stored in our durable cache, we update your S3 bucket to match what the file system expects. This generally takes around a minute, but could take longer depending on the number of S3 operations a file operation translates to (for example, a directory rename requires that we CopyObject each object in the directory in S3).
I think that the POSIX API is here to stay (like the S3 API). I agree that it would be better to have timeouts and deadlines, but I don't think that those make it impossible to provide a good distributed file system experience on POSIX (look at Amazon's EFS, Oracle's FSS, Google's FileStore, etc). It just makes the bar for availability higher.
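To illustrate why a single directory rename can fan out into many S3 operations, here is a rough sketch that plans the per-object work for a rename. The function `plan_directory_rename` is hypothetical. The reply only mentions CopyObject, so the follow-up DeleteObject for each old key is an assumption about how the old names would be cleaned up.

```python
from typing import List, Tuple

# (operation name, source key, destination key)
Op = Tuple[str, str, str]


def plan_directory_rename(keys: List[str], src: str, dst: str) -> List[Op]:
    """List the S3 operations needed to move every key under src/ to dst/."""
    src_prefix = src.rstrip("/") + "/"
    dst_prefix = dst.rstrip("/") + "/"
    copies: List[Op] = []
    deletes: List[Op] = []
    for key in keys:
        if key.startswith(src_prefix):
            new_key = dst_prefix + key[len(src_prefix):]
            copies.append(("CopyObject", key, new_key))
            # Assumption: old keys are deleted only after all copies succeed.
            deletes.append(("DeleteObject", key, ""))
    return copies + deletes


if __name__ == "__main__":
    bucket_keys = ["logs/a.txt", "logs/2024/b.txt", "data/c.bin"]
    for op in plan_directory_rename(bucket_keys, "logs", "archive"):
        print(op)
```

The number of operations grows linearly with the number of objects under the directory, which is consistent with the point that replication time depends on how many S3 operations a file operation translates to.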
Does Regatta require a local disk sized for the entire file to support random writes? One problem I’ve seen is that we have set up instances with a modest local disk but then work with files for which we need to pull the whole file into a local cache modify some parts and then push the full result back into s3. It would be helpful to have a way to work with s3 as though it were posix without having to match the local disk size to the largest file we might need to process.
This is exactly the problem that we solve! You don't need any local disk on your EC2 instance in order to use Regatta or work with data in S3. Our high-speed caching layer plays the role as this local disk for you, so that you can work with data sets that are hundreds of TiBs, even if you only have a 20 GiB EBS volume on your instance.
Well, in my opinion, I want to deliver the lowest latency possible. I expect that we will have Regatta running in GCP and Azure within the next 6 months. I'd love to connect if there's a place on-prem that you're looking to use Regatta. Would you shoot an email to hleath [at] regattastorage.com, and we could chat about what you're looking for?
Those costs only apply to data transfer into and out of AWS. If you're running EC2 instances in AWS, your Regatta file system is in AWS, and your S3 bucket is in AWS -- then you shouldn't incur additional data transfer fees.
Yes, that's correct re: Region -- thanks for the clarification.
In some sense, yes. But the consistency that you're trading is only for accessing data through the file interface and the S3 interface simultaneously. The consistency is CP/strong when you access the data through the file interface. The model that we see work most often is that folks will ingest data through S3 (for example, an 'input/' prefix), and then the file system will process that data and place it in a different directory (for example, an 'output/' folder). Then, if it takes a minute or two for those to update on the other side, it's not a big deal.
Hey there, I have mutual friends with some of the Nasuni folks, and I have a lot of respect for what they do. In particular, Nasuni stores data in a proprietary block format in your S3 bucket, so you can't connect it to existing data sets or use that data directly from S3 out the other side. Whereas with Regatta, we store data in its native format in S3 so you can do these things.
What's cool about the storage market is that there are so many impressive companies because there are so many varied needs from customer applications! We're hoping to become a simple "default" for teams who are writing applications in the cloud.
That's correct -- every file is an S3 object. If you change the middle of a large file, Regatta will store the change on our durable caching layer efficiently (and most writes complete in under 1ms). Regatta will then asynchronously update the large object in S3, which may take longer. We automatically batch multiple changes together to minimize the number of operations to your S3 bucket!
All connected file system clients see strong, read-after-write consistency. Most file operations are synchronized to S3 within a few minutes of completion.
Write conflicts between the file system and S3 should be rare (by definition, applications shouldn't yet be designed to do this because Regatta doesn't exist). We do some tracking of the object etag to at least throw an alert if we find that something unexpected has happened, and we're looking at the best UX to expose that to customers soon.
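As a rough illustration of the etag tracking mentioned here, the sketch below remembers the version tag of each object as the file system last saw it and refuses to overwrite an object whose tag has changed in the meantime. Everything in it is hypothetical: the `WriteBack` class, the dictionary standing in for a bucket, and the SHA-256 digest standing in for an S3 ETag (real ETags are computed differently).

```python
import hashlib
from typing import Dict, Optional


def version_tag(data: Optional[bytes]) -> Optional[str]:
    """Stand-in for an S3 ETag; None means the object does not exist."""
    if data is None:
        return None
    return hashlib.sha256(data).hexdigest()


class WriteBack:
    """Flush staged file contents to a bucket unless the object changed underneath."""

    def __init__(self, bucket: Dict[str, bytes]) -> None:
        self.bucket = bucket
        self.last_seen: Dict[str, Optional[str]] = {}

    def observe(self, key: str) -> None:
        # Record the tag of the object as the file system currently sees it.
        self.last_seen[key] = version_tag(self.bucket.get(key))

    def flush(self, key: str, new_data: bytes) -> bool:
        # If the tag in the bucket no longer matches, someone wrote through
        # the S3 API in the meantime: report a conflict instead of overwriting.
        if version_tag(self.bucket.get(key)) != self.last_seen.get(key):
            return False
        self.bucket[key] = new_data
        self.last_seen[key] = version_tag(new_data)
        return True


if __name__ == "__main__":
    bucket = {"output/result.csv": b"v1"}
    writer = WriteBack(bucket)
    writer.observe("output/result.csv")
    print(writer.flush("output/result.csv", b"v2"))  # no external change
    bucket["output/result.csv"] = b"edited via S3"   # simulated external write
    print(writer.flush("output/result.csv", b"v3"))  # conflict detected
```

What to do on a conflict (alert, keep both versions, or something else) is left open here, in line with the reply saying the user experience for this is still being worked out.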
I could totally be misreading DirectPV, but it appears to be a way to use K8s Persistent Volumes to manage things like NVME drives which are attached to each node, and doesn't provide any tie in to S3 (outside of the fact that it's built to power MinIO).
Hey, thanks for asking. It very much depends on which aspect of Regatta you're interested in using. I know of a couple of different architectures -- some folks wrote in "rclone" in the thread, I know of people using SeaweedFS if you want to host storage infrastructure yourself, etc.
I'd love to know a bit more about why you're looking for an open source alternative. Is it because of costs (i.e. you'd like an open source alternative that doesn't require you to pay) or if it's because of the operating environment (i.e. you want an open source alternative so that you can deploy it to your own infrastructure)? There are some things that we are exploring around deploying onto your own infrastructure over the next 12 months, but I'd love to learn more. Feel free to respond here or email me at hleath [at] regattastorage.com.
In 2024, you are better off dropping the file system abstraction entirely and just embracing object storage abstractions (and ideally, immutable write-once objects).
Source: personal experience, I've done the EFS path and the S3-like path within the same system, and the latter was much easier to develop for and troubleshoot performance. It's also far cheaper to operate.
You can have local caching, rapid "read what I wrote", etc. with very little engineering cost, no one at my company is dedicated to this because the abstraction is ridiculously simple:
1. It's object storage, not a file system. Embrace immutability.
2. When you write to S3, cache locally as well.
3. When you read from S3, check the cache first. Optionally cache locally on reads from S3.
4. Set cache sizes so you don't blow out local storage.
5. Tier your caches when needed to increase sharing. (Immutability makes this trivially safe.)
All that's left is to manage 'checked out files' which is pretty easy when almost all of them are immutable anyway.
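The numbered recipe above can be sketched in a few lines. This is only an illustration of the commenter's description, not their actual code: the class name `CachedObjectStore` is made up, a dictionary stands in for S3, and the eviction policy (least recently used, bounded by bytes) is an assumption since the comment only says to set cache sizes.

```python
from collections import OrderedDict
from typing import Dict


class CachedObjectStore:
    """Write-once object store with a size-bounded local LRU cache."""

    def __init__(self, backend: Dict[str, bytes], max_cache_bytes: int) -> None:
        self.backend = backend  # stand-in for S3
        self.max_cache_bytes = max_cache_bytes
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.cached_bytes = 0

    def _remember(self, key: str, data: bytes) -> None:
        if len(data) > self.max_cache_bytes:
            return  # larger than the whole cache; serve it uncached
        if key in self.cache:
            self.cache.move_to_end(key)
            return
        self.cache[key] = data
        self.cached_bytes += len(data)
        # Step 4: evict least recently used entries to respect the size limit.
        while self.cached_bytes > self.max_cache_bytes:
            _, evicted = self.cache.popitem(last=False)
            self.cached_bytes -= len(evicted)

    def put(self, key: str, data: bytes) -> None:
        # Step 1: objects are immutable, so a key is written at most once.
        if key in self.backend:
            raise ValueError(f"object {key!r} already exists and is immutable")
        self.backend[key] = data
        self._remember(key, data)  # step 2: cache locally on write

    def get(self, key: str) -> bytes:
        # Step 3: check the cache first, then fall back to the backend.
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        data = self.backend[key]
        self._remember(key, data)
        return data


if __name__ == "__main__":
    store = CachedObjectStore(backend={}, max_cache_bytes=10)
    store.put("a", b"12345")
    store.put("b", b"123456")  # pushes the cache over 10 bytes, evicting "a"
    print(list(store.cache))
    print(store.get("a"))      # cache miss, read from the backend
```

Because objects never change after they are written, a cached copy can never be stale, which is why no invalidation logic appears anywhere in the sketch.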
I totally agree that we're continuing to see a trend of applications which are designed to work directly on S3.
However, like the S3 protocol, I think that the file protocol is cemented in time as something that we will be using 100 years from now. For example, most AI applications do still download data sets to local file system devices to actually load and use them, which is why you see a lot of HPC workloads use things like Lustre. Postgres, SQLite, etc. all use file system semantics to operate the database.
I totally respect folks who rewrite their applications to work directly with S3, but as you point out, it comes with a different set of challenges (around caching and chunking).
Thanks for the question. We stage writes to a durable, shared caching layer. This allows us to respond quickly to your application when it performs these operations (<1ms), but then asynchronously send those operations to S3 later. When connecting through Regatta, all file system clients see a strongly consistent read-after-write view of the changes on the file system, even if they haven't yet propagated to S3.
Hey there, thanks for the concern. There is a spectrum of teams out there. Some teams are totally comfortable building something like this and running their own storage infrastructure. Other teams want a fully managed solution to handle storage for them so that they can focus on building. I think it's great that we have a spectrum of products!
Snowflake and Databricks aren't storage products, but are managed compute platforms on top of storage that probably looks a lot like this. Snowflake allows you to easily connect different data sets to your data warehouse, and Databricks provides a managed analytics (Spark) offering.
Regatta, on the other hand, would allow you to more easily build the next Snowflake or Databricks by taking advantage of the same low-cost, unlimited storage in S3 that they likely use.
We are looking at launching in Azure Cloud with support for Azure Blob Storage as the backend within the next 6 months. If there's a specific use case that you have, it would be helpful to share it with me at hleath [at] regattastorage.com so we can appropriately prioritize Azure against other cloud vendors and regions.
Fascinating. If this had been around a year ago, we could have used it in our datacenter build-out. For data source reasons, we record data in the cloud. In the past, we'd stick most of the data in S3 and only egress what we needed to run analysis on. The way we'd do that is that we have a machine with 16 * 30 TiB SSDs that acts as our on-prem cache of our S3 data. It did this using a slightly modified goofys with a more modified catfs in front of it, with both the cache and the catfs view exported over NFSv4. We had application-level switching between the cache and the export since our data was really read-only.
When the cache got full, catfs would evict things from it pretty simply. It's overall got a good design but has a few bugs you have to fix, and when you have 100 machines connecting to it, it requires some tuning to make sure that it doesn't all stall. But it worked for the most part.
Anyway, I think this is cool tech. I'm currently doing some bioinformatics stuff that this might help with (each genome sequence is some 100 GiB compressed). I'll give it a shot some time in the next couple of months.
That's exactly the kind of thing that I've been hearing lots of teams having to solve individually, and I'm glad that this set up worked out for you. Would love to see you try it for bioinformatics (another industry where this problem seems to show up frequently), feel free to reach out with any questions when you start that.
I don't think so, I see them as complementary. MinIO is great when you have downstream applications which speak the S3 API that need acceleration of that data. Regatta is designed for applications which speak file semantics (think, application logging, storing corpuses of training data, or state) that don't run on the S3 API. Regatta actually supports MinIO as an S3-compatible backend for your file system!
I think it’s more analogous to Minio’s discontinued proxy mode. This is where you’d talk to minio locally (using whatever interface/protocol) and it would act as a local cache for S3 objects. If you wrote to it, it would propagate the changes up to S3 proper (or whomever using the S3 protocol).
I believe they stopped supporting that mode because they didn’t want to keep chasing every S3 protocol change. However, if you’re just using S3, and not trying to masquerade as S3, this problem becomes easier.
I think it's complementary as well, even more so after MinIO deprecated its Gateway and Filesystem modes a couple of years ago. MinIO is "S3 compatible" object storage, so technically, MinIO users should be able to use your product to have a file-system like experience on their buckets and objects, although you're using IAM and there might be a need either for your client to handle pure S3 credentials or for a third-party plugin to your client to do that. It could be a good opportunity to piggyback on MinIO's userbase.
We had built an MLOps platform[0] a few years ago and enabled users to use their S3 buckets in a "file system like" manner. This made it possible for them not to have to know or write S3 specific code in their Jupyter notebooks as most people in the industry did with boto3, which also forced them to write code (say using TensorFlow) in a certain way for training to consume the files, err, objects. It was a mess, and we removed that for notebooks that could run the same way on a laptop or on the platform, even with the shell kernel so people could explore objects like files. MLFlow could work on a filesystem or on S3, but it had no authentication, so we built around that to know which user/experiment produced which artifact.
MinIO had a Gateway that was deprecated. We didn't use it much and they didn't have an admin client at the time, so I rolled one up to orchestrate the thing.
I did it in a way that hooks into users' compute and storage, as opposed to offering storage/compute, for three reasons:
- Organizations already had their data somewhere with established policies. Getting them to move that data is very hard (CISO, CTO, IT, legal, engineers). Friction would have been huge.
- Organizations already had budgeted compute and storage, they may have had contracts/discounts/credits with cloud providers and it didn't make sense to ask them to make a decision on budgeting for another solution.
- A design principle of having the product being able to die without leaving the users scrambling to exfil/migrate data.
One way to do it was to handle FUSE, and your mileage may vary (s3fs-fuse, goofys, etc). Amazon released Mountpoint last year[1], and one question you'll get asked is why use Regatta when I could use Mountpoint?
We are finding a lot of success in the ML Ops space for exactly this reason. I also completely agree that enterprise customers want to keep their data where they can govern and audit it (often in S3). We're excited about the possibility to allow folks to access and use that data while it stays in S3 for primary storage.
I agree around the questions with Mountpoint, and we're solving a very different set of problems than Mountpoint. Mountpoint, for example, isn't designed to be used with all file applications and lacks support for things like appends to existing files, random writes, renames, and symbolic links. On the other hand, Regatta supports POSIX semantics and can work with nearly all file based applications.
That’s so nice to see, because in the past few days I had been tinkering with the concept of file system + blob storage, but I had a hard time coming up with use cases other than an unlimited Dropbox where you own the storage and truly pay as you go.
I think that "owning the storage" is such an important part of this. I'm excited that folks who use this will continue to have access to their data directly through S3, so if they ever decide to move off of Regatta, all their data is still right there. This is also important at large companies which already have compliance and governance workflows that connect to data in S3 -- Regatta enables them to continue to use those workflows without having to think about another primary storage system.
These distributed storage systems solve very similar problems, depending on how you use them. Our target customers aren't looking to deploy their own infrastructure, so having a "single-click" option without having to think about how much capacity they need is very valuable.
People have been throwing out "POSIX" distributed file systems for a long time but this claim usually raises more questions than it answers. Especially since clients access it via NFSv3, which has extremely weak semantics and leaves most POSIX filesystem features unimplemented.
I think this is a great call out, and you're correct. One example that comes to mind is that NFSv3 doesn't support flags on the rename() operation (such as RENAME_WHITEOUT), which means that you can't use them as an overlay upperdir (which is desirable for building container runtimes). To solve this, we're working on a custom protocol that we intend to place in the Linux kernel which will expose a broader set of features than we can get in NFS. As I tell people, this is the worst version of Regatta that will ever exist -- we're going to make it better every day.
You can implement a single client NFSv3 server that provides stronger than expected (of NFSv3) guarantees and if you implement the "optional" companion protocols it should come closer to local filesystem semantics than most network filesystems. What would be neat about such a solution is that you can run the server either locally or remotely (same site, high bandwidth, low latency) and at the same time clients wouldn't have to run a custom FUSE server or, even worse, load an (from the customer's point of view) experimental vendor kernel module. Upgrading from NFSv3 to NFSv4 would get you a bit closer to POSIX semantics, but of course it would still be NFS, just not over a congested, jittery link to a shared server. NFSv4 delegations in particular could be a nice way to let the client's kernel buffer a lot of bursty async I/O locally. Just keep in mind how little POSIX really guarantees instead of assuming it will behave like ext4/XFS or even better ZFS on a laptop NVMe with two levels of power loss protection (big caps in the drive and the laptop battery).
I think this is exactly right, but there are lots of people who don't want to manage their own NFS servers -- that's who we're targeting with Regatta. Notably, I think that v4 delegations get you close but not close enough to the performance that we're looking for. For example, you can't get a delegation for a directory (which means that you're still doing round trips for CREATE and UNLINK), which seems to be the case even with "nocto". But, I need to spend more time playing around with that.
I dunno if this is considered off-topic, since it's commentary about the website, but that's twice in the past week I've seen a launch website that must have used a template or something because almost all the links in the footer are href="#". If you don't have Careers, Privacy Policy, Terms, or an opinion about Cookies, then just nuke those links
This is a managed cloud service. If you're interested in using Regatta on-premises, I'd love to hear from you -- shoot me some mail at hleath [at] regattastorage.com
I used the same approach based on Rclone for a long time. I wondered what makes Regatta Storage different than Rclone. Here is the answer: "When performing mutating operations on the file system (including writes, renames, and directory changes), Regatta first stages this data on its high-speed caching layer to provide strong consistency to other file clients." [0].
Rclone, on the contrary, has no layer that would guarantee consistency among parallel clients.
[0] https://docs.regattastorage.com/details/architecture#overvie...
The headline seems misleading, then.
rclone can work with AWS' different offerings, some of which at least partially address this: https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-z...
I'm not totally sure what you mean. I don't think that S3 Express One Zone offers any additional atomic semantics in the file system world.
This is exactly right, and something that we think is particularly important for applications that care about data consistency. Often times, we see that customers want to be able to quickly hand off tasks from one instance to another which can be incredibly complex if you don't have guarantees that your new operations will be seen by the second instance!
Might be useful to show the differences with Rclone, s3fs as a table to make it obvious
I agree, I plan to put up a table soon.
Thanks, this was my thought as well. I use and love rclone and it wasn't immediately clear what this offered above that
This is honestly the coolest thing I've seen coming out of YC in years. I have a bunch of questions which are basically related to "how does it work" and please pardon me if my questions are silly or naive!
1. If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally?) Would I immediately see degradation, or thrashing, at the 10 GB mark?
2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?
3. I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?
4. Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?
5. I have to ask - how do you think about open source here?
6. Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)
I haven't played with it, so maybe doing so would help answer questions. But I'm really excited about this! I have tried using EFS for small projects in the past but - and maybe I was holding it wrong - I could not for the life of me figure out what I needed to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.
Wow, thanks for the nice note! No questions are silly, and I'll also note that we now have a docs site (https://docs.regattastorage.com) and feel free to email me (hleath [at] regattastorage.com) if I don't fully address your questions.
> If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally?) Would I immediately see degradation, or thrashing, at the 10 GB mark?
We don't actually do caching on your instance's disk. Instead, data is cached in the Linux page cache (in memory) like a regular hard drive, and Regatta provides a durable, shared cache that automatically expands with the working set size of your application. For example, if you were trying to work with data in the 50 GiB range, Regatta would automatically cache all 50 GiB -- allowing you to access it with sub-millisecond latency.
> Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?
For now, yes -- the speed is highly dependent on latency -- which is highly dependent on distance between your instance and Regatta. Today, we are only in AWS, but we are looking to launch in other clouds by the end of the year. Shoot me an email if there's somewhere specifically that you're interested in.
> I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?
There are a couple of different questions bundled together in this. Today, Regatta exposes an NFSv3 file system that you can mount. We are working on a new protocol which will be mounted via FUSE. However, in Docker environments, we also provide a CSI driver (for use with K8s) and a Docker volume plugin (for use with just Docker) that handles the mounting for you. We haven't released these publicly yet, so shoot me an email if you want early access.
> Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?
Yes, you should be able to run a database on Regatta.
> I have to ask - how do you think about open source here?
We are in the process of open sourcing all of the client code (CSI driver, mount helper, FUSE), but we don't have plans currently to open source the server code. We see the value of Regatta in managing the infrastructure so you don't have to, and if we release it via open-source, it would be difficult to run on your own.
> Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)
Yes, you can mount on multiple servers simultaneously! We haven't specifically stress-tested the number of clients we support, but we should be good for O(100s) of mounts. Unfortunately, AWS locks down Lambda so we can't mount arbitrary file systems in that environment specifically.
> efs performance
Yes, the challenge here is specifically around the semantics of NFS itself and the latency of the EFS service. We think we have a path to solving both of these in the next month or two.
Thank you for the detailed answers! Honestly, this project inspires me to work on infrastructure problems.
So you are saying that Regatta's own SaaS infrastructure provides the disk caching layer. So you all make sure the pipe between my AWS instance and your servers is very fast and "infinitely scalable", and then the sync to S3 happens after the fact.
That's exactly right!
Do I understand correctly that the data gets decrypted at your Regatta AWS instances, before the data ends up in the customer's S3 bucket? It sounds like the SSL pipe used for NFS is terminated at Regatta servers. Can customers run the Regatta service on their own hardware?
Or does Regatta only have access to filesystem metadata -- enough to do POSIX stuffs like locks, mv, rm -- but the file contents themselves remain encrypted end-to-end?
This is correct, we encrypt data in-transit to the Regatta servers (using TLS), and we encrypt any data that the Regatta servers are storing. Of course, when Regatta communicates with S3, that's also encrypted with TLS (just like using the AWS SDK). However, we don't pass the encrypted data to S3, otherwise you wouldn't be able to read it from the bucket directly and use it in other applications!
Pretty sure we're in your target market. We [0] currently use GCP Filestore to host DuckDB. Here's the pricing and performance at 10 TiB. Can you give me an idea on the pricing and performance for Regatta?
Service Tier: Zonal
Location: us-central1
10 TiB instance at $0.35/TiB/hr
Monthly cost: $2,560.00
Performance Estimate:
Read IOPS: 92,000
Write IOPS: 26,000
Read Throughput: 2,600 MiB/s
Write Throughput: 880 MiB/s
0 - https://www.definite.app/blog/duckdb-datawarehouse
Yes, you should be in our target market. I don't think that I can give a cost estimate without having a good sense of what percentage of your data you're actively using at any given time, but we should absolutely support the performance numbers that you're talking about. I'd love to chat more in detail, feel free to send me a note at hleath [at] regattastorage.com.
I'll send you a note!
Found this in the docs:
> By default, Regatta file systems can provide up to 10 Gbps of throughput and 10,000 IOPS across all connected clients.
Is that the lower bound? The 50 TiB Filestore instance has 104 Gbps read throughput (albeit at a relatively high price point).
That's just the limit that we apply to new file systems. We should be able to support your 104 Gbps of read throughput.
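For anyone comparing the numbers in this subthread: the Filestore estimates are quoted in MiB/s while the Regatta default is quoted in Gbps. A small conversion sketch, assuming binary MiB and decimal gigabits (the usual convention for network links, but an assumption here):

```python
# Convert the Filestore throughput figures quoted above (MiB/s) to Gbps
# so they can be compared with the 10 Gbps default limit from the docs.

MIB = 1024 ** 2   # bytes per MiB (binary)
GBPS = 10 ** 9    # bits per second in 1 Gbps (decimal; an assumption)

def mib_per_s_to_gbps(mib_per_s: float) -> float:
    """Return the throughput in gigabits per second."""
    return mib_per_s * MIB * 8 / GBPS

read_gbps = mib_per_s_to_gbps(2600)   # 10 TiB Filestore read estimate
write_gbps = mib_per_s_to_gbps(880)   # 10 TiB Filestore write estimate
DEFAULT_LIMIT_GBPS = 10               # default per-file-system limit quoted from the docs

print(f"read:  {read_gbps:.2f} Gbps")
print(f"write: {write_gbps:.2f} Gbps")
print("read exceeds default limit:", read_gbps > DEFAULT_LIMIT_GBPS)
```

Under those assumptions the 10 TiB read estimate is already above the default limit, which is why raising the limit matters for this workload.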
Out of curiosity, why not go bare metal in a managed colocation? Is that for the geographic spread? Or unpredictable load?
Every few months of this spend is like buying a server
Edit: back at my pc and checked, relevant bare metal is ~$500/m, amortized:
https://baremetalsavings.com/c/LtxKMNj
Edit 2: for 100tb..
agreed, one month of 50 TiB is $12,800!
we're using Filestore out of convenience right now, but actively exploring alternatives.
Hiring someone who knows how to manage bare metal (with failover and stuff) may take time %)
You pay a datacenter to put it in a rack and connect power and uplinks, then treat it like a big EC2 instance (minus the built-in firewall). Now you just need someone who knows how to secure an EC2 instance and run your preferred software there (with failover and stuff).
If you run a single-digit number of servers and replace them every 5 years you will probably never get a hardware failure. If you're unlucky and it still happens get someone to diagnose what's wrong, ship replacement parts to the data center and pay their tech to install them in your server.
Bare metal at scale is difficult. A small number of bare metal servers is easy. If your needs are average enough you can even just rent them so you don't have capital costs and aren't responsible for fixing hardware issues.
If this product is successful, what prevents AWS from cloning it at a lower price (perhaps by leveraging access to their infrastructure) and putting you out of business?
I’m very interested in this as a backing disk for SQLite/DuckDB/parquet, but I really want my cached reads to come straight from instance-local NVMe storage, and to have a way to “pin” and “unpin” some subdirectories from local cache.
Why local storage? We’re going to have multiple processes reading & writing to the files and need locking & shared memory semantics you can’t get w/ NFS. I could implement pin/unpin myself in user space by copying stuff between /mnt/magic-nfs and /mnt/instance-nvme but at that point I’d just use S3 myself.
Any thoughts about providing a custom file system or how to assemble this out of parts on top of the NFS mount?
Hey -- I think this is something that's in-scope for our custom protocol that we're working on. I'd love to chat more about your needs to make sure that we build something that will work great for you. Would you mind shooting an email to hleath [at] regattastorage.com and we can chat more?
Wow, looks like a great product! That's a great idea to use NFS as the protocol. I honestly hadn't thought of that.
Perfect.
For IBM, I wrote a crypto filesystem that works similarly in concept, except it was a kernel filesystem. We crypto split the blocks up into 4 parts, stored into cache. A background daemon listened to events and sync'ed blocks to S3 orchestrated with a shared journal.
It's pure magic when you mount a filesystem on clean machine and all your data is "just there."
> It's pure magic when you mount a filesystem on clean machine and all your data is "just there."
I totally agree! I am hoping that Regatta can power a future where teams don't need more than ~8 GiB of local storage for their operating system, and can store the rest on something like Regatta to get rid of the waste of overprovisioned block volumes.
That would sell like hot cakes to the public sector.
Let's hope so, I'd love to help teams take storage infrastructure management off of their plate! If you're in the public sector and interested in trying out Regatta, please shoot me an email at hleath [at] regattastorage.com.
In (March?) 2007 (correction 2008) myself and two other engineers in front of Bruce Chizen - Adobe's CEO in a small conference room in Bucharest demoed a photo taken with an iPhone automagically showing as a file on a Mac. I implemented the local FUSE talking to Ozzy - Adobe's distributed object store back then, using an equivalent of a Linux inode structure. It worked like a charm and if I remember correctly it took us a few days to build it. It was a success just as much as Adobe's later choices around http://Photoshop.com were a huge failure. A few months later Dropbox launched.
That kickstarted about a decade in (actual) research and development led by my team which positioned the Bucharest center as one of the most prolific centers in distributed systems within Adobe and of Adobe within Romania.
But I didn't come up with the concept, it was Richard Jones that inspired us with the Gmail drive that used FUSE with gmail attachments back in 2004 when I got my first while still in college https://en.wikipedia.org/wiki/GMail_Drive. I guess I'm old, but I find it funny to see Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS
The funny thing about storage is that all of the problems are the same! Ultimately, there is no problem that cannot be solved with caching, journaling, write-ahead logging, etc. I think what makes the problem space so interesting is how a million different products can make a million different trade offs with these tools to deliver on their customer needs. File systems are awesome.
> The funny thing about storage is that all of the problems are the same!
They are all the same, and they are all more than what, on the surface, looks like "just files". The whole OS, especially Linux/UNIX, is "just files", and if you look deeper at databases you can see how it boils down to the file formats (something that was visible with LevelDB but maybe less so with RocksDB, I guess).
Does it mean I can use Lambda + SQLite + Regatta to build a real pay-as-you-go ACID SQL storage?
Edit: a production-ready (high durability) ACID SQL storage
Yes! This is my expectation. Lots of the big companies have already done this with in-house architecture. With Regatta, we want to democratize building stateless applications that can take advantage of the low-cost storage of S3.
That's some real tech in YC these days!
Curious as to why you would want to build that yourself when so many solutions already exist (Supabase, NeonDB, AWS Aurora or RDS, etc.)?
One of my hopes for Regatta is that we're able to power the next generation of these data platforms. These things work because the designers had specialized storage knowledge that allowed them to carefully build serverless data products. I hope that Regatta is generic enough to allow anyone to build a serverless data product moving forward, without having to think about their storage infrastructure.
That makes a lot of sense. If you eliminate the need for storage expertise the problem becomes a lot easier!
BTW I sent you an email.
This looks quite compelling.
But it's not clear how it handles file update conflicts. For example: if User A updates File X on one computer, and User B updates File X on another computer, what does the final file look like in S3?
Hey there, our file system is strongly consistent for all connected file system clients. For example, if User A and User B are both connected via Regatta, then this works like any other NFS file system (in that they can use file locks, atomic renames or other techniques to ensure that one write wins). However, if User A and User B are accessing the data through different protocols (for example User A is using Regatta and User B is accessing the data through S3), then it's possible to get undefined behavior by attempting to simultaneously update the same piece of data from both places. We think that these applications are rare, and (almost by definition) likely don't exist right now. For the most part, customers use file storage as a "stage" in a broader workflow (for example, customers may ingest data through S3 and then process it on a file system), and that is totally consistent.
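To make "file locks, atomic renames or other techniques" concrete, here is a minimal sketch of both patterns in Python. It is generic POSIX usage, not Regatta-specific code: the helper names `atomic_write` and `locked_append` are made up for illustration, and a temporary directory stands in for a mounted file system. Whether a given NFS client forwards these locks to the server depends on its mount options, so treat that as an assumption to verify.

```python
import fcntl
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # make the contents durable before the rename
        os.replace(tmp_path, path)      # readers see the old file or the new one, never a mix
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # clean up the temp file on failure
        raise

def locked_append(path: str, line: str) -> None:
    """Append one line while holding an exclusive advisory lock on the file."""
    with open(path, "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until other lock holders release
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)

with tempfile.TemporaryDirectory() as mount:   # stand-in for a mounted file system
    config = os.path.join(mount, "config.json")
    atomic_write(config, b'{"version": 1}')
    atomic_write(config, b'{"version": 2}')    # the last rename wins
    with open(config, "rb") as f:
        final_config = f.read()

    log = os.path.join(mount, "events.log")
    locked_append(log, "client-a: done")
    locked_append(log, "client-b: done")
    with open(log) as f:
        log_lines = f.read().splitlines()

print(final_config, log_lines)
```

The locks are advisory, so they only help if every writer takes them; the rename pattern needs no cooperation from readers.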
There are quite some noteworthy alternatives like s3fs, rclone, goofys etc.
This is accurate! A lot of people have spent a lot of time trying to build a good file system abstraction on cheap, S3 storage. However, Regatta differs from these solutions in two important ways. First, Regatta is a shared, durable caching layer that sits between your instances and S3. This means that Regatta is able to efficiently perform operations (like directory renames) and provide strong consistency to other file system clients (whereas s3fs and other FUSE file systems would need to actually perform those operations in S3 for other clients to see the output). Secondly, Regatta is designed to support all file system operations. This means that you can do file locking, random writes, appends, and renames -- even when they aren't efficient to perform on S3.
Super interesting product. I have a couple of questions:
In terms of storing in S3 - is that in your buckets? Sounds like the plan is to run the caching on your infrastructure; are there plans to allow customers to run those instances themselves?
Presumably the format within s3 is your own bespoke format? What does the migration strategy look like for people looking to move into or out of your infrastructure? They effectively pull everything down from their s3 to the local “filesystem”?
I love this because it allows me to highlight the parts of the system that I'm most excited about. The Regatta caching runs on our infrastructure, but it connects to buckets that our customers control. We read and write data into the customer's bucket in a regular, native (not bespoke) format -- so you can connect a Regatta file system directly to a bucket that already exists, with data in it, and use that data from a file system without any data migration!
Oh interesting! So you map exactly to the structure in s3? It’s like fuse backed by s3 with good performance?
That's exactly right -- I like to think that we deliver on the promise of those open-source S3 adapters. We provide enterprise-grade performance.
Can you comment on how this is different from https://aws.amazon.com/blogs/aws/mountpoint-for-amazon-s3-ge... ?
Sure can, full disclosure, copied from a comment below:
Thanks for the question! Mountpoint for Amazon S3 is a FUSE layer that doesn't support full POSIX semantics. For example, you can't use Mountpoint for Amazon S3 for random writes to existing files, appends, or renames. This means that you have to carefully instrument your application to understand whether or not it's compatible with Mountpoint, which can be error-prone. Regatta, on the other hand, provides full POSIX compatibility for the file interface, which means that it works out-of-the-box with all file based applications.
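For readers wondering what those operations look like from an application's point of view, here is a small sketch that performs an append, a random write into an existing file, and a rename. It runs against a temporary directory; pointing `root` at a mounted file system is the assumed way to check an adapter, and the function name `exercise_file_ops` is made up for illustration.

```python
import os
import tempfile

def exercise_file_ops(root: str) -> bytes:
    """Run an append, an in-place random write, and a rename under root."""
    src = os.path.join(root, "data.bin")
    dst = os.path.join(root, "renamed.bin")

    with open(src, "wb") as f:     # initial sequential write
        f.write(b"AAAAAAAA")
    with open(src, "ab") as f:     # append to an existing file
        f.write(b"BBBB")
    with open(src, "r+b") as f:    # random write in the middle of an existing file
        f.seek(2)
        f.write(b"zz")
    os.rename(src, dst)            # rename
    with open(dst, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as root:
    result = exercise_file_ops(root)
print(result)
```

On a file system that supports all three operations the final contents are `AAzzAAAABBBB`; on an adapter that rejects one of them, the corresponding call raises `OSError` instead.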
Pretty cool. I'm excited about databases using this. Feels like Neon's PostgreSQL storage, but generalized to an FS.
Is this like FUSE with a cache? How does cache invalidation work?
All the best!
Yeah, I like to think of it in a similar vein. We want to empower people to create stateless workflows where they may have previously needed to think about state management. Today, Regatta is an NFS file system where the cache lives on our shared infrastructure. However, when we complete the work on our custom protocol, that will be a FUSE file system which offers additional caching on your instances to enable truly local-like performance.
Neat stuff. I think everybody with an interest in NFS has toyed with this idea at some point.
> Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.
How do you handle the cache server crashing before syncing to S3? Do the cache servers have local disk as well?
Ditto for how to handle intermittent S3 availability issues?
What are the fsync guarantees for file append operations and directories?
Thanks for the question!
> How do you handle the cache server crashing before syncing to S3? Do the cache servers have local disk as well?
Our caching layer is highly durable, which is (in my opinion) the key for doing this kind of staging. This means that once a write is complete to Regatta, we guarantee that it will eventually complete on S3.
For this reason, server crashes and intermittent S3 availability issues are not a problem because we have the writes stored safely.
> What are the fsync guarantees for file append operations and directories?
We have strong, read-after-write consistency for all connected file system clients -- including for operations which aren't possible to perform on S3 efficiently (such as renames, appends, etc). We asynchronously push those writes to S3, so there may be a few minutes before you can access them directly from the bucket. But, during this time, the file system interface will always reflect the up-to-date view.
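A toy model of the behavior described here (writes acknowledged once staged, reads served from the staged copy, the bucket catching up later). This is only an illustration of write-back caching in general: the class `WriteBackCache` and its in-memory dictionaries are made up, and nothing here reflects how Regatta actually stores or replicates data.

```python
class WriteBackCache:
    """Toy write-back cache: stage writes first, push them to the bucket later."""

    def __init__(self):
        self.staged = {}   # stand-in for the durable cache: key -> bytes not yet flushed
        self.bucket = {}   # stand-in for the S3 bucket

    def write(self, key: str, data: bytes) -> None:
        # The write is acknowledged as soon as it is staged.
        self.staged[key] = data

    def read(self, key: str) -> bytes:
        # File clients always see the newest copy, staged or already flushed.
        if key in self.staged:
            return self.staged[key]
        return self.bucket[key]

    def flush(self) -> int:
        """Push all staged writes to the bucket; return how many were flushed."""
        flushed = len(self.staged)
        self.bucket.update(self.staged)
        self.staged.clear()
        return flushed

cache = WriteBackCache()
cache.write("reports/q1.csv", b"a,b\n1,2\n")

seen_by_file_client = cache.read("reports/q1.csv")     # visible immediately
in_bucket_before = "reports/q1.csv" in cache.bucket    # not yet in the bucket
flushed = cache.flush()                                # the asynchronous push, run by hand
in_bucket_after = "reports/q1.csv" in cache.bucket

print(seen_by_file_client, in_bucket_before, flushed, in_bucket_after)
```

The gap between `in_bucket_before` and `in_bucket_after` is the "few minutes" window mentioned above, during which only the file system interface has the up-to-date view.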
So, I assume you use a journal in the cache server.
A few related questions:
* Do you use a single leader for a specific file system, or do you have a cluster solution with consensus to enable scaling/redundancy?
* How do you guarantee read-after-write consistency? Do you stream the journal to all clients and wait for them to ack before the write finishes? Or at least wait for everyone to ack the latest revisions for files, while the content is streamed out separately/requested on demand?
* If the above is true, I assume this is strictly viable for single-DC usage due to latency? Do you support different mount options for different consistency guarantees?
These are questions that are super specific to our implementation, that I'm hesitant to share publicly because they could change at any time. I can share that we're designed to horizontally scale the performance of each file system, and our custom protocol will enable Lustre-like scale out performance. As for single- vs. multi-DC, I think that you'd be surprised at how much latency budget there is (a cross-DC round trip in AWS can be anywhere from 200us-700us, and EBS gp3 latencies are around 1000us).
Is it fair to say this is best suited for small files that will be written infrequently?
There’s no partial write for s3 so editing a small range of a 1 GiB file would repeatedly upload the full file to the backing s3 right?
Or is the s3 representation not the same hierarchy as the presented mount point? (ie something opaque like a log structured / append only chunked list)
It's hard to define "best", and in many cases, the answers to these questions depend heavily on the workload and the caching parameters (how long do we wait before flushing to S3, etc). We are designed to provide good file system performance, even if customers are repeatedly writing small pieces of data to a 1 GiB file, so "best" in this case is a question of whether or not it's cost efficient.
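A back-of-the-envelope sketch of why the flush interval matters for cost. It assumes, as the question does, that every flush re-uploads the whole object; the edit count and batch size are made-up numbers, and Regatta's real flushing policy is not described in this thread.

```python
GIB = 1024 ** 3

def uploaded_bytes(object_size: int, edits: int, edits_per_flush: int) -> int:
    """Bytes sent to the bucket if each flush re-uploads the whole object."""
    flushes = -(-edits // edits_per_flush)   # ceiling division
    return flushes * object_size

EDITS = 1000   # hypothetical number of small in-place edits to a 1 GiB file

flush_every_edit = uploaded_bytes(GIB, EDITS, edits_per_flush=1)
flush_every_250 = uploaded_bytes(GIB, EDITS, edits_per_flush=250)

print(flush_every_edit // GIB, "GiB uploaded when flushing after every edit")
print(flush_every_250 // GIB, "GiB uploaded when flushing after every 250 edits")
```

Under these assumptions, batching 250 edits per flush cuts the transfer from 1000 GiB to 4 GiB, so the cost question is mostly about how long writes can sit in the cache before being pushed.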
Congrats on the launch, this is really cool! Is the durable cache an attached disk, or are you using a separate AWS product for that?
Without getting too much into the details of the system, our durable cache is designed for 5 9s of durability (and we're working on a version that will provide 11 9s of durability soon). You can't achieve those durability numbers on a single attached NVMe device without some kind of replication.
> NFSv3 (soon, our custom protocol).
definitely the thing I want to hear more about. Also, I can't help shake the "what's the catch, how is no one else doing this, or are they doing it quietly?" feeling.
Trust me, I feel the same way. The problem with these things is that you end up building a company because you get so much conviction that what you're doing is the right thing for customers, and you end up shocked that this isn't the default for everyone.
I am not your target audience but I have been thinking of building a very minified version of this using [0] Pooch and [1] S3FS.
Right now we spend a lot of time downloading various stuff from HTTP or S3 links and then figuring out folder structures to keep them in our S3 buckets. Pooch really simplifies the caching for this by having a deterministic path on your local storage for downloaded files, but has no S3 backend.
So a combination of 2 would be to just have 1 call to a link that would embed the caching both locally and on our S3 buckets deterministically.
[0] https://www.fatiando.org/pooch/latest/ [1] https://s3fs.readthedocs.io/en/latest/
I think this is a great insight, and something that I think about often. The challenge that I see is that the scientist archetype (whether it's data science, AI researcher, or anything else) isn't really interested in doing software development for these kinds of things. They just want the data to be there, and it's super nice to be able to click through the S3 console to be able to see and share the data they're using. I think that what you're doing is a great idea for folks who are accessing their data primarily through Python programs!
Love this idea! The biggest hurdle, though, has been getting predictable auth and I/O across multiple Python/Scala versions and everything else (Spark, orchestrators, CLIs used by teams on varying OSes, etc.), plus access logs on top of that.
S3Fs/boto/botocore versions x Scala/Spark x parquet x iceberg x k8s etc.: each reader's own assumptions make reading from S3 alone a maintenance and compatibility nightmare.
Will the mounted system _really_ be accessible as local fs and seen as such to all running processes? No surprises? No need for python specific filesystem like S3Fs?
If so, then you will win 100%. I wouldn't even care about speed/cost if it's up to par with S3.
Yeah, that's exactly right. I had some... experiences with Spark recently, that convinced me that this is something that could really help. I also really like the idea that organizations can continue to use S3 as the source of truth for their data (as you mention, it means that you can continue to use Access Logs, which would capture all usage of your S3 bucket across your applications).
> Will the mounted system _really_ be accessible as local fs and seen as such to all running processes? No surprises? No need for python specific filesystem like S3Fs?
Ha, well it depends on what you mean by surprises. We won't have a Python-specific file system. Our client is going to come in two flavors. Today, you can mount Regatta over NFSv3 (which we wrap in TLS to make it secure). This works for some workloads, but doesn't provide like-for-like performance with EBS. Over the next month, we plan to release the "custom protocol" that I wrote about above, that we expect to send to customers in the form of a FUSE file system.
Either way, it should be one package, you shouldn't need to worry about versioning, and it will appear as a real, local file system. :D
The title says POSIX but then it talks about NFS. So, what is it? Does it guarantee all POSIX semantics or not?
You are correct in that NFS is not strictly-speaking POSIX compliant to the letter of the law, due to the caching behavior. This is an NFSv3 file system, so it shares those semantics. The point that I'm trying to emphasize is that the file system supports standard file operations which aren't possible through other FUSE adapters, or possible to perform efficiently on S3 (such as append, rename, and symbolic links) -- which provides broad compatibility with file-based applications.
Which is nice and useful of course, but there are a ton of things that can't reliably be done with that (like running any database that comes to mind), which makes it important to be precise here.
Is there something specific that you worry about when running a database on a networked file system? I would imagine that any database which is correctly fsync'ing the data to the write-ahead-log should work just fine.
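For context on what "correctly fsync'ing the data to the write-ahead-log" means in practice, here is a minimal sketch of a WAL append and replay. The length-prefixed record format and the names `wal_append` and `wal_replay` are made up for illustration; real databases add checksums and more. Whether `fsync` on a given network file system gives the durability a database expects is exactly the point under discussion, so the sketch only shows the application side.

```python
import os
import tempfile

def wal_append(fd: int, record: bytes) -> None:
    """Append a length-prefixed record and fsync before reporting success."""
    view = memoryview(len(record).to_bytes(4, "big") + record)
    while len(view) > 0:                 # os.write may write fewer bytes than asked
        written = os.write(fd, view)
        view = view[written:]
    os.fsync(fd)                         # only now is the record considered committed

def wal_replay(path: str) -> list:
    """Read back every complete record; ignore a torn record at the tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            size = int.from_bytes(header, "big")
            body = f.read(size)
            if len(body) < size:
                break                    # partial final record from a crash mid-write
            records.append(body)
    return records

with tempfile.TemporaryDirectory() as mount:   # stand-in for a mounted file system
    wal_path = os.path.join(mount, "wal.log")
    fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        wal_append(fd, b"begin")
        wal_append(fd, b"commit")
    finally:
        os.close(fd)
    recovered = wal_replay(wal_path)

print(recovered)
```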
Is this like JuiceFS? https://juicefs.com/
It's similar to JuiceFS, but JuiceFS writes and reads data from S3 in a proprietary block format. This means that you cannot connect JuiceFS to existing data sets in S3, and you cannot use data written through JuiceFS from the S3 API directly. On the other hand, Regatta reads and writes data to S3 using its native format -- so you can do these things!
I don't see any other question about it, so maybe I just missed the obvious answer, but how do you handle POSIX ACLs? If the data is stored as an object in S3, but exposed via filesystem, where are you keeping (if at all?) the filesystem ACLs and metadata?
Also, NFSv3 and not 4?
Great call out. Some kinds of data, like ACLs and specific kinds of metadata, don't live in S3. Full disclosure, we don't support ACLs today (but plan to soon). We keep file system metadata in the durable cache. For some files (where users haven't changed permissions, etc), we are able to release that cached metadata when the file is no longer in use. For other files (where permissions have been changed by the user), that metadata must live in the cache long-term.
We selected NFSv3 due to its broad compatibility with different compute environments. For example, Windows has an NFSv3 client in it, but doesn't have an NFSv4 client. There are lots of enterprise workloads which need simultaneous access to file data from both Windows and Linux, and supporting NFSv3 was the easiest path to support those workloads.
Do you pay for metadata accesses? Does running a `find` across the filesystem cost anything? What about system calls that don't transfer data? Can I move or rename a file without paying to copy and then delete the associated S3 object?
Today, we only charge for cache usage (storage) and data transfer between Regatta and S3. If your metadata access doesn't require transfer to S3, then it doesn't cost anything! However, renames do require transfer to S3 (because we have to move the object on the backend).
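A toy illustration of why a rename turns into data transfer on the backend, assuming the copy-then-delete model mentioned in the question. The bucket is a plain dictionary and `rename_prefix` is a made-up helper; no real S3 API is called here.

```python
def rename_prefix(bucket: dict, old: str, new: str) -> int:
    """Rename every key under old to new; return the number of objects copied."""
    moved = 0
    for key in [k for k in bucket if k.startswith(old)]:
        bucket[new + key[len(old):]] = bucket[key]   # copy to the new key
        del bucket[key]                              # delete the old key
        moved += 1
    return moved

# A flat key space: "directories" are only shared prefixes.
bucket = {
    "logs/2024/a.log": b"a",
    "logs/2024/b.log": b"b",
    "logs/2024/c.log": b"c",
    "data/keep.bin": b"k",
}

copies = rename_prefix(bucket, "logs/2024/", "archive/2024/")
print(copies, sorted(bucket))
```

In this model a directory rename costs one copy and one delete per object under the prefix, even though the file system client sees a single operation.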
Thanks, I keep hoping someone comes up with some magic :)
Is the intent to run this in-vpc?
And how do you differentiate from AWS Storage Gateway?
I'd love to hear more about what you're excited to do when the magic arrives. :D
We are running it as a managed SaaS, so our customers connect to the caching layer that runs in the Regatta VPC. This allows us to manage the infrastructure for them and keep costs low.
Storage Gateway is an interesting product, and I worked closely with that team for several years -- so mad respect for them. It was designed to be an appliance that you run on servers in your own data center (of course, many customers now deploy it to EC2). Because of this, it's designed to operate in an environment with "finite storage" -- for example, different workload patterns can thrash the cache, which results in poor performance to clients, and it's not designed to run in a high-availability cluster in the cloud. Regatta solves these problems with durable cache storage that's safe to keep data in long-term, and is designed for high-availability.
Super interesting project. But I cannot understand why you support only EC2 instances as clients. For what it is worth, it looks strange and limiting. By default I expect to be able to use Regatta Storage from everywhere: from my local machine, from my Docker containers running elsewhere, etc.
This isn't a technical limitation, per se, but a time limitation in terms of getting to the place where we feel comfortable supporting those environments for the public. I still wouldn't recommend mounting it from a local environment (because NFS behaves pretty poorly when it can't connect to the server), but we do have a CSI driver for containers running in K8s. We expect that customers will get the best experience if their instances are very close (latency-wise) to our instances, which is why we only support access from us-east-1 in AWS. We expect to launch in more regions and clouds in the coming months.
If you want early access to other clouds or the CSI driver, feel free to email hleath [at] regattastorage.com.
If using EFS already, how would the pricing / performance compare? Or is that maybe not a use case for regatta storage?
It depends on what you're doing with EFS! For the most part, I would expect Regatta to be lower cost than EFS. If you're running workloads where individual files are primarily written or accessed from an individual instance, I would expect a significant improvement in performance. If you have some time, I'd love to chat more deeply about what you're doing. Feel free to grab some time on my calendar from the Demo link on the Regatta home page or shoot me an email at hleath [at] regattastorage.com.
Reminds me of https://www.lucidlink.com/ for video editors. I quite like the experience with them.
That's exactly right, I've spoken with a ton of folks who have had a good experience with Lucid Link. I think that we are in a slightly different part of the market (in that we aren't targeting video editors, but rather data-intensive applications which may use thousands of IOPS), but I appreciate that the technology is likely similar.
How does this compare to Amazon's own offering in this space, the "AWS Storage Gateway"? It can also back various storage protocols with S3, using SSDs for cache, etc. (https://aws.amazon.com/storagegateway/features/)
Great question! We fill the same role as AWS Storage Gateway (and I used to work closely with that team when I was at AWS, lots of respect for what they do). AWS Storage Gateway is built primarily as an appliance to be installed on instances in your own data center to ease migration to the cloud. Many customers do deploy Storage Gateway on EC2 because they want these features in the cloud itself. However, the "appliance" design of Storage Gateway makes it unsuitable for this purpose. For example, Storage Gateway is not designed to run in a cluster for high-availability and doesn't have access to durable, long-term storage to stage and cache writes.
On the other hand, Regatta is designed as a cloud-native gateway product. Regatta's elastic, durable caching layer allows us to efficiently cache large data sets without thrashing, and always efficiently perform writes. Because Regatta is designed to be highly-available, customers don't have to worry about downtime for patching or deployments.
S3 File Gateway sounds a lot like your product.
Also true! If you look at their site, they're really targeting folks to deploy it into their data centers to provide on-premises caching of resources in AWS, rather than providing a high-speed cache within AWS for file-based applications.
https://aws.amazon.com/storagegateway/file/s3/
Wondering what the difference is between this and juicefs?
Great question! Full disclosure, answer copied from another comment:
It's similar to JuiceFS, but JuiceFS writes and reads data from S3 in a proprietary block format. This means that you cannot connect JuiceFS to existing data sets in S3, and you cannot use data written through JuiceFS from the S3 API directly. On the other hand, Regatta reads and writes data to S3 using its native format -- so you can do these things!
> Currently, only the us-east-1 region is supported. Please contact support@regattastorage.com if you need to use a different region.
Bold choice, given what I know about us-east-1
:sunglasses: We think it's important to be where our customers are, and we're looking to prioritize the next regions that we launch in based on customer demand. We expect to be in more regions by the end of the year.
Total curiosity, but what’s the limiting factor of scaling out to multiple regions on day one?
Time! We don't have a lot of people right now, so every minute that we spend launching infrastructure (especially in non-AWS clouds) is a minute that we can't spend on performance improvements for our customers.
I have a few qualms with this app:
1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
... I'm kidding, this is quite useful.
I really wish that NFSv3 and Linux had built-in file hashing ioctls that could delegate some of this expensive work to the backend as it would make it much easier to use something like this as a backup accelerator.
Ha, thank you for the FTP comment, I was hoping someone would make it.
> I really wish that NFSv3 and Linux had built-in file hashing ioctls that could delegate some of this expensive work to the backend as it would make it much easier to use something like this as a backup accelerator.
Tell me a bit more about what you mean here. We're interested in really pushing the limits of what a storage system can do, so I'd be potentially interested.
I rejected EFS as a common caching and shared files layer, despite being technologically an excellent fit for my stack, because it is astronomically expensive. The value created didn’t match the cost.
When I got in touch about that, I was confronted with a wall of TCO papers, which tells me the product managers evidently believe their target segment to be Gartner-following corporate drones. This was a further deterrent.
We threw that idea away and used memcached instead, with common static files in a package in S3.
I guess I’m suggesting, don’t be like EFS when it comes to pricing or reaching customers.
It's certainly my hope to be cost effective, but I understand the worry and I'm sorry that you had that experience with the PMs of that time. At the end of the day, I see my target customers as those who aren't interested in running their own infrastructure and having to manage availability and durability (in memcached case, things like needing to pre-warm the cache). I understand that it still may be possible to be more cost effective if you're willing to trade off ease of use for dealing with those other concerns.
oh interesting, I'd love to mount this to Finder on Mac, and load a bunch of massive bioinformatics databases on there and treat it like another folder
I'm also using Cloudflare R2 (S3 compatible) and would love for that to work out of the box
I know a lot of folks have asked me for local support, and while I can share that this would work from OS X -- it's not something that I would recommend doing outside of a data center because the semantics of a networked file system on a sporadic internet connection (when compared to a data center) aren't great -- unless you're doing something higher level like Dropbox. However, it's something we're considering for next year.
You can use rclone mount, depends on how much you're flipping through files or actually doing lots of IO
I wouldn't want to host fastqs or something and use this for alignment, but for spot checking raw fastqs it could be nice
This reminds me of using rclone mount on terabytes of data, where I mostly wanted some "smaller" files between 200 KB and 1.5 MB in a single directory. I made rclone mount significantly faster by having it cache into a RAM disk (there is a free tool to make RAM disks on macOS too).
Similar to objectiveFS - we use this in production for email sync between multiple postfix servers and dovecot. Is this a supported use case?
There isn't any reason that it shouldn't be a supported use case, depending on your exact performance needs and workflow. It's very similar to ObjectiveFS except that it operates on the data in your S3 bucket in its native format, so you can point it at existing data sets, and use the newly written data directly from S3.
Can you elaborate on a few things with regards to your pricing:
* What does "$0.05 / gigabyte transferred" mean exactly. Transferred outside of AWS or accessed as in read and written data?
* "$0.20/GiB-mo of high-speed cache" – how is the high-speed cache amount computed?
Sure, and we have more details on pricing here which may answer your questions: https://docs.regattastorage.com/details/pricing
We need to update the home page with these details, but $0.05 is only charged on transfer between Regatta and S3. We calculate your cache usage minutely and tally it into a monthly usage amount that we then bill for.
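To make the arithmetic concrete, here is a rough sketch of how a bill could be estimated from the two rates quoted in this thread. How per-minute cache samples are turned into GiB-months (a simple average over the minutes in a 30-day month), the function name, and the sample numbers are all assumptions for illustration, not Regatta's published billing logic; the pricing page linked above is the authority.

```python
# Rates quoted in this thread; everything else here is an illustrative assumption.
CACHE_RATE_PER_GIB_MONTH = 0.20   # $ per GiB-month of high-speed cache
TRANSFER_RATE_PER_GB = 0.05       # $ per GB moved between Regatta and S3


def estimate_monthly_bill(cache_gib_samples, transferred_gb, minutes_in_month=30 * 24 * 60):
    """Estimate a monthly bill from per-minute cache samples and transfer volume.

    cache_gib_samples: one cache-size reading (in GiB) per minute of usage.
    Minutes with no reading count as zero cache usage (assumption).
    """
    gib_minutes = sum(cache_gib_samples)
    gib_months = gib_minutes / minutes_in_month
    cache_cost = gib_months * CACHE_RATE_PER_GIB_MONTH
    transfer_cost = transferred_gb * TRANSFER_RATE_PER_GB
    return round(cache_cost + transfer_cost, 2)


# Example: 100 GiB held in cache for the whole month, 500 GB moved to or from S3.
samples = [100.0] * (30 * 24 * 60)
print(estimate_monthly_bill(samples, transferred_gb=500))
```

Under these assumptions the example works out to $20 of cache plus $25 of transfer. Since data leaves the cache up to an hour after the last access, a bursty workload would contribute far fewer GiB-minutes than this always-full case.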
Thanks for clearing that up. Few followup questions:
You don't actually directly charge for storage itself, so I assume this a "bring your own s3 bucket" type of deal, correct?
How long does data, that is no longer being accessed sit in the cache and count towards billing?
As for availability, are you in the process or do you have plans to also support Google Cloud?
> You don't actually directly charge for storage itself, so I assume this a "bring your own s3 bucket" type of deal, correct?
That's correct -- we store data in the customer S3 bucket.
> How long does data, that is no longer being accessed sit in the cache and count towards billing?
We keep data in the cache for up to 1 hour after you've stopped accessing it.
> As for availability, are you in the process or do you have plans to also support Google Cloud?
We have plans to support Google Cloud. If you're interested in using us from GCP, I'd recommend setting up some time to chat (either use the website or email me at hleath [at] regattastorage.com). We are prioritizing where we launch our infrastructure next based on customer demand.
I might just take you up on that.
I know for a while Fuse was considered a security nightmare. My own org banned the use of it. Have things gotten better?
Huh, that's interesting. I wouldn't imagine that there were security problems specific to FUSE compared to any other software that you would run on your servers. Regardless, I see FUSE as the fastest path to getting our protocol in the hands of our customers. In the fullness of time, I hope that we can deliver it as either a kernel-module or in-tree.
Just want to say this is super cool. I'm excited to see what people build on top of it.. seems like it could enable a new category of hosted data platforms-as-a-service (platform-as-a-services?).
This is more or less exactly what I'm hoping for. I think that people are excited to build stateless applications, but often that requires really specialized application and storage knowledge to pull off. My hope is that people can use this generic storage layer to build the next generation of stateless applications (including things like databases) without having to become storage experts themselves. I'm also excited to see what they build.
How does this compare to the log structured virtual disk concept from this paper? It sounds quite similar at a glance.
https://dl.acm.org/doi/10.1145/3492321.3524271
One of the fun parts about working on storage and file systems in particular, is that these techniques are old as time. Log-structured writes, journals, caching, etc -- are all non-novel. However, the benefit to our customers is in how easy we make it for them to use something like this without having to deploy or build it themselves.
Interesting. Reminds me of FlexFS (https://flexfs.io/). I spoke to a very knowledgeable person there when investigating what to use but we ended up using EFS instead.
An annoying feature of EFS is how it scales with amount of storage, so when it's empty it's very slow. We also started hitting its limits so could not scale our compute workers. Both can be solved by paying for the elastic IOPS but that is VERY expensive.
FlexFS kicks ass. I benchmarked it for our data storage and processing layers in value.space (satellite data processing and analysis) and we will most likely migrate to FlexFS in the near future.
Out of curiosity, why did you choose EFS, it's insanely expensive at even modest scales?
Yes, I think it's a similar product, but we're looking to provide high performance on all dimensions (latency, throughput, and IOPS). I totally agree with you that Elastic Throughput solves this problem, but it can be expensive for many workloads!
Why are you guys hijacking the scroll bar on your website?
Just the theme that we ended up using for the marketing site. We will likely build something less janky post-batch, but right now -- just trying to get the information out there.
This feels, intuitively, like it would be very hard to make crash consistent (given the durable caching layer in between the client and S3). How are you approaching that?
It depends on what you mean by crash-consistent. I would expect that we handle crash-consistency at the client fine (since it is the same crash-consistency of NFSv3) and crash-consistency at the server also fine (since we are able to detect using etags what version of an object is in the backing data storage). Tell me a bit more about what you're thinking.
For sure! Upon reflection, maybe I’m less curious about crash consistency (corruption or whatever) per-se, and more about what kinds of durability guarantees I can expect in the presence of a crash.
I’m specifically interested in how you’re handling synchronization between the NFS layer and S3 wrt fsync. The description says that data is “asynchronously” written back out to S3. That implies to me that it’s possible for something like this to happen:
1. I write to a file and fsync it
2. Your NFS layer makes the file durable and returns
3. Your NFS layer crashes (oh no, the intern merged some bad terraform!) before it writes back to S3
4. I go to read the file from S3… and it’s not there!
Is that possible? IE is the only way to get a consistent view of the data by reading “through” the nfs layer, even if I fsync?
So, the step that differs from your concern is Step 3. Let's say that we have a catastrophic availability scenario (as you said, intern comes in and tears down something) -- our job is to make sure that the data in our durable cache remains there (and to put safeguards in place to prevent the intern from hitting that data). If we do that, then any crash of our system will get the data back and be able to apply it to S3. I know that's kind of hand-wavy, but this is how things like AWS S3 work -- just having a super high bar for processes around operations to keep data safe.
For some reason, I don't see a "reply" button to your later comment (maybe there's an HN threading limit), but the answer is yes -- fsync guarantees durability in the Regatta durable cache, not in S3.
Gotcha! Thanks for the answer; so the tl;dr is, if I’m understanding:
“All fsync-ed writes will eventually make it to S3, but fsync successfully returning only guarantees that writes are durable in our NFS caching layer, not in the S3 layer”?
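From the application's side, the sequence being discussed looks roughly like the sketch below. The helper name and the temporary directory (standing in for a mounted file system) are placeholders. On a Regatta mount, going by the answer above, a successful fsync would mean the data is durable in the caching layer, with the S3 object updated some time later.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and block until the file system reports them durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Returns only once the file system has acknowledged the data as durable.
        os.fsync(fd)
    finally:
        os.close(fd)


# Demo against a temporary directory standing in for the mounted file system.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "result.bin")
durable_write(demo_path, b"hello")
```

A reader that needs the bytes immediately after fsync returns should therefore read through the file interface rather than issuing an S3 GET.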
How does this differ from rclone mount and its vfs/caching system, possibly combined with mergerfs or rclone union for cache tiering?
Yes, you can absolutely get similar functionality with rclone. However, what we are solving for our customers is the ability to do this without thinking about infrastructure or deployments. Customers don't need to worry about data durability, replication, recovering off of failed drives, or availability through deployments or patches.
Congrats on the launch!
Could a Regatta filesystem offer any advantage over ClickHouse's built-in S3 and local disk caching features in terms of cost or performance?
It can offer an advantage over the built-in caching, but it depends on your exact access patterns. For example, if you are running ClickHouse on multiple servers and accessing the same reference data, it's more efficient to cache that data in a centralized location (like Regatta) instead of on the disk of each individual instance.
Philosophically, our goal is to build a standard that can be used in these kinds of applications moving forward, so that application developers don't need to build streaming over and over again and users don't need to learn how to configure each individual systems' caching.
How does this compare to https://github.com/awslabs/mountpoint-s3 ?
Thanks for the question! Mountpoint for Amazon S3 is a FUSE layer that doesn't support full POSIX semantics. For example, you can't use Mountpoint for Amazon S3 for random writes to existing files, appends, or renames. This means that you have to carefully instrument your application to understand whether or not it's compatible with Mountpoint, which can be error-prone. Regatta, on the other hand, provides full POSIX compatibility for the file interface, which means that it works out-of-the-box with all file based applications.
> For example, you can't use Mountpoint for Amazon S3 for random writes to existing files, appends, or renames.
Can you support these operations with the expected semantics and performance?
If the application makes a one-byte change to a giant file and calls fdatasync, what happens? Do you re-upload the entire file to S3?
How do you handle a rename? Applications commonly do this for atomic replacement on POSIX and expect three properties from this operation:
* fast.
* destination always points to either the original or new afterward (on success or failure); no scenario at which it's lost/truncated.
* no extra storage used (on success or failure).
Do you guarantee any of those? How? I don't see an obvious way from the S3 HTTP API.
Given that POSIX API doesn't support things like arbitrary per-operation deadlines/timeouts, do you think it's suitable as a distributed filesystem API at all? Why?
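For readers unfamiliar with the pattern being asked about, this is roughly what applications do for atomic replacement on POSIX: write a temporary file in the same directory, fsync it, then rename it over the destination. The helper name and paths are illustrative only, and the sketch runs against a local temporary directory.

```python
import os
import tempfile


def atomic_replace(dest_path, data):
    """Replace dest_path so readers see either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(dest_path))
    # The temp file must be on the same file system for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, dest_path)  # rename(2) on POSIX
    except BaseException:
        # Clean up the temp file so a failure leaves no extra storage behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


demo_dir = tempfile.mkdtemp()
config_path = os.path.join(demo_dir, "config.json")
atomic_replace(config_path, b'{"version": 1}')
atomic_replace(config_path, b'{"version": 2}')
```

The three properties listed above map onto this code directly: the rename is a single metadata operation, the destination never holds a partial file, and the temporary file is gone afterward in both the success and failure paths.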
The tl;dr of this is -- yes. We have a durable caching layer that we use to stage writes before we asynchronously replicate them to S3. This means that we are able to quickly (<1ms) perform operations like single-byte updates and renames and provide strong read-after-write consistency to other file system clients.
Once the operation is stored in our durable cache, then we update your S3 bucket to match what the file system expects. This generally takes around a minute, but could take longer depending on the number of S3 operations a file operation translates to (for example, a directory rename requires that we CopyObject each object in the directory in S3).
I think that the POSIX API is here to stay (like the S3 API). I agree that it would be better to have timeouts and deadlines, but I don't think that those make it impossible to provide a good distributed file system experience on POSIX (look at Amazon's EFS, Oracle's FSS, Google's FileStore, etc). It just makes the bar for availability higher.
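To illustrate why a directory rename fans out into many S3 requests: as noted above, each object under the old prefix has to be copied with CopyObject, and this sketch assumes the old key is then deleted. It only plans the operations from a list of keys. The function name and tuple format are invented here, and it says nothing about how Regatta actually schedules the work.

```python
def plan_directory_rename(keys, old_prefix, new_prefix):
    """Return the S3 operations needed to move every key under old_prefix.

    keys: iterable of object keys currently in the bucket.
    Prefixes are expected to end with "/" so "logs/" does not match "logs-old/".
    """
    operations = []
    for key in sorted(keys):
        if not key.startswith(old_prefix):
            continue
        new_key = new_prefix + key[len(old_prefix):]
        operations.append(("CopyObject", key, new_key))
        operations.append(("DeleteObject", key, None))
    return operations


bucket_keys = ["data/a.csv", "data/sub/b.csv", "other/c.csv"]
for op in plan_directory_rename(bucket_keys, "data/", "archive/"):
    print(op)
```

The number of operations grows linearly with the number of objects in the directory, which is consistent with the point that the S3 side can lag the (fast) file system rename.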
Does Regatta require a local disk sized for the entire file to support random writes? One problem I’ve seen is that we have set up instances with a modest local disk but then work with files for which we need to pull the whole file into a local cache, modify some parts, and then push the full result back into S3. It would be helpful to have a way to work with S3 as though it were POSIX without having to match the local disk size to the largest file we might need to process.
This is exactly the problem that we solve! You don't need any local disk on your EC2 instance in order to use Regatta or work with data in S3. Our high-speed caching layer plays the role as this local disk for you, so that you can work with data sets that are hundreds of TiBs, even if you only have a 20 GiB EBS volume on your instance.
What is the acceptable latency if we have to use this outside of EC2, let's say mounting S3 from on-prem/GCP/Azure?
Well, in my opinion, I want to deliver the lowest latency possible. I expect that we will have Regatta running in GCP and Azure within the next 6 months. I'd love to connect if there's a place on-prem that you're looking to use Regatta. Would you shoot an email to hleath [at] regattastorage.com, and we could chat about what you're looking for?
Careers link points to index page :)
Sorry about that! It's on our list to fix once we're done responding to comments.
I know that Amazon in general has large ingress and egress costs. How much overhead will this application incur?
Those costs only apply to data transfer into and out of AWS. If you're running EC2 instances in AWS, your Regatta file system is in AWS, and your S3 bucket is in AWS -- then you shouldn't incur additional data transfer fees.
Where you say AWS, you mean "a single AWS region"
But anyway, from your YCombinator blurb:
Does this mean Regatta trades consistency for cost (S3 and EBS and local storage are all CP systems these days)?
Yes, that's correct re: Region -- thanks for the clarification.
In some sense, yes. But the consistency that you're trading is only for accessing data simultaneously through the file interface and the S3 interface. The consistency is CP/strong when you access the data through the file interface. The model that we see work most often is that folks will ingest data through S3 (for example, an 'input/' prefix), and then the file system will process that data and place it in a different directory (for example, an 'output/' folder). Then, if it takes a minute or two for those to update on the other side, it's not a big deal.
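The input/output pattern described here might look like the following on the file system side. This is a minimal sketch: the mount root is a temporary directory standing in for a mounted file system, and the function name and the trivial transform are made up for the example.

```python
import os
import tempfile


def process_inputs(mount_root, transform):
    """Read each file under input/ and write the transformed result under output/."""
    input_dir = os.path.join(mount_root, "input")
    output_dir = os.path.join(mount_root, "output")
    os.makedirs(output_dir, exist_ok=True)
    processed = []
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src, "rb") as f:
            result = transform(f.read())
        with open(os.path.join(output_dir, name), "wb") as f:
            f.write(result)
        processed.append(name)
    return processed


mount = tempfile.mkdtemp()          # stand-in for the mounted file system
os.makedirs(os.path.join(mount, "input"))
with open(os.path.join(mount, "input", "a.txt"), "wb") as f:
    f.write(b"hello")
print(process_inputs(mount, bytes.upper))
```

Because writers on the S3 side only touch input/ and the file system only writes output/, neither side modifies the same object at the same time, so the replication delay does not cause conflicts.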
It async replicates to s3, while providing a consistent interface to storage clients.
How does this differ from what Nasuni offers?
Hey there, I have mutual friends with some of the Nasuni folks, and I have a lot of respect for what they do. In particular, Nasuni stores data in a proprietary block format in your S3 bucket, so you can't connect it to existing data sets or use that data directly from S3 out the other side. Whereas with Regatta, we store data in its native format in S3 so you can do these things.
What's cool about the storage market is that there are so many impressive companies because there are so many varied needs from customer applications! We're hoping to become a simple "default" for teams who are writing applications in the cloud.
Is every file an S3 object? What if you change the middle of a large file?
That's correct -- every file is an S3 object. If you change the middle of a large file, Regatta will store the change on our durable caching layer efficiently (and most writes complete in under 1ms). Regatta will then asynchronously update the large object in S3, which may take longer. We automatically batch multiple changes together to minimize the number of operations to your S3 bucket!
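One way to picture the batching described here: many small writes to the same file are staged, and a later flush produces a single upload per changed file rather than one per write. This is a toy in-memory model with invented names (StagedWrites, flush), not Regatta's implementation.

```python
class StagedWrites:
    """Toy write-back stage: coalesces byte-range writes per file before upload."""

    def __init__(self, objects):
        self.objects = objects          # stand-in for the S3 bucket: key -> bytes
        self.pending = {}               # key -> list of (offset, data)
        self.uploads = 0                # number of simulated object uploads

    def write(self, key, offset, data):
        self.pending.setdefault(key, []).append((offset, data))

    def read(self, key):
        """Reads see staged writes immediately (read-after-write)."""
        content = bytearray(self.objects.get(key, b""))
        for offset, data in self.pending.get(key, []):
            if len(content) < offset:
                content.extend(b"\x00" * (offset - len(content)))
            content[offset:offset + len(data)] = data
        return bytes(content)

    def flush(self):
        """Apply all staged writes, uploading each changed object once."""
        for key in list(self.pending):
            self.objects[key] = self.read(key)
            self.uploads += 1
            del self.pending[key]


bucket = {"big.bin": b"a" * 10}
stage = StagedWrites(bucket)
stage.write("big.bin", 4, b"XY")
stage.write("big.bin", 8, b"Z")
print(stage.read("big.bin"))   # staged view, bucket not yet updated
stage.flush()
print(stage.uploads)
```

In this model two writes result in one upload. A real system would also need the staged list itself to be durable, which is the role the thread attributes to the caching layer.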
What are the consistency semantics?
All connected file system clients see strong, read-after-write consistency. Most file operations are synchronized to S3 within a few minutes of completion.
Do you do anything to handle/detect write conflicts?
Write conflicts between the file system and S3 should be rare (by definition, applications shouldn't yet be designed to do this because Regatta doesn't exist). We do some tracking of the object etag to at least throw an alert if we find that something unexpected has happened, and we're looking at the best UX to expose that to customers soon.
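As a rough illustration of etag-based tracking, the sketch below compares the ETag recorded at the last sync with the ETag currently seen in the bucket and reports any mismatch. The data structures and messages are hypothetical, and how Regatta records or alerts on this is not described in the thread.

```python
def detect_conflicts(last_synced_etags, current_etags):
    """Return keys whose S3 ETag changed since the file system last synced them.

    last_synced_etags: key -> ETag recorded when the file system last synced the object.
    current_etags: key -> ETag observed in the bucket now (for example, from a listing).
    """
    conflicts = []
    for key, recorded in last_synced_etags.items():
        observed = current_etags.get(key)
        if observed is None:
            conflicts.append((key, "deleted outside the file system"))
        elif observed != recorded:
            conflicts.append((key, "modified outside the file system"))
    return conflicts


recorded = {"input/a.txt": "etag-1", "input/b.txt": "etag-2", "input/c.txt": "etag-3"}
observed = {"input/a.txt": "etag-1", "input/b.txt": "etag-9"}
for key, reason in detect_conflicts(recorded, observed):
    print(key, reason)
```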
How does this compare to S3 compatible CSI drivers like DirectPV?
I could totally be misreading DirectPV, but it appears to be a way to use K8s Persistent Volumes to manage things like NVME drives which are attached to each node, and doesn't provide any tie in to S3 (outside of the fact that it's built to power MinIO).
How does this differ from AWS Storage Gateway?
(full disclosure, reposted from a comment below)
Great question! We fill the same role as AWS Storage Gateway (and I used to work closely with that team when I was at AWS, lots of respect for what they do). AWS Storage Gateway is built primarily as an appliance to be installed on instances in your own data center to ease migration to the cloud. Many customers do deploy Storage Gateway on EC2 because they want these features in the cloud itself. However, the "appliance" design of Storage Gateway makes it unsuitable for this purpose. For example, Storage Gateway is not designed to run in a cluster for high-availability and doesn't have access to durable, long-term storage to stage and cache writes.
On the other hand, Regatta is designed as a cloud-native gateway product. Regatta's elastic, durable caching layer allows us to efficiently cache large data sets without thrashing, and always efficiently perform writes. Because Regatta is designed to be highly-available, customers don't have to worry about downtime for patching or deployments.
Is there any open source alternative to something like this?
Hey, thanks for asking. It very much depends on which aspect of Regatta you're interested in using. I know of a couple of different architectures -- some folks wrote in "rclone" in the thread, I know of people using SeaweedFS if you want to host storage infrastructure yourself, etc.
I'd love to know a bit more about why you're looking for an open source alternative. Is it because of costs (i.e. you'd like an open source alternative that doesn't require you to pay) or if it's because of the operating environment (i.e. you want an open source alternative so that you can deploy it to your own infrastructure)? There are some things that we are exploring around deploying onto your own infrastructure over the next 12 months, but I'd love to learn more. Feel free to respond here or email me at hleath [at] regattastorage.com.
In 2024, you are better off dropping the file system abstraction entirely and just embracing object storage abstractions (and ideally, immutable write-once objects).
Source: personal experience, I've done the EFS path and the S3-like path within the same system, and the latter was much easier to develop for and troubleshoot performance. It's also far cheaper to operate.
You can have local caching, rapid "read what I wrote", etc. with very little engineering cost, no one at my company is dedicated to this because the abstraction is ridiculously simple:
1. It's object storage, not a file system. Embrace immutability.
2. When you write to S3, cache locally as well.
3. When you read from S3, check the cache first. Optionally cache locally on reads from S3.
4. Set cache sizes so you don't blow out local storage.
5. Tier your caches when needed to increase sharing. (Immutability makes this trivially safe.)
All that's left is to manage 'checked out files' which is pretty easy when almost all of them are immutable anyway.
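The steps above can be sketched in a few dozen lines. This is a minimal in-memory model, assuming a dict as a stand-in for S3 and a least-recently-used eviction policy (the comment does not say which policy is used). Cache tiering (step 5) is left out, and all names are invented for the example.

```python
from collections import OrderedDict


class CachedObjectStore:
    """Read-through / write-through cache in front of an immutable object store."""

    def __init__(self, backend, max_bytes):
        self.backend = backend            # stand-in for S3: key -> bytes
        self.max_bytes = max_bytes        # step 4: bound the local cache size
        self.cache = OrderedDict()        # key -> bytes, least recently used first
        self.cached_bytes = 0
        self.backend_reads = 0

    def _remember(self, key, data):
        if key in self.cache:
            self.cache.move_to_end(key)
            return
        if len(data) > self.max_bytes:
            return                        # too large to cache at all
        self.cache[key] = data
        self.cached_bytes += len(data)
        while self.cached_bytes > self.max_bytes:
            _, evicted = self.cache.popitem(last=False)
            self.cached_bytes -= len(evicted)

    def put(self, key, data):
        if key in self.backend:
            raise ValueError(f"{key} already exists; objects are write-once")  # step 1
        self.backend[key] = data
        self._remember(key, data)         # step 2: cache on write

    def get(self, key):
        if key in self.cache:             # step 3: check the cache first
            self.cache.move_to_end(key)
            return self.cache[key]
        data = self.backend[key]
        self.backend_reads += 1
        self._remember(key, data)         # optional: cache on read
        return data


store = CachedObjectStore(backend={}, max_bytes=8)
store.put("a", b"1234")
store.put("b", b"5678")
store.put("c", b"90")          # evicts "a", the least recently used entry
print(store.get("b"), store.get("a"), store.backend_reads)
```

Because objects are never overwritten, a cached copy can never be stale, which is why no invalidation logic appears anywhere in the sketch.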
I totally agree that we're continuing to see a trend of applications which are designed to work directly on S3.
However, like the S3 protocol, I think that the file protocol is cemented in time as something that we will be using 100 years from now. For example, most AI applications do still download data sets to local file system devices to actually load and use, this is why you see a lot of HPC workloads use things like Lustre. Postgres, SQLite, etc all use file system semantics to operate the database.
I totally respect folks who rewrite their applications to work directly with S3, but as you point out, it comes with a different set of challenges (around caching and chunking).
How does it handle data append and file editing?
Thanks for the question. We stage writes to a durable, shared caching layer. This allows us to respond quickly to your application when it performs these operations (<1ms), but then asynchronously send those operations to S3 later. When connecting through Regatta, all file system clients see a strongly consistent read-after-write view of the changes on the file system, even if they haven't yet propagated to S3.
This sounds unnecessary and expensive. Why use this over similar self-managed open source offerings?
I bet this guy runs his own servers and databases in his basement too, because fk TCO amirite
Hey there, thanks for the concern. There are a spectrum of teams out there. Some teams are totally comfortable building something like this and running their own storage infrastructure. Other teams want a fully managed solution to handle storage for them so that they can focus on building. I think it's great that we have a spectrum of products!
i'm not in storage SaaS, so nooby question - how is this different from Snowflake or Databricks?
Thanks for the question!
Snowflake and Databricks aren't storage products, but are managed compute platforms on top of storage that probably looks a lot like this. Snowflake allows you to easily connect different data sets to your data warehouse, and Databricks provides a managed analytics (Spark) offering.
Regatta, on the other hand, would allow you to more easily build the next Snowflake or Databricks by taking advantage of the same low-cost, unlimited storage in S3 that they likely use.
That's pretty cool. Anybody know of something similar for Azure cloud?
We are looking at launching in Azure Cloud with support for Azure Blob Storage as the backend within the next 6 months. If there's a specific use case that you have, it would be helpful to share it with me at hleath [at] regattastorage.com so we can appropriately prioritize Azure against other cloud vendors and regions.
Fascinating. If this had been around a year ago, we could have used it in our datacenter build-out. For data source reasons, we record data in the cloud. In the past, we'd stick most of the data in S3 and only egress what we needed to run analysis on. The way we'd do that is that we have a machine with 16 * 30 TiB SSDs that acts as our on-prem cache of our S3 data. It did this using a slightly modified goofys with a more modified catfs in front of it, with both the cache and the catfs view exported over NFSv4. We had application-level switching between the cache and the export since our data was really read-only.
When the cache got full, catfs would evict things from it pretty simply. It's overall got a good design but has a few bugs you have to fix, and when you have 100 machines connecting to it, it requires some tuning to make sure that it doesn't all stall. But it worked for the most part.
Anyway, I think this is cool tech. I'm currently doing some bioinformatics stuff that this might help with (each genome sequence is some 100 GiB compressed). I'll give it a shot some time in the next couple of months.
That's exactly the kind of thing that I've been hearing lots of teams having to solve individually, and I'm glad that this set up worked out for you. Would love to see you try it for bioinformatics (another industry where this problem seems to show up frequently), feel free to reach out with any questions when you start that.
Does this compete with Minio?
I don't think so, I see them as complementary. MinIO is great when you have downstream applications which speak the S3 API that need acceleration of that data. Regatta is designed for applications which speak file semantics (think, application logging, storing corpuses of training data, or state) that doesn't run on the S3 API. Regatta actually supports MinIO as an S3-compatible backend for your file system!
I think it’s more analogous to MinIO’s discontinued proxy mode. This is where you’d talk to MinIO locally (using whatever interface/protocol) and it would act as a local cache for S3 objects. If you wrote to it, it would propagate the changes up to S3 proper (or whomever using the S3 protocol).
I believe they stopped supporting that mode because they didn’t want to keep chasing every S3 protocol change. However, if you’re just using S3, and not trying to masquerade as S3, this problem becomes easier.
I think it's complementary as well, even more so after MinIO deprecated its Gateway and Filesystem modes a couple of years ago. MinIO is "S3 compatible" object storage, so technically, MinIO users should be able to use your product to have a file-system-like experience on their buckets and objects, although you're using IAM and there might be a need either for your client to handle pure S3 credentials or for a third-party plugin to your client to do that. It could be a good opportunity to piggyback on MinIO's userbase.
We had built an MLOps platform[0] a few years ago and enabled users to use their S3 buckets in a "file system like" manner. This made it possible for them not to have to know or write S3 specific code in their Jupyter notebooks as most people in the industry did with boto3, which also forced them to write code (say using TensorFlow) in a certain way for training to consume the files, err, objects. It was a mess, and we removed that for notebooks that could run the same way on a laptop or on the platform, even with the shell kernel so people could explore objects like files. MLFlow could work on a filesystem or on S3, but it had no authentication, so we built around that to know which user/experiment produced which artifact.
MinIO had a Gateway that was deprecated. We didn't use it much and they didn't have an admin client at the time, so I rolled one up to orchestrate the thing.
I chose to hook into users' compute and storage, as opposed to offering storage/compute, for a few reasons:
- Organizations already had their data somewhere with established policies. Getting them to move that data is very hard (CISO, CTO, IT, legal, engineers). Friction would have been huge.
- Organizations already had budgeted compute and storage, they may have had contracts/discounts/credits with cloud providers and it didn't make sense to ask them to make a decision on budgeting for another solution.
- A design principle of having the product being able to die without leaving the users scrambling to exfil/migrate data.
One way to do it was to handle FUSE, and your mileage may vary (s3fs-fuse, goofys, etc). Amazon released Mountpoint last year[1], and one question you'll get asked is why use Regatta when I could use Mountpoint?
Less friction for engineers and execs.
In any case, congratulations on the launch, man!
[0]: https://web.archive.org/web/20230325150132/https://iko.ai/
[1]: https://aws.amazon.com/blogs/aws/mountpoint-for-amazon-s3-ge...
We are finding a lot of success in the ML Ops space for exactly this reason. I also completely agree that enterprise customers want to keep their data where they can govern and audit it (often in S3). We're excited about the possibility to allow folks to access and use that data while it stays in S3 for primary storage.
I agree that the Mountpoint question comes up, but we're solving a very different set of problems than Mountpoint. Mountpoint, for example, isn't designed to be used with all file applications and lacks support for things like appends to existing files, random writes, renames, and symbolic links. On the other hand, Regatta supports POSIX semantics and can work with nearly all file-based applications.
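For readers wondering what those four operations look like in practice, here is a small Python sketch that runs each of them against a directory. The function name and file names are invented for illustration; on a local disk it succeeds, and on a mount that lacks one of these features the corresponding call would be expected to raise `OSError`.

```python
import os
from pathlib import Path


def exercise_posix_ops(root: Path) -> dict:
    """Run append, random write, rename, and symlink under root; report results."""
    log = root / "train.log"
    log.write_text("epoch 1\n")

    # 1. Append to an existing file.
    with open(log, "a") as f:
        f.write("epoch 2\n")

    # 2. Random write: overwrite one byte in the middle of the file.
    with open(log, "r+b") as f:
        f.seek(6)
        f.write(b"9")

    # 3. Rename the file.
    final = root / "train.final.log"
    os.rename(log, final)

    # 4. Create a symbolic link (relative target) pointing at the renamed file.
    link = root / "latest.log"
    os.symlink(final.name, link)

    return {
        "content": link.read_text(),
        "renamed": final.exists() and not log.exists(),
        "is_symlink": link.is_symlink(),
    }
```

Plenty of ordinary tools do exactly this kind of thing (log appenders, databases, build systems), which is why missing support tends to surface as application errors rather than as an obvious storage limitation.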
That's so nice to see, because over the past few days I had been tinkering with the concept of file system + blob storage, but I had a hard time coming up with use cases other than an unlimited Dropbox where you own the storage and truly pay as you go.
I think that "owning the storage" is such an important part of this. I'm excited that folks who use this will continue to have access to their data directly through S3, so if they ever decide to move off of Regatta, all their data is still right there. This is also important at large companies which already have compliance and governance workflows that connect to data in S3 -- Regatta enables them to continue to use those workflows without having to think about another primary storage system.
SeaweedFS and GarageFS?
These distributed storage systems solve very similar problems, depending on how you use them. Our target customers aren't looking to deploy their own infrastructure, so having a "single-click" option without having to think about how much capacity they need is very valuable.
People have been throwing out "POSIX" distributed file systems for a long time but this claim usually raises more questions than it answers. Especially since clients access it via NFSv3, which has extremely weak semantics and leaves most POSIX filesystem features unimplemented.
I think this is a great call out, and you're correct. One example that comes to mind is that NFSv3 doesn't support flags on the rename() operation (such as RENAME_WHITEOUT), which means that you can't use them as an overlay upperdir (which is desirable for building container runtimes). To solve this, we're working on a custom protocol that we intend to place in the Linux kernel which will expose a broader set of features than we can get in NFS. As I tell people, this is the worst version of Regatta that will ever exist -- we're going to make it better every day.
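As a rough illustration of what "flags on rename()" means, here is a Linux-only Python sketch that calls `renameat2(2)` through ctypes with `RENAME_NOREPLACE`. It assumes a libc that exports `renameat2` (glibc does in recent versions); the helper name `rename_noreplace` is made up for this example. `RENAME_WHITEOUT` is passed in the same argument, but it typically needs elevated privileges, so the sketch only defines the constant.

```python
import ctypes
import os

# Flag values from <linux/fs.h>.
RENAME_NOREPLACE = 1 << 0  # fail with EEXIST instead of overwriting the target
RENAME_EXCHANGE = 1 << 1   # atomically swap the two paths
RENAME_WHITEOUT = 1 << 2   # leave a whiteout at the source (used by overlayfs)

AT_FDCWD = -100  # resolve relative paths against the current working directory


def rename_noreplace(src: str, dst: str) -> None:
    """Rename src to dst, refusing to overwrite an existing dst (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    try:
        renameat2 = libc.renameat2
    except AttributeError:
        raise NotImplementedError("this libc does not export renameat2")
    renameat2.argtypes = [
        ctypes.c_int, ctypes.c_char_p,
        ctypes.c_int, ctypes.c_char_p,
        ctypes.c_uint,
    ]
    renameat2.restype = ctypes.c_int
    rc = renameat2(AT_FDCWD, os.fsencode(src),
                   AT_FDCWD, os.fsencode(dst),
                   RENAME_NOREPLACE)
    if rc != 0:
        err = ctypes.get_errno()
        # OSError picks the matching subclass, e.g. FileExistsError for EEXIST.
        raise OSError(err, os.strerror(err), src)
```

On a local filesystem that supports the flag, calling this with an existing destination raises `FileExistsError`. A file system or protocol with no way to carry the flags argument cannot honor the request at all, which is the gap described above.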
You can implement a single-client NFSv3 server that provides stronger guarantees than NFSv3 is expected to, and if you implement the "optional" companion protocols it should come closer to local filesystem semantics than most network filesystems. What would be neat about such a solution is that you can run the server either locally or remotely (same site, high bandwidth, low latency), and at the same time clients would not have to run a custom FUSE server or, even worse, load a (from the customer's point of view) experimental vendor kernel module. Upgrading from NFSv3 to NFSv4 would get you a bit closer to POSIX semantics, but of course it would still be NFS, just not over a congested, jittery link to a shared server. NFSv4 delegations especially could be a nice way to let the client's kernel buffer a lot of bursty async I/O locally. Just keep in mind how little POSIX really guarantees instead of assuming it will behave like ext4/XFS or, even better, ZFS on a laptop NVMe with two levels of power loss protection (big caps in the drive and the laptop battery).
I think this is exactly right, but there are lots of people who don't want to manage their own NFS servers -- that's who we're targeting with Regatta. Notably, I think that v4 delegations get you close, but not close enough to the performance that we're looking for. For example, you can't get a delegation for a directory (which means that you're still doing round trips for CREATE and UNLINK), which seems to be the case even with "nocto". But I need to spend more time playing around with that.
I dunno if this is considered off-topic, since it's commentary about the website, but that's twice in the past week I've seen a launch website that must have used a template or something because almost all the links in the footer are href="#". If you don't have Careers, Privacy Policy, Terms, or an opinion about Cookies, then just nuke those links
Great call out -- we'll get that done. Thanks!
Was about to say the same, plus some typos in the documentation section (see here: https://triplechecker.com/s/743384/docs.regattastorage.com)
Thank you -- this seems like a fantastic service. You see I consistently miss that "h" in synchronize.
Appreciate the kind words!
TL;DR: is this a cloud service or an on-premise thing?
This is a managed cloud service. If you're interested in using Regatta on-premises, I'd love to hear from you -- shoot me some mail at hleath [at] regattastorage.com
I wonder why this is a top post on Monday? Is this artificially boosted by moderator Dang?
Because people are excited... All the positive comments didn't tip you off?