Self hosting 10TB in S3 on a framework laptop and disks

(jamesoclaire.com)

130 points | by ddxv 10 hours ago

72 comments

  • 6ak74rfy 15 minutes ago

    ZFS is RAM hungry, plus it doesn't like USB connections (as the article implied). So I've been eyeing btrfs as a way to set up my NAS drives. Would I be missing something with that setup?

  • VTimofeenko 3 hours ago

    If it's just the mainboard and no screen, OP could put it in a dedicated case like the CoolerMaster one:

    https://www.coolermaster.com/en-global/products/framework/

    • ddxv an hour ago

      Those are pretty cool. I meant to highlight more that the laptop has done super well. I can't even tell it's on, as there's no fan noise and no heat. I guess laptops are pretty good for this, as they're great at sipping power under low load.

      • smileybarry 15 minutes ago

        Back in 2012 or so, I reused an old netbook (an Asus Eee PC) with an Atom CPU & 1GB of RAM, installed Ubuntu Server, and used it as a home server. It handled the printer, DNS-VPN proxying for streaming, and a few other things admirably for years. (And ironically was resilient to Spectre because its Atom CPU was before Intel added speculative execution)

        Eventually, the thing that kicked the bucket was actually the keyboard (and later the fan started making "my car won't start" noises occasionally). Even the horribly-slow HDD (that handled Ubuntu Server surprisingly well) hadn't died yet.

  • j1elo an hour ago

    I'd like more elaboration on the technical side. Not literally how to do the same and what commands to use, but more along the lines of how the ZFS pools are configured, or whether Garage is opinionated and configures it all by itself. Are there mirrors in there? Or is it just individual pools that sync from some disks to others?

    I have 2 USB disks and want to make a cheapo NAS, but I keep going back and forth between making a ZFS mirror, making 2 independent pools and using one to back up the other, or going the alternate route with SnapRAID so I can mix in more older HDDs and get maximum usage out of the hardware I already own.

    • ddxv an hour ago

      My understanding is that Garage is not opinionated and could easily have worked without ZFS. I installed ZFS in Ubuntu, and then later installed Garage.

      As for the ZFS setup, I kept it simple and did RAID5/raidz1. I'm no expert in that, and have been starting to think about it again as the pool approaches 33% full.

      I saw this comment in another thread here that sounded interesting as well by magicalhippo: "I've been using ZFS for quite a while, and I had a realization some time ago that for a lot of data, I could tolerate a few hours worth of loss. So instead of a mirror, I've set up two separate one-disk pools, with automatic snapshots of the primary pool every few hours, which are then zfs send/recv to the other pool."

      This caught my attention as it matches my use case well. My original idea was that RAID5 would be good in case an HD fails, and that I would replicate the setup at another location, but the overall cost (~$1k USD) is enough that I haven't done that yet.
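
      For anyone curious about the actual commands, the setup is basically a single raidz1 vdev plus a dataset for Garage's data directory; something like the following, where the device, pool, and dataset names are placeholders rather than my exact layout:

        # Single-parity raidz1 pool across the external disks (placeholder device names)
        sudo zpool create tank raidz1 \
          /dev/disk/by-id/ata-DISK1 \
          /dev/disk/by-id/ata-DISK2 \
          /dev/disk/by-id/ata-DISK3 \
          /dev/disk/by-id/ata-DISK4

        # Dataset used as Garage's data directory, with compression enabled
        sudo zfs create -o compression=lz4 tank/garage-data
        sudo zpool status tank    # check pool health and capacity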

  • sandreas 3 hours ago

    I'd rather go with an old Dell T30 and 2x10TB Seagate Exos in ZFS RAID1 mode (mirror). This thing would make me nervous every day, even with a daily backup in place... While the Dell T30 would also make me nervous, you could at least plug the disks into any other device, and you're not wiring everything up with some easy-to-pull-out cables ;)

    However, Garage sounds nice :-) Thanks for posting.

    • magicalhippo 3 hours ago

      I've been using ZFS for quite a while, and I had a realization some time ago that for a lot of data, I could tolerate a few hours worth of loss.

      So instead of a mirror, I've set up two separate one-disk pools, with automatic snapshots of the primary pool every few hours, which are then zfs send/recv to the other pool.

      This gives me a lot more flexibility in terms of the disks involved (one could be an SSD and the other spinning rust, for example), at the cost of some read speed and potential uptime.

      Depending on your needs, you could even have the other disk external, and only connect it every few days.

      I also have another mirrored RAID pool for more precious data. However, almost all articles on ZFS focus on the RAID aspect, while few talk about the less hardware-demanding setup described above.
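
      For anyone who wants to copy the idea, the replication step boils down to a periodic snapshot plus an incremental zfs send/recv. A minimal sketch (pool and dataset names are made up, and a real script should also handle errors and prune old snapshots):

        #!/bin/sh
        # Snapshot the primary dataset and replicate it to the backup pool.
        SRC=primary/data
        DST=backup/data
        NOW=$(date +%Y%m%d-%H%M)
        # Most recent existing snapshot of the source, if any
        PREV=$(zfs list -H -t snapshot -o name -s creation "$SRC" | tail -n 1 | cut -d@ -f2)

        zfs snapshot "$SRC@$NOW"
        if [ -n "$PREV" ]; then
          # Incremental send since the previous snapshot
          zfs send -i "@$PREV" "$SRC@$NOW" | zfs recv -F "$DST"
        else
          # First run: full send
          zfs send "$SRC@$NOW" | zfs recv -F "$DST"
        fi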

      • ddxv an hour ago

        That's a really cool idea and matches my use case well. I just copy-pasted it to another person in this thread who was asking about the ZFS setup.

        Your use case matches mine perfectly in that I wouldn't mind much about a few hours of data loss.

        I guess the one issue is that it would require more disks, which at current prices is not cheap. I was surprised how expensive they were when I bought them 6 months ago, and was even more surprised when I looked recently and the same drives cost even more now.

      • oarsinsync an hour ago

        I opted to use a two-disk mirror and offline the slow disk, with an hourly cronjob to online the slow disk, wait, and then offline it again.

        It gives me the benefit of automatic fixes in the event of bit rot in any blocks more than an hour old, too.
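
        Roughly like this, run hourly from cron (pool and device names are placeholders, and zpool wait needs a reasonably recent OpenZFS; otherwise you'd poll zpool status instead):

          #!/bin/sh
          # Bring the slow half of the mirror online, let it resilver, take it offline again.
          POOL=tank
          SLOW=/dev/disk/by-id/ata-SLOWDISK

          zpool online "$POOL" "$SLOW"
          zpool wait -t resilver "$POOL"    # blocks until the resilver completes
          zpool offline "$POOL" "$SLOW"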

        • j1elo an hour ago

          That sounds cool; is it possible to just query ZFS to know when it has finished synchronizing the slow disk, before taking it offline again? And do you think that stopping and spinning the disk up again 24 times a day won't cause much wear to the motors?

        • magicalhippo an hour ago

          That is another way, though annoying if you've set up automatic error reporting.

  • tcdent 28 minutes ago

    Thanks for the lead on Garage S3. Everyone's always recommending MinIO and Ceph, which are just not fun to work with.

  • evanreichard 3 hours ago

    I love Garage. It just works. I have Garage running on a few older Odroid HC2's, primarily for k8s Velero backup, and it's just set and forget.

  • ddxv 10 hours ago

    Just wanted to share a quietly successful bit of self-hosting.

    • fpoling 5 hours ago

      Does this JBOD consist of SSDs? That many HDDs can be rather noisy.

      • ddxv an hour ago

        Yeah, they're HDDs, and they are surprisingly noisy.

    • myself248 6 hours ago

      It's weird to me that "owning a computer that runs stuff" is now "self-hosting", just feels like an odd phrasing. Like there's an assumption that all computers belong to someone else now, so we have to specify that we're using our own.

      • mingus88 4 hours ago

        Think services

        You can own a computer and not run any services at all. Most people do.

        Deciding to run your own services, like email, means a lot of work that most people aren't interested in or capable of doing.

        It’s the difference between using your computer to consume things or produce things.

      • vachina 4 hours ago

        We call it self hosting because it is typically hosted by someone else, get it?

      • mlnj 6 hours ago

        Let's not kid ourselves that maintaining 10TB with resiliency handling and other controls built in is trivial. It only seems trivial because of the offerings that cloud computing has made easy.

        Self-hosting implies those features without the cloud element and not just buying a computer.

        • rokkamokka 5 hours ago

          10TB fits on one disk though - it may not be trivial, but setting up a RAID-1 isn't overly complicated. Off-site redundancy and backup do make it more complicated, of course.
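
          With ZFS, for example, the mirror itself is more or less one command (device names are placeholders):

            # Two-disk mirror: every block is stored on both drives
            sudo zpool create tank mirror \
              /dev/disk/by-id/ata-DISK1 \
              /dev/disk/by-id/ata-DISK2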

          • mlnj 4 hours ago

            And all of those things are more steps than "buying a computer".

            Reminds me of the "Dropbox can be built in a weekend"

            • Symbiote 4 hours ago

              You can buy a 10TB+ external drive which uses RAID1.

              You can also buy a computer with this — not a laptop, and I don't know about budget desktops, but on Dell's site (for example) it's just a drop-down selection box.

        • znpy 33 minutes ago

          Moot point. It really depends on your expectations.

          Self-hosting 10TB in an enterprise context is trivial.

          Self hosting 10TB at home is easy.

          The thing is, once you learn enough ZFS, whether you're hosting 10TB or 200TB doesn't change much.

          The real challenge is justifying to yourself the spending on all those disks. But if it's functional to your self-hosting hobby…

  • qudat 6 hours ago

    Very cool! I replaced the mainboard on my Framework and am trying to convert it into a backup for my NAS.

    Could you talk a little more about your ZFS setup? I literally just want it to be a place to send snapshots, but I'm worried about the USB connection speed and about accidentally unplugging it and losing data.

  • n4bz0r 6 hours ago

    Getting into S3 myself and really curious about what Garage has to offer vs the more mature alternatives like Minio. From what I gather, it kinda works better with small (a few kilobytes) files or something?

    • photon-torpedo 5 hours ago

      Minio recently started removing features from the community version. https://news.ycombinator.com/item?id=44136108

      • znpy 31 minutes ago

        How awful. It seems to be a pattern nowadays?

        Some former colleagues still using GitLab CE tell me they also removed features from their self-hosted version, particularly from their runners.

    • chamomeal 5 hours ago

      I loved MinIO until they silently removed 99% of the admin UI to push users towards the paid offering. It just disappeared one day after pulling the new MinIO images. The only evidence of the change online was discussions by confused users in the GitHub issues.

    • LTL_FTC 3 hours ago

      I have also been considering this for some time, comparing MinIO, Garage, and Ceph. MinIO may not be wise given their recent moves, as another commenter noted. Garage seems OK, but their git doesn't show much activity these days, so I wonder if it too will be abandoned. Which leaves Ceph: it may have a higher learning curve, but it also offers the most flexibility, as one can do object as well as block and file. Gonna set up a single node with 9 OSDs soon and give it a go, but I'm always looking for input if anyone would like to provide some.

      • lxpz 2 hours ago

        If I can reassure you about Garage, it's not at all abandoned. We have active work going on to make a GUI for cluster administration, and we have applied for a new round of funding for more low-level work on performance, which should keep us going for the next year or so. Expect some more activity in the near future.

        I manage several Garage clusters and will keep maintaining the software to keep these clusters running. But concerning the "low level of activity in the git repo": we originally built Garage for some specific needs, and it fits these needs quite well in its current form. So I'd argue that "low activity" doesn't mean it's not reliable, in fact it's the contrary: low activity means that it works well for us and there isn't a need to change anything.

        Of course implementing new features is another deal, I personally have only limited time to spend on implementing features that I don't need myself. But we would always welcome outside contributions of new features from people with specific needs.

        • LTL_FTC 20 minutes ago

          I appreciate the response! Thanks for the update. I will keep an eye on the project then and possibly give it a try. I have read the docs and was considering setting it up across two sites. The implementation seemed to address a common pain point with distributed storage solutions: latency.

      • sekh60 2 hours ago

        I've used Ceph in a home lab setting for 9 years or so now. Since cephadm, it has gotten even easier to manage, even though it really was never that hard. A few pointers: no SMR drives; they have such bad performance that they can periodically drop out of the cluster. Second, no consumer SSDs/NVMe devices: you need power loss protection (PLP) on your drives. Ceph writes directly to the drive and ignores the cache, so without PLP you may literally get slower performance than rust.

        You also want fast networking; I just use 10Gbps. I have 5 nodes, each with 6 rust drives and 1 NVMe drive. I colocate my MONs and MDS daemons with my OSDs; each node has 64GB of RAM and I use around 40GB.

        Usage is RBD for a three-node OpenStack cluster, plus CephFS. I have about 424 TiB raw between rust and NVMe.
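
        For anyone curious, the cephadm route is roughly the following; hostnames and IPs are placeholders, and this is the broad strokes rather than my exact deployment:

          # Bootstrap the first node (first MON + MGR)
          cephadm bootstrap --mon-ip 192.168.1.10

          # Add the other hosts, then let the orchestrator create OSDs on their disks
          ceph orch host add node2 192.168.1.11
          ceph orch host add node3 192.168.1.12
          ceph orch apply osd --all-available-devices

          # CephFS: creates the filesystem and schedules MDS daemons
          ceph fs volume create homefs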

        • znpy 26 minutes ago

          The point about SMR drives cannot be stressed enough.

          SMR drives are an absolutely shit-tier choice of drive.

  • yodon 5 hours ago

    Previous discussion of Garage:

    https://news.ycombinator.com/item?id=41013004

  • OutOfHere an hour ago

    Why are you calling it S3? That is a proprietary Amazon cloud technology. Why not call it what it is, e.g. ZFS, a file store, or an object store? Let's not dilute terms.

    • ddxv 40 minutes ago

      That's a good point; it is S3-compatible object storage, not S3 itself. My experience with AWS S3 has shaped the way I use object storage, and since this project is synced to another S3-compatible object store using the S3 protocol, in my head I just call it all S3.
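
      In practice the tooling is identical too: any S3 client just gets pointed at a different endpoint. For example with the AWS CLI (bucket name and endpoint are placeholders; 3900 is, if I recall correctly, Garage's default S3 API port):

        # Same client, different endpoint: self-hosted Garage instead of a cloud provider
        aws s3 ls s3://my-bucket --endpoint-url http://localhost:3900
        aws s3 sync ./results s3://my-bucket/results --endpoint-url http://localhost:3900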

    • thakoppno an hour ago

      > Garage implements the Amazon S3 API and thus is already compatible with many applications.

      https://garagehq.deuxfleurs.fr/

      • OutOfHere an hour ago

        Yes, it's S3 API compatible, but it's not S3. The originally submitted article title misleads by claiming it's S3. There is no valid excuse.

  • canpan 6 hours ago

    Neat. Depending on your use case it might make sense. Still I wonder what they use for backup? For many use cases downtime is acceptable, but data loss is generally not. Did I miss it in the post?

    • ddxv an hour ago

      OP here. I currently have some things synced to a cloud S3 provider. The long-term plan would be to replicate the setup at another location to take advantage of Garage regions/nodes, but I need to wait until I have the money for that.

  • scottarthur 5 hours ago

    I have an ancient QNAP NAS (2015) which is on borrowed time, and I'm trying to figure out what to replace it with. I keep going back and forth between rolling my own with a Jonsbo case vs. a prebuilt like the new Ubiquiti boxes. This is an attractive third option: a modest compute box (Raspberry Pi, NUC, etc.) paired with a JBOD over USB. Can you still use something like TrueNAS with a setup like that?

    • j45 5 hours ago

      Local storage should be like a home appliance, not something we build even though we can.

      When things inevitably need attention, it shouldn't be about DIY.

  • rmoriz 5 hours ago

    With the metadata only on the internal drive, isn't this a SPOF?

    • Havoc 4 hours ago

      Given that it's JBOD over USB I don't think this is aimed at redundancy

      • ddxv an hour ago

        Yeah, this was an effort to get around cloud costs for large amounts of 'low value' data that I have but use in my other home servers for processing. I still sync some smaller result sets to an S3 in the cloud for redundancy as well as for CDN uses.

      • rmoriz 3 hours ago

        I thought ZFS was doing the RAID.

        • Havoc 3 hours ago

          It could be; the author didn't specify. ZFS isn't inherently redundant or RAIDed, so it may or may not have redundancy.

  • ocharles 6 hours ago

    What enclosure houses the JBOD?

    • Hikikomori 5 hours ago

      Don't know about that one but can recommend Terramaster DAS, they don't cheap out on the controller. I have a d4-320 connected to my NUC.

  • cgdstnc 5 hours ago

    i'd be stressed out while watering those plants.

    • ellisv 4 hours ago

      Plants look very portable

    • nrp 4 hours ago

      The laptop is easy to repair, at least.

  • igtztorrero 5 hours ago

    Amazing, I will try Garage.

    What brand of HDD did you use?

    • ddxv an hour ago

      I went with IronWolf, likely due to price, though interestingly they are 25% more expensive than when I bought them six months ago.

    • UI_at_80x24 3 hours ago

      Read up on backblaze hard drive reports. Great source of info

  • rob_c 3 hours ago

    10TB? You could just mirror 2 drives for that. I've seen people serving 10PB at home by this point, I'm sorry to say.

  • andix 5 hours ago

    I really don't get it. Do they host it on Amazon S3 or do they self-host it on a NAS?

    • wiether 5 hours ago

      They built an object storage system exposing an S3-compatible API, by using https://garagehq.deuxfleurs.fr/

      • andix 4 hours ago

        Okay, weird to call it S3 if it is just object storage hosted somewhere else. It's like saying "EKS" if you mean Kubernetes, or talking about "self-hosting EC2" by installing QEMU.

        • miduil 4 hours ago

          > weird to call it S3

          I feel that is a bit of an unfair assessment.

          AWS S3 was the first provider of the S3 API; nowadays most cloud providers and a bunch of self-hosted software support S3(-compatible) APIs. Call it an object store (which is a bit unspecific) or call it S3-compatible.

          EKS and EC2 on the other hand are a set of tools and services, operated by AWS for you - with some APIs surrounding them that are not replicated by any other party (at least for production use).

        • brandon272 an hour ago

          S3 is both a product and basically an API standard.

        • prmoustache 3 hours ago

          Garage talks the same S3 API.

    • j45 5 hours ago

      It’s self hosted, and self hosted nas’ can run the s3 storage protocol locally as well.

      • andix 4 hours ago

        Yeah, that's pretty standard for object storage to be S3-compatible. I think azure blob is the only one that doesn't support it.

  • awill 5 hours ago

    >>About 5 months ago I made the decision to start self hosting my own S3.

    Is it eleven nines of durability? No. You didn't build S3. You built a cheapo NAS.

    • Datagenerator 3 hours ago

      And you won't be charged for ingress, egress, IOPS, etc. It's better than bad, it's good. Happy times.

    • Havoc 4 hours ago

      I think it's pretty obvious he's talking about the protocol, not the Amazon service...