We Saved $500k per Year by Rolling Our Own "S3"

(engineering.nanit.com)

99 points | by mpweiher 8 hours ago

66 comments

  • varenc 21 minutes ago

    In HN style, I'm going to diverge from the content and rant about the company:

    Nanit needs this storage because they run cloud based baby cameras. Every Nanit user is uploading video and audio of their home/baby live to Nanit without any E2EE. It's a hot mic sending anything you say near it to the cloud.

    Their hardware essentially requires a subscription to use, even though it costs $200/camera. You must spend an additional $200 on a Nanit floor stand if you want sleep tracking. This is purely a software limitation, since there are plenty of other ways to get an overhead camera mount. (I'm curious how they even detect if you're using the stand, since it's just a USB-C cable. Maybe etags?)

    Of course Nanit is a popular and successful product that many parents swear by. It just pains me to see cloud-based in-home audio/video storage being so normalized. Self-hosted video isn't that hard, but no one makes a baby-monitor-centric solution. I'm sure the cloud-based video storage model will continue to be popular because it's easy, but also because it helps justify a recurring subscription.

    • sbrother 8 minutes ago

      As a happy customer, I picked nanit because it actually worked. We didn’t even use the “smart” features, but “you can turn on the app from anywhere you happen to be and expect the video feed to work” is unfortunately a bar that no competitor I tried could meet. The others were mostly made by non-software companies with outsourced apps that worked maybe 50% of the time.

      I wish we could have local-first and e2ee consumer software for this sort of thing, but given the choice of that or actually usable software, I am going to pick the latter.

      • vachina a few seconds ago

        Which competitors have you actually tried? My girlfriend’s parents have a few cheap TP-Link solar-powered CCTV cameras, and they have worked flawlessly since setup. I have a Xiaomi one and it works well too.

        My impression is that live feeds are a solved problem.

      • vlovich123 2 minutes ago

        The VTech camera is working well enough for me, for what it’s worth. But any such app solution generally implies transfer through the company’s servers.

    • jen20 18 minutes ago

      This is the reason I refused to buy Nanit cameras, instead opting for unconnected models. E2E encryption is table stakes.

  • Lucian6 an hour ago

    Having gone through S3 cost optimization ourselves, I want to share some important nuances around this approach. While the raw storage costs can look attractive, there are hidden operational costs to consider:

    We found that implementing proper data durability (3+ replicas, corruption detection, automatic repair) added ~40% overhead to our initial estimates. The engineering time spent building and maintaining custom tooling for multi-region replication, access controls, and monitoring ended up being substantial - about 1.5 FTE over 18 months.

    For high-throughput workloads (>500 req/s), we actually saw better cost efficiency with S3 due to their economies of scale on bandwidth. The breakeven point seems to be around 100-200TB of relatively static data with predictable access patterns. Below that, the operational overhead of running your own storage likely exceeds S3's markup.
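
    To put rough numbers on that breakeven, here's a back-of-envelope sketch; every figure in it is an illustrative assumption, not our actual data:

        # All inputs are assumptions, not real billing data.
        S3_PER_GB  = 0.023                # $/GB-month, roughly S3 Standard's first tier
        DIY_PER_GB = 0.005                # assumed amortized disks, replicas, power
        PEOPLE     = 0.25 * 180_000 / 12  # assumed ongoing upkeep: 0.25 FTE at $180k/yr

        # Self-hosting wins once the per-GB savings cover the people cost.
        breakeven_tb = PEOPLE / ((S3_PER_GB - DIY_PER_GB) * 1024)
        print(f"breakeven near {breakeven_tb:,.0f} TB")  # ~200 TB with these inputs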

    The key is to be really honest about your use case. Are you truly at scale? Do you have the engineering resources to build AND maintain this long-term? Sometimes paying the AWS premium is worth it for the operational simplicity.

    • supriyo-biswas 37 minutes ago

      Why do all your comments seem LLM-generated? You do clearly have something to contribute, but it’s probably better to just write what you’re talking about than to run it through an LLM.

      • pjjpo 30 minutes ago

        I don't know about the commenter specifically, but in general, using LLMs to format text is a game changer for the ability of English-as-a-second-language folks to contribute to tech conversations. While I get where some of the bias against anything LLM-generated comes from, I would reserve it for editorial content rather than community comments, to be fair to a global audience.

        • ashdksnndck 23 minutes ago

          I’m worried that LLMs could facilitate cheap, scaled astroturfing.

          I understand that people encounter discrimination based on English skill, and it makes sense that people will use LLMs to help with that, especially in a professional context. On the other hand, I’d instinctively be more trusting of the authenticity of a comment with some language errors than one that reads like it was generated by ChatGPT.

        • barrell 8 minutes ago

          I’m not sure if that’s a realistic ask. There is ample abuse of LLM generated content, and there are plenty of ESL publishers.

          Personally I would recommend including a note that English is not your native language and you had an LLM clean things up. I think people are willing to give quite a bit of grace, if it’s disclosed.

          Personally, I’d rather see a response in your native language with a translation, but I’m fairly certain I’m the odd one out in that situation XD

        • phito 23 minutes ago

          It just makes everything sound bland and soulless. You don't know which part of the message actually comes from the user's brain and which part has been added or suggested by the LLM. The latter is not an original thought, and it would be disingenuous to include it, but people do it because it makes them look smarter. Meanwhile, on the other side, you might as well be talking to an LLM...

    • YZF an hour ago

      Right. Having worked on a commercial S3-compatible storage system, I can tell y'all that there's a lot more to it than just sticking some files on a JBOD. It does depend on your specific requirements, though. 1.5 FTE over 18 months sounds on the low side for everything you've described.

      That said, the article seems to be more about an optimization of their pipeline to reduce S3 usage by holding some objects in memory instead. That's very different from trying to build your own object store to replace S3.

    • john01dav 19 minutes ago

      There are more options than using S3 or completely rolling your own on JBOD. For example, you could use a cheaper S3-compatible cloud (such as Backblaze) or you can deploy a project such as Ceph.

    • Twirrim 35 minutes ago

      S3 does more than 3x-replica durability as well: they use a form of erasure coding. They can lose several hard drives/servers/racks before your data becomes at risk, and they have sufficient spare capacity to very quickly reproduce any missing shards before things become a problem.
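
      In toy form, the idea is to add parity so any single lost shard can be rebuilt from the survivors. S3 uses much wider schemes (Reed-Solomon-style k-of-n coding), but a single-parity XOR sketch shows the principle:

          from functools import reduce

          def xor_bytes(a: bytes, b: bytes) -> bytes:
              return bytes(x ^ y for x, y in zip(a, b))

          def encode(data: bytes, k: int = 4) -> list[bytes]:
              size = -(-len(data) // k)  # ceil division
              shards = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(k)]
              return shards + [reduce(xor_bytes, shards)]  # k data shards + 1 parity

          def recover(pieces: list[bytes], lost: int) -> bytes:
              # XOR of all surviving pieces reconstructs the missing one.
              return reduce(xor_bytes, [p for i, p in enumerate(pieces) if i != lost])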

      That said, S3 seems like a really odd fit for their workload, plus their dependency on lifecycle rules seems utterly bizarre.

      > Storage was a secondary tax. Even when processing finished in ~2 s, Lifecycle deletes meant paying for ~24 h of storage.

      They decided not to implement the deletion logic in their service, so they’d just leave files sitting around for hours, needlessly paying that storage cost? I wonder how much money they’d have saved if they had just added that deletion logic.
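
      The cleanup is a few lines with boto3; something like this sketch (the processing step is a hypothetical stand-in):

          import boto3

          s3 = boto3.client("s3")

          def handle_video(data: bytes) -> None:
              ...  # hypothetical processing step (transcode, analyze, etc.)

          def process_then_delete(bucket: str, key: str) -> None:
              obj = s3.get_object(Bucket=bucket, Key=key)
              handle_video(obj["Body"].read())
              s3.delete_object(Bucket=bucket, Key=key)  # pay for seconds, not ~24 h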

    • groundzeros2015 40 minutes ago

      Is spending time to optimize S3 in the manner you describe not a relevant cost?

  • ixtli an hour ago

    They didn’t actually do what the headline claims. They made a memory cache that sits in front of S3 for the happy path. Cool, but nowhere near rolling your own S3.
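
    As I read it, the pattern is roughly the sketch below (bucket name and memory budget are made up; the real service is surely more involved):

        import boto3

        s3 = boto3.client("s3")
        cache: dict[str, bytes] = {}           # the in-memory happy path
        used, CAPACITY = 0, 512 * 1024 * 1024  # assumed memory budget

        def put(key: str, data: bytes, bucket: str = "fallback-bucket") -> None:
            global used
            if used + len(data) <= CAPACITY:
                cache[key] = data              # happy path: never touches S3
                used += len(data)
            else:
                s3.put_object(Bucket=bucket, Key=key, Body=data)  # spill to S3

        def get(key: str, bucket: str = "fallback-bucket") -> bytes:
            global used
            if key in cache:
                data = cache.pop(key)          # objects are read once, then dropped
                used -= len(data)
                return data
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # fallback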

  • Havoc 3 hours ago

    Tbh I feel this is one of those things that would be significantly cleaner without serverless in the first place.

    Sticking something with a 2-second lifespan on disk to shoehorn it into the AWS serverless paradigm created problems and cost out of thin air here.

    Moving at least partially to an in-memory solution is a good fix, though.

    • tcdent 3 hours ago

      Yeah, so now you're basically running a heavy instance to get the network throughput and the RAM, while barely using the CPU, when you could probably handle the encode with the available headroom. The article lists TLS handshakes as a significant source of CPU usage, but I must be missing something, because I don't see how that is anywhere near the top of the constraints of a system like this.

      Regardless, I enjoyed the article and I appreciate that people are still finding ways to build systems tailored to their workflows.

      • inlined 2 hours ago

        Maybe they’re not using keepalives in their clients, causing thousands of handshakes per second?
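
        For what it's worth, fixing that on the client is usually one object: a pooled session reuses TCP+TLS connections, so the handshake happens once per connection instead of once per request. A sketch with a made-up endpoint:

            import requests

            # One Session pools connections (HTTP keep-alive) across calls.
            session = requests.Session()

            def upload(key: str, payload: bytes) -> None:
                # Hypothetical endpoint; each call reuses a warm connection.
                r = session.put(f"https://storage.example.com/{key}", data=payload)
                r.raise_for_status()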

  • another_twist 3 minutes ago

    I mean the "S3" could be replaced with object storage. I guess thats the technical term anyway. Having said that just goes to show how cheap S3 is, if after all of this, the savings are just $500k. Definitely money saved but not a lot.

  • lpa22 19 minutes ago

    If anyone here uses the Nanit app in the background of their phones, it absolutely destroys battery life.

    I got a new phone because I thought my battery was cooked, but turns out it was just the app.

  • dxxvi an hour ago

    So, you want a place to store many files in a short period of time and when there's a new file, somebody must be notified?

    Have you ever thought of using a PostgreSQL DB (also on AWS) to store those files and using CDC to publish messages about them to a Kafka topic? The original way needs 3 AWS services: S3, Lambda, and SQS. This way needs 2: PostgreSQL and Kafka. I'm not sure how well this method works though :-)
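
    A rough sketch of the Postgres half (table, channel, and connection string are invented; the Kafka/CDC part would hang off the WAL):

        import psycopg2

        payload = b"..."  # stand-in for the uploaded file bytes

        conn = psycopg2.connect("dbname=videos")
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO clips (name, body) VALUES (%s, %s) RETURNING id",
                ("clip-0001.mp4", psycopg2.Binary(payload)),
            )
            clip_id = cur.fetchone()[0]
            # Cheap stand-in for CDC: tell listeners a new file landed.
            cur.execute("SELECT pg_notify('new_clip', %s)", (str(clip_id),))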

  • none2585 3 hours ago

    I'm curious how many engineers per year this costs to maintain

    • nbngeorcjhe 3 hours ago

      A small fraction of 1, probably? It sounds like a fairly simple service that shouldn't require much ongoing development

      • codedokode 2 hours ago

        Especially if you have access to LLMs.

      • hinkley 2 hours ago

        You're going to run a production system with a bus factor of 1?

        I think you mean a small fraction of 3 engineers. And small fractions aren't that small.

        • adrianN an hour ago

          So far I have seen a lot more production systems with a bus factor of zero than production systems with a bus factor greater than one.

    • CaptainOfCoit 2 hours ago

      > I'm curious how many engineers per year this costs to maintain

      The end of the article has this:

      > Consider custom infrastructure when you have both: sufficient scale for meaningful cost savings, and specific constraints that enable a simple solution. The engineering effort to build and maintain your system must be less than the infrastructure costs it eliminates. In our case, specific requirements (ephemeral storage, loss tolerance, S3 fallback) let us build something simple enough that maintenance costs stay low. Without both factors, stick with managed services.

      Seems they were well aware of the tradeoffs.

    • codedokode 3 hours ago

      And I am curious how many engineer years it requires to port code to cloud services and deal with multiple issues you cannot even debug due to not having root privileges in the cloud.

      Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to DB. And no weird network issues to debug.
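
      Spelled out, the whole thing fits in a few lines (path and schema are just for illustration):

          import sqlite3

          db = sqlite3.connect("files.db")
          db.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, size INTEGER)")

          def save(name: str, data: bytes) -> None:
              with open(f"/var/data/{name}", "wb") as f:  # assumes the dir exists
                  f.write(data)
              with db:  # commits the record
                  db.execute("INSERT INTO files VALUES (?, ?)", (name, len(data)))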

      • rajamaka 2 hours ago

        > as simple as "with open(...) as f: f.write(data)"

        Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...

        Without on-prem, saving a file is as simple as s3.put_object() !
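
        And that one-liner really is the whole client side (bucket and key invented here):

            import boto3

            s3 = boto3.client("s3")  # credentials come from the environment
            s3.put_object(Bucket="my-bucket", Key="clips/clip-0001.mp4", Body=b"...")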

        • AdieuToLogic 2 hours ago

          >> Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to DB.

          > Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...

          Most of these concerns can be addressed with ZFS[0] provided by FreeBSD systems hosted in triple-A data centers.

          See also iSCSI[1].

          0 - https://docs.freebsd.org/en/books/handbook/zfs/

          1 - https://en.wikipedia.org/wiki/ISCSI

        • codedokode 2 hours ago

          With S3, you cannot use ls, grep, and other tools.

          > Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...

          Wow that's a lot to learn before using s3... I wonder how much it costs in salaries.

          > With what network topology?

          You don't need to care about this when using SSDs/HDDs.

          > With what access policies?

          Whichever is defined in your code; no restrictions, unlike in S3. No need to study complicated AWS documentation and navigate multiple consoles (this also costs you salaries, by the way). No risk of leaking files due to misconfigured cloud services.

          > With what backup strategy?

          Automatically backed up with the rest of your server data; no need to spend time on this.

          • rajamaka 2 hours ago

            > You don't need to care about this when using SSDs/HDDs.

            You do need to care when you move beyond a single server in a closet that runs your database, webserver and storage.

            > No risk of leaking files due to misconfigured cloud services.

            One misconfigured .htaccess file, for example, could result in leaking files.

            • AdieuToLogic 44 minutes ago

              >> No risk of leaking files due to misconfigured cloud services.

              > One misconfigured .htaccess file for example, could result in leaking files.

              I don't think you are making a compelling case here, since both scenarios result in an undesirable exposure. Unless your point is both cloud services and local file systems can be equally exploited?

            • codedokode 26 minutes ago

              > One misconfigured .htaccess

              First, I hope nobody is using Apache anymore; second, you typically store files outside the web directory.

            • Nextgrid an hour ago

              With bare-metal machines you can go very far before needing to scale beyond one machine.

          • inlined 2 hours ago

            It sounds like you’re not at the scale where cloud storage is obviously useful. By the time you definitely need S3/GCS you have problems making sure files are accessible everywhere. “Grep” is a ludicrous proposition against large blob stores

          • coderintherye 2 hours ago

            I mean you can easily mount the S3 bucket to the local filesystem (e.g. using s3fs-fuse) and then use standard command line tools such as ls and grep.
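
            The Python s3fs package (same idea as the FUSE mount, different tool) gives an ls-like view too; bucket name invented:

                import s3fs

                fs = s3fs.S3FileSystem()         # credentials from the environment
                for path in fs.ls("my-bucket"):  # lists keys under the bucket root
                    print(path)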

            • hallman76 2 hours ago

              I inherited an S3 bucket where hundreds of thousands of files were written to the bucket root. Every filename was just a UUID. ls might work after waiting to page through every file. To grep, you would need to download 5 TB.

            • codedokode 2 hours ago

              It's probably going to be dog slow. I dealt with HDDs where just iterating through all files and directories took hours, and network storage is going to be even slower at this scale.

        • Rohansi 2 hours ago

          I don't think any of those mattered for their use case. That's why they didn't actually need S3.

        • bcrosby95 2 hours ago

          You can't ever definitively answer most of those questions on someone else's cloud. You just take Amazon's word for whatever number of nines they claim it has.

          • rajamaka 2 hours ago

            Not needing to ask the questions is the selling point.

            • grebc an hour ago

              Bro, were you off-grid last week? Your questions apply equally to AWS; you just magically handwave them away as if AWS/GCP/Azure outages weren't a thing.

            • patrick451 an hour ago

              Until it goes down because AWS STILL hasn't made themselves completely multi-region or can't figure out their DNS.

      • beoberha an hour ago

        A lot of reductive anti-cloud stuff gets posted here, but this might be the granddaddy of them all.

      • RedShift1 2 hours ago

        Ah, that is where logging and traceability come in! But not to worry, the cloud has excellent tools for that! The fact that logging and tracing will become half your cloud cost? Oh well, let's just sweep that under the rug.

      • mjr00 an hour ago

        > Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to DB. And no weird network issues to debug.

        There may be some additional features that S3 has over a direct filesystem write to a SSD in your closet. The people paying for cloud spend are paying for those features.

      • hinkley 2 hours ago

        Variation on an old classic.

        Question: How do you save a small fortune in cloud savings?

        Answer: First start with a large fortune.

    • codedokode 2 hours ago

      What I notice is that large companies use their own private clouds and datacenters. At their scale, it is cheaper to have their own storage. As a side business, they also sell cloud services themselves. And small companies probably don't have that much data to justify paying for a cloud instead of buying several SSDs/HDDs or creating an SMB share on their Windows server.

    • UseofWeapons1 3 hours ago

      Yes, that was my thought as well. Breakeven might be like 1 (give or take 2x)?

      • hinkley 2 hours ago

        Anything worth doing needs three people. Even if they also are used for other things.

  • Huxley1 2 hours ago

    S3 certainly saves a lot of hassle, but in certain use cases, it really is prohibitively expensive. Has anyone tried self-hosted alternatives like MinIO or SeaweedFS? Or taken even more radical approaches? How do you balance between stability, maintenance overhead, and cost savings?

    • ddxv 2 hours ago

      MinIO has moved away from having a free community fork, and I think its base cost is close to $100k a year. I've been using Garage and have been happy with it, but I'm a single dev, orders of magnitude smaller than the OP, so there are certainly edge cases I'm missing when comparing the two.

      • Cerium 39 minutes ago

        I'm a fellow new Garage user. I have had a great time so far, but I also don't need much. My use case is sharing data analysis results with a small team. I wanted something simple to manage that can provide an S3-like interface to off-the-shelf data analysis tools.

  • elchananHaas 3 hours ago

    Video processing is one of those things that needs caution when done serverlessly. This solution makes sense, especially because S3's durability guarantees aren't needed.

  • ch2026 3 hours ago

    Who is “The South Korean Government”?

  • VladVladikoff 2 hours ago

    I’m mostly just impressed that some janky baby monitor has racked up server fees on this scale. Amazing example of absolutely horrible engineering.

    Also, just take an old phone from your drawer full of old phones, slap some free camera app on it, zip tie a car phone mount to the crib, and boom you have a free baby monitor.

    • bombcar 2 hours ago

      If you don’t have fifty to a hundred dodgy PoE cameras from Alibaba tied to the crib do you even really love the baby?