Using SQLite as storage for web server static content

(clace.io)

176 points | by ajayvk 11 hours ago

99 comments

  • simonw 10 hours ago

    I did some experiments around this idea a few years ago, partly inspired by the "35% Faster Than The Filesystem" article: https://www.sqlite.org/fasterthanfs.html

    Here are the notes I made at the time: https://simonwillison.net/2020/Jul/30/fun-binary-data-and-sq...

    I built https://datasette.io/plugins/datasette-media as a plugin for serving static files from SQLite via Datasette and it works fine, but honestly I've not used it much since I built it.

    A related concept is using SQLite to serve map tiles - this plugin https://datasette.io/plugins/datasette-tiles does that, using the MBTiles format which it turns out is a SQLite database full of PNGs.

    If you want to experiment with SQLite to serve files you may find my "sqlite-utils insert-files" CLI tool useful for bootstrapping the database: https://sqlite-utils.datasette.io/en/stable/cli.html#inserti...

    • ajayvk 10 hours ago

      In terms of performance, the content-hash-based file name approach is easier to implement with SQLite storage. The content hash has to be generated just once, during file upload, not every time the web server is restarted (or in a build step which does actual file renames). It is possible to do it dynamically for filesystem files (see https://github.com/benbjohnson/hashfs for an embedFS implementation), but the database makes it a bit easier.
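
      The hash-once-at-upload idea is easy to sketch with Python's sqlite3 and hashlib (the table and column names here are made up for illustration, not Clace's actual schema):

```python
import hashlib
import sqlite3

def init(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS static_files ("
        " path TEXT PRIMARY KEY, sha256 TEXT, body BLOB)"
    )

def upload(conn: sqlite3.Connection, path: str, body: bytes) -> str:
    # The content hash is computed exactly once, at upload time, and
    # stored next to the blob; serving code just reads it back instead
    # of re-hashing on every restart.
    digest = hashlib.sha256(body).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO static_files VALUES (?, ?, ?)",
        (path, digest, body),
    )
    return digest  # usable as a cache-busting name, e.g. app.<digest>.js
```

      Serving code can then answer requests for the hashed name with an immutable cache header without ever recomputing a digest.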

      • azornathogron 8 hours ago

        This is pretty cool, but if you're rewriting resource names server side then you need some machinery to rewrite references to those resources. I see the example you linked shows the root HTML being generated server-side from a template so content hash names can be injected into the template. Looks reasonable. Not so great for dealing with things like `import` calls in javascript (for deferred loading of script modules), if those need to be rewritten as well.

        Your JS bundler (if you use one!) might be effectively rewriting imports anyway so it can be convenient to do the content-hash-named-files rewrites in the same place. But not everyone wants to use a bundler for that.

    • snarg 8 hours ago

      Using sqlite keeps a persistent open database connection, so you only have to send requests for content that sqlite has probably cached. When reading static files, you open, read, and close a file with every request, meaning more context switches, even though the filesystem layer will have cached the file content. If you want to speed this up, the appropriate solution is to add a caching front-end, not to turn everything into a database. It will be faster than using sqlite and easier to maintain and troubleshoot.

      • eknkc 4 hours ago

        Well, you can store metadata in SQLite, and you can update metadata and files in a transaction. It already has a caching layer built in (I think 8mb by default; that might need some tuning for this use case). It is thread safe (I mean, it is a giant rw mutex, but whatever). It does not sound that bad to just use it instead of building a caching layer that might be marginally faster than sqlite. And if you restart your caching frontend, it will need to rebuild the cache using tons of syscalls; the SQLite cache will fill up more easily since it has fewer context switches.
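
        The files-plus-metadata-in-one-transaction point can be sketched like this (the schema is invented for illustration):

```python
import sqlite3

def deploy(conn: sqlite3.Connection, version: str, files: dict) -> None:
    # Either every file and the live-version pointer update together,
    # or none of them do; readers never observe a half-applied deploy.
    with conn:  # BEGIN ... COMMIT, with automatic ROLLBACK on error
        conn.executemany(
            "INSERT OR REPLACE INTO files (version, path, body) VALUES (?, ?, ?)",
            [(version, path, body) for path, body in files.items()],
        )
        conn.execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES ('live_version', ?)",
            (version,),
        )
```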

        One major downside I see would be backing up the database. You can do granular backups on a filesystem. Even rsync would work fine. You’d need to basically snapshot the entire thing here.

    • nabla9 9 hours ago

      Isn't anything that avoids too many syscalls and copying between userspace and kernel 35% faster? Including filesystems that run completely in user space (not FUSE; FUSE calls go through the kernel)

      • ori_b 3 hours ago

        In some ways, it seems surprising that avoiding open/read/write syscalls and caching in user space is only 35% faster.

    • westurner 2 hours ago

      Is there a way to sendfile from a SQLite database like xsendfile?

      requests-cache caches requests in SQLite by (date,URI) IIRC. https://github.com/requests-cache/requests-cache/blob/main/r...

      /? pyfilesystem SQLite: https://www.google.com/search?q=pyfilesystem+sqlite

      /? sendfile mmap SQLite: https://www.google.com/search?q=sendfile+mmap+sqlite

      https://github.com/adamobeng/wddbfs :

      > webdavfs provider which can read the contents of sqlite databases

      There's probably a good way to implement a filesystem with Unix file permissions and xattrs (extended file attributes) atop SQLite?

      Would SQLite be faster or more convenient than e.g. ngx_http_memcached_module.c ? Does SQLite have per-cell ACLs either?

  • azornathogron 8 hours ago

    > Transactional Updates : This is the main benefit. Updating multiple files can be done in one transaction. Isolation ensures that there are no broken webapps during the update.

    Your server on its own, whether it uses SQLite or the filesystem, cannot save you from having broken webapps during an update. Each page in the browser is a _tree_ of resources, which are fetched with separate HTTP requests and therefore not subject to your server-side transaction/atomic update system. You can change all resources transactionally on the server side but the browser can still observe a combination of old and new resources.

    The common solution here is to ensure that all sub-resources of a page (javascript bundle(s), stylesheet(s), media, etc) are named (ie, in the URL) using a content hash or version. All resources (from the root HTML document down) need to refer to the specific content-hash or versioned name, so that if the browser loads version X of the root HTML document then it will load the _corresponding version_ of all sub-resources. Not only that, but if you're updating a page from version X to version Y, all the version-X sub-resources need to remain available after you start serving page version Y, until you're confident that no browser could reasonably be still loading version X of the page, otherwise you can break page version X while someone is still loading it.

    This means you actually specifically don't want to put sub-resources for a page into one atomically-switched-out bundle along with the root HTML document, because if you do that you'll be removing the previous versions of sub-resources while they may still be referenced.

    Also of course in some cases there might be some sub-resources (e.g., perhaps some media files) that you want to be versioned "separately" from the HTML document that contains them, so that you can update them without busting caches for all of the page/app structural elements like your javascript blobs and stylesheets and so on, and you _might_ need to take that into account in your page build system as well.
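
    A minimal sketch of that content-hash naming scheme (a toy publish step; a real bundler would track a manifest rather than doing string replacement on the HTML):

```python
import hashlib

def hashed_name(path: str, body: bytes) -> str:
    # app.js -> app.<12 hex chars of sha256>.js
    stem, _, ext = path.rpartition(".")
    digest = hashlib.sha256(body).hexdigest()[:12]
    return f"{stem}.{digest}.{ext}"

def publish(assets: dict, html_template: str):
    # Give each sub-resource a content-hashed name and rewrite the
    # references in the root HTML. Old hashed names must stay servable
    # until no client could still be loading the previous page version.
    renamed = {}
    html = html_template
    for path, body in assets.items():
        new_name = hashed_name(path, body)
        renamed[new_name] = body
        html = html.replace(path, new_name)
    return html, renamed
```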

    • withinboredom 7 hours ago

      > until you're confident that no browser could reasonably be still loading version X of the page

      During experiments of this stuff at a big company (which saw a large portion of the web during this time):

      - we saw most (>80%) of users staying on a web app ~2-3 days (most likely skewed from people leaving tabs open over the weekend).

      - 95% was ~2 weeks

      - 100% was about 600 days (yes, apparently we had users with a tab open for nearly 2 years)

      If you are seeking 100%, you are going to be waiting a while.

      (This is 100% from memory; I don't work there anymore.)

      • azornathogron 6 hours ago

        Presumably at some point you trigger a forced client side reload to get people off old client versions?

        I'm impressed you kept the measurements going long enough to track the 600-day users though!

        • withinboredom 6 hours ago

          This is what pointed out a reason for implementing that feature!

      • keeganpoppen 33 minutes ago

        that 100%ile sounds like me...

    • ajayvk 6 hours ago

      For Hypermedia driven web apps, interactions within a page return small HTML fragments. For larger UI changes, a full page update is recommended (the Multi Page Application, MPA, style). The scenario of a user staying on a page for a long time and getting broken links is more of a SPA issue.

      I agree generally with your comment. Transactional updates prevent only one category of update-related issues. There can be other issues at the app level which can cause a broken experience. Continuing to serve older versions of static content when referenced by content hash is possible, but it is not currently implemented by Clace.

    • hinkley 8 hours ago

      This helps with front end caches and CDNs as well.

      The main trick is to upload all of the non-HTML changes before the HTML changes, so that no file is referenced before it exists. If you want to make the app as complex as possible, you do a depth-first search for upload order. But if you value your sanity, you relax the problem and upload assets first.

  • SAHChandler 7 hours ago

    Back in 2011/2012 I worked for a small game development company, and at my recommendation we moved all our assets under 100kb into an sqlite3 db, then made a "pak file" and stored offsets to said files inside the sqlite3 db. (This was done because in a post-mortem talk, Richard Hipp said that in hindsight he wished blobs had been treated more like an inode, where they were an offset further into the database and blobs were just appended to the file.)

    It was hella fast how quickly we could load our assets (this was for a mobile game so only a handful of assets were not in the db). It's been neat seeing people adopt it further.

    One aspect someone might not realize is that you can store near-infinite metadata alongside your content, so you can always just query the database to find "similar" files. We threw so much metadata into the DB, and I think in the end the pak file was 200MB and the database was something like 20MB. Again, this was a mobile game. I think the worst thing we had happen on the client side was a single double inner join that we couldn't figure out how to reduce due to some server-side complexity. (We weren't able to implement the server, which was frustrating, as the people we worked with were very bad™ at software development, so our builds would break out of nowhere when they changed the entire backend spec without alerting us.)

    We also used a separate sqlite3 database for game replays so you could (post match completion) replay an entire game and see what each opponent was doing during said game. This was super nice for automated testing as well.
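
    The inode-style layout described above might look roughly like this (the schema and helper names are guesses for illustration, not the studio's actual code):

```python
import io
import sqlite3

def build_pak(conn: sqlite3.Connection, pak, assets: dict) -> None:
    # Asset bytes are appended to one flat "pak" file; SQLite only
    # records where each asset lives, mimicking the inode-style layout.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets ("
        " name TEXT PRIMARY KEY, offset INTEGER, length INTEGER)"
    )
    for name, body in assets.items():
        offset = pak.tell()
        pak.write(body)
        conn.execute("INSERT INTO assets VALUES (?, ?, ?)", (name, offset, len(body)))

def read_asset(conn: sqlite3.Connection, pak, name: str) -> bytes:
    # One indexed lookup, one seek, one read.
    offset, length = conn.execute(
        "SELECT offset, length FROM assets WHERE name = ?", (name,)
    ).fetchone()
    pak.seek(offset)
    return pak.read(length)
```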

  • samuelstros 10 hours ago

    We also came around to dumping files into SQLite instead of dealing with the filesystem (and git) for the lix change control system. This article touches on problems we encountered: https://opral.substack.com/i/150054233/breaking-git-compatib....

    - File locking, concurrency, etc., are problems that SQLite solves

    - Using SQLite allows *drumroll* querying files with SQL instead of platform-dependent fs APIs

    - Using SQL queries is typesafe with Kysely https://kysely.dev/ (without the need for an ORM)

    • egeozcan 9 hours ago

      > Using SQL queries is typesafe with Kysely https://kysely.dev/ (without the need for an ORM)

      Wow, this is even better than what I've seen people do with F# type providers. Cool cool cool.

      • samuelstros 9 hours ago

        the types kysely implemented are crazy. even custom where clauses like this one [0] are typesafe :O

        [0] https://github.com/opral/monorepo/blob/99356e577f558f4442a95...

        • surrealize 8 hours ago

          Totally random, but I had a guess about the ts error. I hadn't seen kysely before, very cool!

            diff --git a/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts b/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts
            index 7d677477e..39502f245 100644
            --- a/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts
            +++ b/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts
            @@ -21,10 +21,9 @@ export function isInSimulatedCurrentBranch(
                          // change is  not in a conflict
                          eb("change.id", "not in", (subquery) =>
                                  subquery.selectFrom("conflict").select("conflict.change_id").unionAll(
            -                             // @ts-expect-error - no idea why
                                          subquery
                                                  .selectFrom("conflict")
            -                                     .select("conflict.conflicting_change_id"),
            +                                     .select("conflict.conflicting_change_id as change_id"),
                                  ),
                          ),
                          // change is in a conflict that has not been resolved
          • samuelstros 8 hours ago

            You are now forever in our git history https://github.com/opral/monorepo/commit/58734e11e51d8e20092.... the ts-expect-error was indeed fixed by your suggestion

            • samuelstros 8 hours ago

              Today i learned that HN has a nesting limit. @surrealize i fixed the username https://github.com/opral/monorepo/commit/7dc1f3c806bd89c6d68...

              • aspenmayer 2 hours ago

                The conversation in some aspect may have tripped the flamewar detection, which removes the reply button iirc. You can manually reply to a specific comment by clicking/tapping on its timestamp.

              • surrealize 8 hours ago

                Haha, thanks!

            • surrealize 8 hours ago

              Cool! I'm surrealize though, sureglymop wrote a sibling comment! I should have just made a PR, lol.

            • ec109685 8 hours ago

              Isn’t that kind of just overriding the type checker? Should that "as" be necessary?

              • surrealize 8 hours ago

                In the context of the union query, I think it makes sense. The query takes two different columns (with two different names) from the same table, and combines them together into one column. The TS error happened because the column names didn't match between the two union inputs. But they would never match without the "as".

              • samuelstros 8 hours ago

                Yes. The runtime was not affected. Hence, the ts-expect-error. Still nice to have the ts-expect-error fixed :)

    • hinkley 9 hours ago

      You can add as much metadata to the files as your little heart desires.

    • sureglymop 9 hours ago

      Regarding your lix project, have you looked at Fossil? It seems like it could maybe do what you're trying to do with some small changes.

      • samuelstros 9 hours ago

        Yes. We looked at all version control systems (fossil, pijul, jj, git, darcs, sapling). None provide what we need:

        - target the browser

        - supports any file format, not just text files

        - understands changes, not just versioning of files

        - suited to build web apps on top

        - ultimately enable 1000's of artists, designers, lawyers, civil engineers, etc. to collaborate

        we are 2 years into the journey. dec 16, we'll release a public preview

        • sureglymop 4 hours ago

          Interesting! Definitely looking forward to that and I hope you succeed.

  • thisislife2 7 hours ago

    From Appropriate Uses For SQLite - https://www.sqlite.org/whentouse.html :

    > The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic ... The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time. See also: https://news.ycombinator.com/item?id=33975635

    • ajayvk 6 hours ago

      100K/day is less than 2/second. I think those numbers need to be updated on the SQLite page. For a read-heavy workload, like serving static files, SQLite can do much more. Since content caching headers are set, the browser will cache content, so requests to the server are required for new clients only. I do not expect SQLite to be the bottleneck for most use cases.

  • theturtle32 7 hours ago

    Genuinely, I'm curious to hear from OP if you explored using filesystems that can provide de-duplication, snapshots, versioning, and compression and compared that to the SQLite approach. It would be interesting to get your take on the comparative benefits of SQLite compared with an approach that uses a more advanced filesystem like ZFS or btrfs. Once you have a filesystem that can de-dup, atomic changes are more or less a matter of building up a directory with the new version first and then swapping one directory for another with a rename. Though I assume there may be a small race condition in between the two directory rename calls. I don't know if there's a standard call to tell the filesystem "mv current old && mv new current" in one shot.

    • bhaney 5 hours ago

      > I don't know if there's a standard call to tell the filesystem "mv current old && mv new current" in one shot.

      Used to be hackily done by overwriting symlinks, but now there's https://manpages.debian.org/testing/manpages-dev/renameat2.2...

    • ajayvk 6 hours ago

      Clace is meant to be portable and easy to operationalize. It runs on Linux/macOS/Windows. The only external dependency is Docker/Podman for starting containers. Depending on particular filesystem features would make it much more difficult to set up, so I did not explore that. There would be use cases where using filesystem features would be more appropriate.

      • withinboredom 5 hours ago

        If you are using Docker, then you know the target system is Linux (unless you are building windows containers). There is no reason not to rely on filesystem features.

  • UniverseHacker 9 hours ago

    I do a lot of high performance scientific computing and find the most flexible and highest-performance way to access data, especially in parallel, is often a read-only sqlite database on a ramdisk. It feels super hacky, but it is easier to set up and faster than anything else I’ve found.

    • antognini 7 hours ago

      Honestly it doesn't really feel like a hack to me. Fast efficient parallel reads are something that SQLite was specifically designed to do.

      A friend of mine in astronomy once observed that a lot of people in the sciences should get comfortable working with databases because otherwise they just end up inadvertently spending a huge amount of effort rolling their own really crappy database.

    • edweis 8 hours ago

      We would love a write up about your findings.

      • UniverseHacker 7 hours ago

        Probably not, I’m not a CS person and just found a hacky thing that was surprisingly fast enough for my use. I didn’t do a systematic comparison with other methods.

    • jfkfif 8 hours ago

      More performant than using HDF5?

      • UniverseHacker 7 hours ago

        I doubt it. I didn’t do a systematic comparison, but stuck with this because it was fast enough that it wasn’t the bottleneck

  • p4bl0 7 hours ago

    Amusingly, my static site generator CMS does precisely the reverse of what's done here. During development/updating of the website, all pages and posts are entries in an SQLite database that is manipulated through a web interface presenting an editable version of the website. The website is then dumped as static pages to the file system, to be deployed directly and/or downloaded as a zip and uploaded anywhere else for deployment (including entirely static hosting services).

    • Macha 7 hours ago

      I use Zola for now, but for a variety of reasons I'm contemplating building my own to let me tweak stuff a bit more flexibly, and this kind of approach has been on my mind. How has it been working for you?

      • p4bl0 6 hours ago

        It works great. I actually built it for a few friends initially, and now I have about a hundred people using it, mostly for personal web pages (which it was designed for), but not only. I also use it to manage the websites of my uni department and research group, for example at https://informatique.up8.edu/ [website is in French].

        Source code is available here if you wanna take a look: https://code.up8.edu/pablo/cigala

        A warning: the code is not written to be pretty; it's written to be simple and effective. The initial goal was to have the whole piece of software in a single PHP file that could even be used offline to produce a fully static website.

  • dogaar an hour ago

    This article seems to suggest that atomic application updates on the server side are by themselves a solution to application versioning, which they aren't. As long as client-side asset caching is enabled, which it is by default for all web applications, the neatly separated application versions will get all mixed up in the browser. One solution would be to serve each application version under a different path prefix.

  • deskr 9 hours ago

    > Why this approach is not more common?

    Because the file system excels at handling files. Need an atomic update? Check out into a new directory and switch a symlink.

    I've seen various versions of using a database as a filesystem. They have a nice side to them and then a nightmare side to them when the shit hits the fan.

  • Szpadel 10 hours ago

    I don't get the argument; swapping a "current" symlink to point to another version has worked for years as an atomic way to switch between two site versions

    using sqlite for static content might have other benefits, but i don't think that this is the main one

    • ajayvk 10 hours ago

      Swapping symlinks is possible. Using a database (sqlite specifically) has other benefits: deduplication, easier backups, storing compressed content, storing the content hash, etc.
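
      The deduplication part falls out of content addressing almost for free; a sketch with an invented two-table schema:

```python
import hashlib
import sqlite3

def store(conn: sqlite3.Connection, path: str, body: bytes) -> None:
    # Identical content is stored once in `blobs`, no matter how many
    # paths (or app versions) point at it via the `paths` table.
    digest = hashlib.sha256(body).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO blobs (sha256, body) VALUES (?, ?)",
        (digest, body),
    )
    conn.execute(
        "INSERT OR REPLACE INTO paths (path, sha256) VALUES (?, ?)",
        (path, digest),
    )
```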

      • webstrand 10 hours ago

        Sqlite isn't necessarily easier to back up than a filesystem. I've got a fairly large (~60GB) sqlite that I'm somewhat eager to get off of. If I'd stuck with a pure filesystem then backing up only the changeset would be trivial, but with sqlite I have to store a new copy of the database.

        I've tried various solutions like litestream and even xdelta3 (which generates patches in excess of the actual changeset size), but I haven't found a solution I'm confident in other than backing up complete snapshots.

        • fanf2 9 hours ago

          You might like the new sqlite3_rsync command https://www.sqlite.org/rsync.html

          • simonw 9 hours ago

            Yeah, that looks ideal for this exact problem, because it lets you stream a snapshot backup of a SQLite database over SSH without needing to first create a duplicate copy using .backup or vacuum. My notes here: https://til.simonwillison.net/sqlite/compile-sqlite3-rsync

            • webstrand 9 hours ago

              Maybe that tool just doesn't fit my use-case, but I'm not sure how you'd use it to do incremental backups? I store all of my backups in S3 Glacier for the cheap storage, so there's nothing for me to rsync onto.

              I can see how you'd use it for replication though.

              • simonw 6 hours ago

                If you want incremental backups to S3 I recommend Litestream.

        • hedgehog 5 hours ago

          What do you do about compaction?

      • theturtle32 8 hours ago

        You could also employ a different filesystem like ZFS or btrfs in tandem with the symlink-swapping strategy to achieve things like deduplication. Or, once you have deduplication at the filesystem level, just construct a new complete duplicate of the folder to represent the new version and use renaming to swap the old for the new, and poof -- atomic changes and versioning with de-duplication, all while continuing to be able to use standard filesystem paradigms and tools.

      • borsecplata 10 hours ago

        Deduplication can be achieved the same way as in sqlite, by keeping files indexed by SHA-256. There are also filesystems that provide transparent compression.

        Seeing as you need some kind of layer between the web and sqlite anyway, you might as well keep a layer between the web and the FS that nets you most or all of the benefits.

      • warble 10 hours ago

        All of this is easily done on a filesystem too. I would assume this is a performance tradeoff rather than features?

  • MathMonkeyMan 10 hours ago

    This gives me an idea.

    Use git to manage versions of the static files.

    Use [git worktree][1] to have multiple separate working trees in different directories.

    Use [openat()][2] in your app whenever serving a request for a static file. On each request, open the directory, i.e. `/var/www/mysite.com`, and then use `openat()` with the directory's file descriptor when looking up any files under that directory.

    `/var/www/mysite.com` is a symbolic link to some working tree.

    When modifying the site, make modifications to the git repository. Then check out the modified working tree into a new directory.

    Create a symbolic link to the new directory and `mv` (rename) it to `/var/www/mysite.com`.

    Then some time later, `rm -r` the old working tree.

    [1]: https://git-scm.com/docs/git-worktree

    [2]: https://pubs.opengroup.org/onlinepubs/9799919799/functions/o...
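
    The openat() part of the recipe can be sketched in Python, which exposes it through the dir_fd parameter of os.open (names here are illustrative):

```python
import os

def atomic_swap(site_link: str, new_dir: str) -> None:
    # Build the new symlink under a temp name, then rename it over the
    # old one; rename() is atomic on POSIX filesystems.
    tmp = site_link + ".tmp"
    os.symlink(new_dir, tmp)
    os.replace(tmp, site_link)

def read_static(site_link: str, rel_path: str) -> bytes:
    # Resolve the symlink once per request, then do every file lookup
    # via the directory fd (openat under the hood), so a concurrent
    # swap can't hand one request a mix of old and new trees.
    dfd = os.open(site_link, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fd = os.open(rel_path, os.O_RDONLY, dir_fd=dfd)
        with os.fdopen(fd, "rb") as f:
            return f.read()
    finally:
        os.close(dfd)
```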

    • mistrial9 10 hours ago

      > Use git to manage versions of the static files

      bad match -- git stores every version in the history forever. Do you really need every revision of a binary file, completely duplicated? big files, more bad

  • 101008 2 hours ago

    Can someone make a WordPress plugin that stores all the media in a sqlite db? Or S3? I'm tired of downloading and uploading gigabytes from wp-uploads every time I need to migrate my WPs from one hosting to another.

    There is a plugin for that but it's super expensive.

  • ajayvk 11 hours ago

    I have been building https://github.com/claceio/clace, a project which makes it easier to manage web apps for internal use (locally and across a team). SQLite is used to store static files. This post talks about the reasoning behind that.

  • fsiefken 7 hours ago

    I read "There is no equivalent implementation using the file system to compare against, so a direct benchmark test is not done." But Btrfs and ZFS have file versioning built in, not just snapshots but also for individual files. This could be used to make a similar setup without binary blobs, using just filesystem references in sqlite or hard links, which might even perform faster.

    • ajayvk 5 hours ago

      I meant there is no Clace implementation which uses the file system instead of the database; that would be required for a direct benchmark. For dev mode apps, Clace loads files from disk, but dev mode has many differences from prod mode.

  • earthboundkid 6 hours ago

    I'm continually going through this thought process:

    1. I should make blog engine using flat files. That way I can diff the plain text.

    2. But flat files are hard to read/write structured data from, so I should use SQLite instead.

    3. But SQLite is hard to diff. GOTO 1.

    • ajayvk an hour ago

      There is a `version files` command to list the files for a version, including the SHA. I also plan to add an export command to make diffs easier

    • crazygringo 5 hours ago

      In every project of mine with databases, I use a tool to export the schema (without data) as a text SQL file on every commit (as a pre-commit hook), to record any changes.

      There's no reason you can't do the same but including the content too.

      That way you're always committing a text friendly version of the database rather than the database binary encoding. And your diffs will work great.
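
      With SQLite specifically, the sqlite3 module's iterdump() gives you exactly that text form (schema plus data as SQL statements), suitable for a pre-commit hook:

```python
import sqlite3

def dump_to_text(db_path: str, out_path: str) -> None:
    # Emit the whole database (schema and rows) as SQL text, which
    # diffs and commits cleanly, unlike the binary database file.
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w") as out:
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()
```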

    • sureglymop 4 hours ago

      But with sqlite would it be that hard? You could use sqldiff or, even something like this:

      diff <(sqlite3 db 'select text from posts where id = 1') <(sqlite3 db 'select text from posts where id = 2')

  • redleader55 10 hours ago

    This is interesting as a toy and I'm sure the authors are having a lot of fun implementing their idea.

    On the other hand, the state of the art when it comes to performance is zero-copy from disk to NIC using io_uring and DMA, combined with NICs that support TLS and checksum offloading and separate header and payload buffers that can be filled independently by the OS and user space.

    I wonder if the authors of these projects ask themselves: what reasons are there to not do it like this? This thread has a few answers.

    • ajayvk 9 hours ago

      I am the author (of the blog post and of the Clace project). Clace is built for one specific use case: internal web apps, to run locally or on a shared server, for use by a team. Clace is not a general purpose web server. It is an application server which uses containers and sandboxing (when running Starlark code) to allow for multiple apps to be managed easily. Clace just happens to implement some web server features for operational simplicity, to avoid requiring an additional component to be managed.

      I am not claiming a SQLite database is always the right approach for serving static files. In my particular use case for Clace, I found a database to work better.

  • jazzyjackson 8 hours ago

    Isn't there a bottleneck at your egress? I thought sqlite was fast at reads but only handles one read at a time, so if someone is downloading a gigabyte binary out of your db, every other connection just has to wait. Am I wrong?

    Something I want to try is using sqlite as my filesystem but just storing content hashes that point to an S3-compatible object store, so you get the atomicity and rollbackability and all, but you also get the massive parallelization and multi-regionality of an object store

    Edit: I googled it again and found that multiple processes can read sqlite concurrently, so it shouldn't be a problem

    https://stackoverflow.com/questions/4060772/sqlite-concurren...

    • catlifeonmars 8 hours ago

      There’s still a bottleneck in that you have to host it on the same machine and egress from the same network interface. Classic tradeoff between consistency and availability though.

  • enjikaka 2 hours ago

    Time to learn about service workers

  • slavboj 7 hours ago

    "We put a filesystem in your database on your filesystem which is a database, so you can serve while you serve while you serve." [inception theme plays]

  • osigurdson 6 hours ago

    While I like the idea of using sqlite, how do you avoid downtime? Do you host the file on NFS or something?

    • ajayvk an hour ago

      If you mean how to do backups, there are tools like https://litestream.io/ which help with SQLite backup to S3 (or other cloud storage)

  • hinkley 8 hours ago

    I want an nginx plugin that does this. And for a cache.

  • zzzeek 10 hours ago

    > When updating an app, since there could be lots of files being updated, using a database would allow all changes to be done atomically in a transaction. This would prevent broken web pages from being served during a version change.

    but....the SQLite file is locked against reads while writing in order to achieve serializable isolation. which sort of indicates you're better off doing your database work in an offline file, then just swapping the new file for the old one that's in production. which sort of indicates....just use a tar file, or a separate directory that you swap in to update the new content.

    it is so much easier to serve static files statically rather than from some program that's trying to manage live SQLite connections and achieve some weird "update concurrency" magic when this problem is not at all that hard to solve. It's fine to manage your CMS in a sqlite database but when you serve the content live and it's static, use static files.

    • okl 10 hours ago

      > but....the SQLite file is locked against reads while writing in order to achieve serializable isolation.

      Not with WAL: https://www.sqlite.org/wal.html

      • simonw 10 hours ago

        And even without WAL (which you should absolutely be using if you're serving web content with SQLite) the lock for most writes lasts for a tiny fraction of a second.

        • resoluteteeth 5 hours ago

          I might be misremembering, but if you're using a transaction like in the article, with the rollback journal mode rather than WAL, won't SQLite actually hold the lock on the database for the entire time until the transaction is committed? That could be a substantial amount of time if you're writing lots of blobs like in the article, even if each individual blob doesn't take long.

        • zzzeek 9 hours ago

          For small writes. That's still a dramatically larger pause than simply copying a few files to a directory and not pausing anything. If the website update is hundreds of large files, then the SQLite write is going to be large also. It then comes down to: "is it faster to copy 200M of files to a filesystem, or to write 200M of new data to BLOBs in a single monolithic SQLite file?" I'd bet on the former in that race

  • scotty79 2 hours ago

    I'd really like to put all of the photos I have (hundreds of thousands) into a single database so they can be found and accessed quickly. Keeping them in filesystems is a terrible experience.

  • superkuh 10 hours ago

    So the entire argument here is that using sqlite to store files prevents visitors from seeing partial/mid-way changes when updating a website (a process that would take seconds)? Is that really a problem?

    Otherwise the file system is far superior for every "benefit" they list. And far easier to work with other software and more robust over time (nothing to change/break).

    • throwawayie6 10 hours ago

      I made something similar once, but simply wrote the output to a temporary file, and then renamed it to the final name when it was done

      From what I remember, this was an atomic operation, and if there was a download in progress it would continue using the old file, because the data was still on disk and the filename was simply a pointer to the inode.

      This may behave differently on other file systems. This was done on an old Linux server with ext3

      Seems like a simpler solution than using a db
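      The write-then-rename pattern described above is easy to reproduce; on POSIX filesystems `os.replace` atomically swaps the name over to the new inode, so in-flight reads of the old file keep working:

      ```python
      import os
      import tempfile

      def atomic_write(path, data):
          # Write to a temp file in the same directory (rename must not cross
          # filesystems), then atomically swap it over the final name.
          dirname = os.path.dirname(path) or "."
          fd, tmp = tempfile.mkstemp(dir=dirname)
          try:
              with os.fdopen(fd, "wb") as f:
                  f.write(data)
                  f.flush()
                  os.fsync(f.fileno())  # make sure the bytes hit disk first
              os.replace(tmp, path)  # atomic on POSIX; readers never see a partial file
          except BaseException:
              os.unlink(tmp)
              raise

      path = os.path.join(tempfile.mkdtemp(), "index.html")
      atomic_write(path, b"v1")
      atomic_write(path, b"v2")  # readers see either v1 or v2, never a mix
      ```

      The caveat, as the parent notes, is that this is atomic per file; atomically swapping a whole tree of files still needs a directory-symlink swap or something transactional.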

    • okl 10 hours ago

      One benefit they list is storing associated metadata in the database (specifically different types of hashes are mentioned) which is not so easy with a file system.

      I think the bigger benefit though is the increased read performance on many small files (saving system call overhead). To what extent that applies to static files that a smart server might keep in cache, I don't know.

      • superkuh 9 hours ago

        I'm not sure what associated metadata is in this context but www/path/to/filename.jpg and www/path/to/filename.jpg.json would work and be very file-y. I take their/your point in it not being directly integrated though.

  • stackskipton 9 hours ago

    My SRE thought: sure, there might be some benefits to using SQLite, but the massive downside is that you cannot easily examine what the server is serving up.

    So let's go over the claims about SQLite:

    >Transactional Updates : This is the main benefit. Updating multiple files can be done in one transaction. Isolation ensures that there are no broken webapps during the update.

    Yep, big one

    >Deployment Rollbacks: Another of the transactional benefits is the ability to roll back deployment in case of errors. If multiple apps are being updated, all of them can be rolled back in one go. Rolling back a database transaction is much easier than cleaning up files on the file system.

    I guess, if you don't keep old artifacts of what you are serving up. We zip up our entire static site as part of the pipeline and dump it into Azure Storage before deployment.

    >File De-duplication Across Versions: Clace automatically versions all updates. This can lead to lots of duplicate files. The file data is stored in a table with the schema

    INTERNAL SCREAMING Deduplication gives me nightmares from cache issues and so forth. Show me SQLite levels of testing here to save a few kilobytes.

    >De-duplication Across Apps : Each production app in Clace has a staging app. Apps can have multiple preview apps. This can lead to lots of duplication of files. Using the database helps avoid all the duplication. Even across apps, there are files which have the same contents. Files are de-duplicated across apps also.

    INTERNAL SCREAMING Oh god, I hope this section of the code is extremely battle tested because de-duplication at this level gives me SRE nightmares.

    >Easy Backups: Using SQLite means that backups are easy. The state of the whole system, metadata and files can be backed up easily using SQLite backup tools like Litestream.

    tar-ing a directory is difficult?

    >Content Hashing: For content caching on the browser, web servers return a ETag header. Using the database for files makes it easy to save the content SHA once during file upload without having to recompute it later.

    Most web servers handle this for you.

    >Compression: The file contents are saved Brotli compressed in the SQLite table. The database approach has the advantage that contents can be saved in multiple formats easily. GZip compressed data and uncompressed data can be added by just adding a column in the files table.

    Cool, now you have to uncompress it every time you want to serve it? Is there a massive shortage of disk space and I missed the memo?

    I mean, I don't know what Clace does (seriously OP, I read your front page and I have no idea), but disk space/speed is rarely a factor. Hell, most stuff probably gets loaded into RAM and is served from there in the end. Meanwhile, your content is being served out of this proprietary file system and you are kind of locked in.

    • ajayvk 9 hours ago

      Great to get a SRE perspective. The de-duplication uses a SQLite table using a primary key that is the SHA256 of the file contents. There is not much code on top of that. The one complexity is knowing when it is safe to actually delete a file, since it could be referenced in multiple apps.
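      The scheme described (content SHA-256 as the primary key, with per-app paths referencing it) might look roughly like this; the table and column names here are guesses for illustration, not Clace's actual schema:

      ```python
      import hashlib
      import sqlite3

      db = sqlite3.connect(":memory:")
      db.executescript("""
          CREATE TABLE files (sha TEXT PRIMARY KEY, data BLOB NOT NULL);
          CREATE TABLE app_files (app TEXT, path TEXT,
                                  sha TEXT REFERENCES files(sha),
                                  PRIMARY KEY (app, path));
      """)

      def add_file(app, path, data):
          sha = hashlib.sha256(data).hexdigest()
          # Identical content across apps/versions is stored exactly once.
          db.execute("INSERT OR IGNORE INTO files VALUES (?, ?)", (sha, data))
          db.execute("INSERT OR REPLACE INTO app_files VALUES (?, ?, ?)",
                     (app, path, sha))

      logo = b"\x89PNG..."
      add_file("prod", "/logo.png", logo)
      add_file("staging", "/logo.png", logo)  # same bytes, no extra copy stored

      (count,) = db.execute("SELECT COUNT(*) FROM files").fetchone()
      assert count == 1
      ```

      The `INSERT OR IGNORE` on the hash-keyed table is what keeps the dedup logic small; the hard part, as noted above, is deciding when a row in `files` can be deleted.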

      Clace is built for use cases where teams want to easily deploy Streamlit/Gradio/FastAPI type internal apps (Python is supported out of the box; other languages work if they have a Dockerfile present). Clace adds the image building, GitOps, OAuth access control, blue-green staged deployment, secrets management etc. on top. So, use cases where Kubernetes and an IDP would be overkill.

      • withinboredom 7 hours ago

        > The one complexity is knowing when it is safe to actually delete a file, since it could be referenced in multiple apps.

        If this person is anything like me, it's this "feature" that is giving them nightmares. This code has to be rock solid and impossible for anyone to mess up. It only takes one little bug and EVERY SINGLE APP GOES DOWN. Congrats on inventing a new single-point-of-failure!

        • ajayvk 6 hours ago

          If it helps, that cleanup is not implemented currently. I did not want the cleanup to be done as part of regular app update/delete. The plan is to add a vacuum/purge type of command, which expressly does this cleanup, and document that a database backup is good to have before running that.
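          Under an assumed two-table schema (a content table keyed by SHA, and a reference table per app/path; not Clace's real schema), that purge pass could boil down to a single statement, which is part of why keeping it as an explicit command is attractive:

          ```python
          import sqlite3

          db = sqlite3.connect(":memory:")
          db.executescript("""
              CREATE TABLE files (sha TEXT PRIMARY KEY, data BLOB);
              CREATE TABLE app_files (app TEXT, path TEXT, sha TEXT);
              INSERT INTO files VALUES ('aaa', x'01'), ('bbb', x'02');
              INSERT INTO app_files VALUES ('prod', '/a', 'aaa');  -- 'bbb' is orphaned
          """)

          def purge_unreferenced():
              # Delete only blobs with no remaining reference in any app.
              cur = db.execute("""
                  DELETE FROM files
                  WHERE sha NOT IN (SELECT DISTINCT sha FROM app_files)
              """)
              db.commit()
              return cur.rowcount

          assert purge_unreferenced() == 1  # 'bbb' removed, 'aaa' kept
          ```

          Running the delete and the reference check inside one transaction is what makes this safe against an app update racing the purge.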

    • duskwuff 6 hours ago

      > Cool, now you have to uncompress it every time you want to serve it?

      Not necessarily! If the client supports Brotli encoded data (Accept-Encoding: br), you can pass it straight through without decompressing. Ditto for gzip, or zstd for newer browsers.
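      The passthrough is just a header check before choosing what to send. A sketch using gzip, since Python ships it in the stdlib (Brotli needs a third-party package); the helper name is made up:

      ```python
      import gzip

      # Pretend the files table stores the body pre-compressed with gzip.
      stored_gzip = gzip.compress(b"<h1>hello</h1>")

      def respond(accept_encoding):
          # If the client advertises gzip support, ship the stored bytes as-is;
          # otherwise decompress once on the way out.
          if "gzip" in accept_encoding:
              return {"Content-Encoding": "gzip"}, stored_gzip
          return {}, gzip.decompress(stored_gzip)

      headers, body = respond("gzip, deflate, br")
      assert headers == {"Content-Encoding": "gzip"} and body == stored_gzip

      headers, body = respond("identity")
      assert body == b"<h1>hello</h1>"
      ```

      Since effectively all modern browsers send `Accept-Encoding: gzip, deflate, br`, the decompression path is the rare case, not the hot one.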

      • ajayvk 5 hours ago

        Yes, most browsers accept Brotli compressed data, so there is no need to decompress