Started a guide to writing FUSE filesystems in Python

(gwolf.org)

262 points | by levlaz 5 days ago

60 comments

  • dsp_person 5 days ago

    The libfuse github has some good examples for C/C++ in [0] of increasing complexity:

    - passthrough.c mirrors existing filesystem, "Its performance is terrible."

    - passthrough_fh.c "performance is not quite as bad."

    - passthrough_ll.c implemented with low level api and "the least bad among the three"

    - passthrough_hp.cc high performance version written in C++

    Some interesting fuse projects in my notes: [1] splitting large files into segments; [2] show ZFS incremental snapshots as files; [3] transparent filesystem compression; [4] and [5] options for mounting archives as filesystems.

    - [0] https://github.com/libfuse/libfuse/tree/master/example

    - [1] https://github.com/seiferma/splitviewfuse

    - [2] https://github.com/UNFmontreal/zfs_fuse_snapshot

    - [3] https://github.com/FS-make-simple/fusecompress

    - [4] https://github.com/google/fuse-archive

    - [5] https://github.com/mxmlnkn/ratarmount

    • craftkiller 5 days ago

      While [2] might be a good code example, this functionality is already built into ZFS. At the mountpoint of every dataset is a hidden ".zfs" folder that doesn't show up, even on a `ls -A`. You just have to believe it's there and cd into it. Under that is a "snapshot" folder, and inside that is a folder for each snapshot of that dataset. Those folders contain the files in the snapshot.

      So for example, /etc/hosts from my snapshot zrepl_20241011_010143_000 would be at /.zfs/snapshot/zrepl_20241011_010143_000/etc/hosts

      If you don't like the magic hidden nature of it, you can even configure it to behave like a normal folder with `zfs set snapdir=visible <dataset>`
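A minimal sketch of the path layout described above, assuming the default ".zfs/snapshot" directory name. The helper name `snapshot_path` is hypothetical and not part of any ZFS tooling; it only builds the path and does not check that the snapshot exists.

```python
from pathlib import PurePosixPath


def snapshot_path(mountpoint: str, snapshot: str, relative_path: str) -> PurePosixPath:
    """Build the path of a file as it appears inside a ZFS snapshot.

    mountpoint    -- where the dataset is mounted, e.g. "/"
    snapshot      -- snapshot name without the "dataset@" prefix
    relative_path -- file path relative to the dataset mountpoint
    """
    # Strip a leading slash so the path is joined rather than replaced.
    relative = relative_path.lstrip("/")
    return PurePosixPath(mountpoint) / ".zfs" / "snapshot" / snapshot / relative


# The example from the comment above: /etc/hosts in one snapshot of the root dataset.
print(snapshot_path("/", "zrepl_20241011_010143_000", "etc/hosts"))
```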

      • dsp_person 5 days ago

        What [2] is doing is exposing the raw send stream as a file, not what the contents of the hidden .zfs folder contains.

        https://github.com/UNFmontreal/zfs_fuse_snapshot/blob/64b17b...

        • hnlmorg 4 days ago

          The GP wasn’t suggesting [2] was using the ZFS snapshot folder. It was saying that ZFS natively supports a file system view of snapshots, so people should treat [2] as an example of writing FUSE file systems rather than a practical tool for working with ZFS.

    • j45 5 days ago

      I was just thinking about Fuse the other day for a project to get around case sensitivity in linux for a use case.

      Appreciate these links. I'm a little rusty on it, if anyone has any Fuse tutorials or guides they found helpful, happy to receive.

  • aargh_aargh 5 days ago

    Just wanted to throw out there that although I'm a fan of FUSE, it's not the only option. I've had fun implementing a virtual filesystem via the 9p protocol not too long ago.

    IIRC, I used py9p and the python experience was much nicer than fuse-python. You can mount a 9p service via FUSE (9pfuse) if you want. I just used the kernel v9fs client. If you're just looking to pass a filesystem through the network, I think I used the diod 9p server.

    Overall, it's a nice little ecosystem to explore.

    • packetlost 5 days ago

      9p is such a great little protocol. diod[0] has a good amount of documentation on the protocol itself, but it's pretty simple.

      I have some notes here [1], but it's mostly just linking to primary sources. FUSE is great, but 9P is more general and has high quality implementations all over the place, even in Windows!

      One thing I'm not so sure about is the performance properties of 9p. I've seen some places indicate it's rather slow, but nothing definitive. Does anyone have any benchmarks or info on that?

      [0]: https://github.com/chaos/diod/blob/master/protocol.md [1]: https://athenaeum.wiki/Zettelkasten/9p

      • magicalhippo 5 days ago

        I looked at 9p as a NFS/SMB replacement some time ago for a project, and benchmarks seem scarce. I did find this[1] set of benchmarks but it's from 2005.

        I also found this[2] from 2016 which points out some performance barriers, and it doesn't sound like an easy fix without some new extensions.

        However there's also this[3] post from 2022 about some patches suggesting a 10x improvement in Linux 9p performance.

        Would be interesting to see how things are these days.

        [1]: https://www.usenix.org/legacy/events/usenix05/tech/freenix/f...

        [2]: https://github.com/Harvey-OS/harvey/issues/18

        [3]: https://www.phoronix.com/news/Linux-9p-10x-Performance

        • me-vs-cat 4 days ago

          Did you end up using 9p as a NFS/SMB replacement?

          If yes, do you still use it today or why did you stop?

          • magicalhippo 3 days ago

            I ended up not pursuing it, but I feel like it would be interesting to give it a whirl just to compare, given hardware and code has evolved as mentioned.

      • mananaysiempre 5 days ago

        > 9P [...] has high quality implementation[...] in Windows

        Do you know if it’s possible to mount one’s own 9P servers under Windows? I seem to remember a comment from a Microsoft employee on GitHub something-or-other that said that capability is private to WSL2, but I can’t find it right now.

        • packetlost 5 days ago

          I'm not sure if you can mount a 9P filesystem from windows normally, I'll try. I'm not seeing any resources online about it either.

      • duped 4 days ago

        If you have to go through WSL to mount it, does that really count as a "high quality implementation" in Windows? Windows already has a high performance FUSE alternative called ProjFS.

    • Twirrim 5 days ago

      I was experimenting about 18 months ago with FUSE in front of an HTTPS URL, essentially a large file I wanted to be able to random read as if it was local, without downloading it first.

      One of the things I ran into that made it painful was that the block sizes for FUSE were really small. That made for a lot of latency and churn of HTTP calls to the back end, which ended up needing some fairly complicated caching/pre-reading logic to handle. Kernel read-ahead logic never seemed to kick in (and I didn't do any investigation into that at the time, other than not finding any particular way to induce it).
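A rough sketch of the kind of caching layer described above: small reads are served from larger, block-aligned fetches so that many tiny requests collapse into one backend call. Everything here is an assumption for illustration. `BlockCache` and `fetch` are hypothetical names, `fetch(offset, length)` stands in for something like an HTTP Range request, the 1 MiB default block size is arbitrary, and the cache is unbounded (a real one would evict old blocks).

```python
from typing import Callable, Dict


class BlockCache:
    """Serve small reads from large, block-aligned fetches."""

    def __init__(self, fetch: Callable[[int, int], bytes], block_size: int = 1 << 20):
        self.fetch = fetch                    # fetch(offset, length) -> bytes
        self.block_size = block_size
        self.blocks: Dict[int, bytes] = {}    # block index -> cached bytes (never evicted)
        self.fetch_count = 0                  # number of backend calls made so far

    def _block(self, index: int) -> bytes:
        # Fetch a whole block on a cache miss; later reads in the same block are free.
        if index not in self.blocks:
            self.fetch_count += 1
            self.blocks[index] = self.fetch(index * self.block_size, self.block_size)
        return self.blocks[index]

    def read(self, offset: int, size: int) -> bytes:
        if size <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + size - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        start = offset - first * self.block_size
        # A short final block simply yields fewer bytes, like a read at end of file.
        return data[start:start + size]


# Usage with an in-memory payload standing in for the remote file.
payload = bytes(range(256)) * 64
cache = BlockCache(lambda off, n: payload[off:off + n], block_size=4096)
cache.read(10, 5)
cache.read(100, 50)
print(cache.fetch_count)
```

A FUSE `read` callback could delegate to `cache.read(offset, size)`; prefetching the next block would be a natural extension but is left out here.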

    • VMtest 5 days ago

      Worth noting here: FUSE (virtiofs) can give better performance in QEMU than 9p (virtio-9p).

      See the FAQ at https://virtio-fs.gitlab.io

    • hathawsh 5 days ago

      It looks like py9p was last released in 2013 and it's still marked as "beta". Cool project though!

      • networked 5 days ago

        https://github.com/pbchekin/p9fs-py contains an active fork of py9p.

        When I played with 9P, this version of py9p was what worked to share a directory so a NetBSD client could mount it. It worked natively with https://man.netbsd.org/mount_9p.8 without an external client. The important difference from diod was that py9p could speak 9P2000.u, which NetBSD understood, while diod only spoke 9P2000.L.

  • iamjackg 5 days ago

    I wish I had known about this a month ago, when I had to go through the exact same process!

    In a desperate attempt to find a less frustrating way to interact with Jira, I had the silly idea of starting a jira-as-filesystem project that uses our internal issue categorization to build a tree: directories represent issues, with files representing issue fields and subdirectories for linked issues. I ended up choosing fuse-python.

    I haven't worked on it in a minute, but I was already bumping into issues (pun not intended) with the abstraction: using just the issue ID as directory name makes automation easier, but it makes it hard for humans to browse the tree, since a `ls` would just show you a bunch of inscrutable IDs. I ended up adding a parallel `<issue-type>-with-summary` directory type where the slugified summary is appended to each issue ID.
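A small sketch of the naming scheme described above, where each issue gets a bare-ID directory name plus a second name with the slugified summary appended. The function names `slugify` and `directory_names` are hypothetical, and the slug rule (lowercase, runs of non-alphanumerics become one hyphen) is an assumption rather than the project's actual rule; note that it also drops non-ASCII characters.

```python
import re
from typing import Tuple


def slugify(text: str) -> str:
    """Lowercase the text and collapse anything that is not a-z or 0-9 into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def directory_names(issue_id: str, summary: str) -> Tuple[str, str]:
    """Return (machine-friendly name, human-friendly name) for one issue."""
    slug = slugify(summary)
    # Fall back to the bare ID when the summary has no usable characters.
    with_summary = f"{issue_id}-{slug}" if slug else issue_id
    return issue_id, with_summary


print(directory_names("STORY-234", "Add support for feature A"))
```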

    • maicro 5 days ago

      Hmm, I'm not saying it's a good idea, but what about a daemon that keeps a symlinked version of the entire jira environment up to date? So you have one jira-as-filesystem that's the raw files, but then for human consumption/interaction, you have a tree of symlinks, including multiple links to the same file wherever it's relevant. Might be adding more layers than needed, based on my lack of understanding, but might technically solve the (current/stated) abstraction issue.

      • iamjackg 5 days ago

        That's sort of what I'm doing behind the scenes, because I keep one global list of downloaded issues (they're lazily loaded when you access them) and then the folders are really only "views" into the downloaded issues. Representing identical ones across trees as symlinks is a fantastic idea though, I can't believe I didn't think of that! Thanks for the inspiration.

      • xg15 5 days ago

        Would you even need a daemon for that? That sounds as if the FS could just generate the symlinks on-the-fly in the same way that it generates the folders.

        (Unless symlinks are somehow special - but at least both /dev and /proc also provide symlinks and to my knowledge they don't have any actual storage behind them, so it should be possible, I think)

      • inferiorhuman 5 days ago

        May as well just implement that in the FUSE driver.

      • paulddraper 5 days ago

        State syncing is always harder than state reading

    • renewiltord 5 days ago

      Referencing the same thing two ways is normal in Unix filesystems. On a modern Linux you will see disks referenced by block device and UUID. I think your approach is good and consistent with expectations.

      Though I, personally, would not use it as JIRA is complicated enough for me.

      • iamjackg 5 days ago

        Yeah, the /dev/disk/by-uuid paradigm was actually the inspiration for adding the second folder!

    • dflock 4 days ago

      There are a bunch of jirafs type things on GitHub, fwiw. Eg. https://github.com/coddingtonbear/jirafs

    • jrms 5 days ago

      Why not just 1234-human-sense? You have both type of info there and it's easy to parse too I think.

      • iamjackg 5 days ago

        Yeah, sorry, I think I was a bit confusing: that's exactly what I'm doing. For example, an Epic folder is laid out like this:

            EPIC-123
            ├─── user-stories
            │    └─── STORY-234
            └─── user-stories-with-summary
                 └─── STORY-234-add-support-for-feature-a

    • zufallsheld 4 days ago

      Any chance of open-sourcing your solution?

  • mcoliver 5 days ago

    So many fuse mount options out there with varying tradeoffs, performance, and features (s3fs, goofys, seaweed, minio, Google drive, etc..). JuiceFS is pretty interesting for doing things like mounting an object store and accessing it via posix with all the metadata you would expect on a traditional filesystem. https://juicefs.com

  • memset 5 days ago

    Nice!

    Adjacent question: lately I’ve been seeing people implement NFS-based filesystems since that is a more widely supported protocol. I think rclone does this for Mac. Is there a guide, or even a comparison, for this approach?

    • mbirth 5 days ago

      Fun fact: Now that recent macOS versions require you to disable security features to install macFUSE, there’s the awesome fuse-t. It works as a drop-in replacement, doesn’t need the kext and will open up an NFS server in the background and mount that using macOS features. Performance is pretty good, too.

    • formerly_proven 5 days ago
      • inferiorhuman 5 days ago

        At least as far as Rust is concerned I preferred dealing with FUSE because I didn't have to wrestle with async (or NFS mounts).

        https://crates.io/crates/fuser

      • stateoff 5 days ago

        xetdata is retired and recently became part of hugging face. I wonder if nfsserve will be still supported.

        Are there other recommended NFS server codebases?

  • peterldowns 5 days ago

    If you're interested in seeing what a finished product looks like, check out azuline/rosé — a music manager with a virtual filesystem. Really good codebase with a lot of comments and explanations and types and tests, which should make it easy to learn from.

    https://github.com/azuline/rose

  • iod 4 days ago

    People interested in FUSE might also be interested in the CUSE companion (sub)project.

    CUSE is userspace character device driver emulation. It allows you to emulate hardware without compiling a new kernel module. I just used it recently to write a hardware device supporting ioctls using Python. However I didn't find any good Python libraries that worked easily and documentation was lacking, but I found it easy enough that I ended up writing it using just the ctypes ffi library. The only part that wasn't immediately intuitive for me, as someone who has never written kernel drivers, is how ioctls require the buffers to be resized for each read and write, which means the calls come in pairs, but luckily CUSE has a debug mode which shows all bytes in and out. CUSE was originally implemented to create userspace sound devices¹ but has also been used for things like custom TTYs. I used it for creating a virtual SPI device. Hopefully someone finds this useful and this project can get more attention.

    ¹ https://lwn.net/Articles/308445/

  • JelteF 5 days ago

    Quite some years ago I created a Python FUSE filesystem[1] to interact with dokuwiki (a wiki system).

    It's built on llfuse[2]. But that required implementing a bunch of low level APIs that were not really related to dokuwiki. So I created easyfuse[3][4] as a wrapper, which implemented the things that were unrelated to the dokuwiki implementation. If you're interested in building a FUSE system it might be worth looking at.

    [1]: https://github.com/JelteF/dokuwikifuse [2]: https://pypi.org/project/llfuse/ [3]: https://pypi.org/project/easyfuse/ [4]: https://github.com/JelteF/easyfuse

    • virgoerns 5 days ago

      Can you tell what's the usecase for creating FUSE for dokuwiki? Basically, dokuwiki is just a bunch of text files so wouldn't it be simpler and more efficient to e.g. mount them as NFS or share via Dropbox/Syncthing?

      • JelteF 5 days ago

        I was forced to use the dokuwiki, but I very much disliked editing stuff in the web interface. Having a filesystem interface to the wiki system allowed me to create and edit pages using vim, which I like to use for writing.

  • marmaduke 5 days ago

    I like to think of fuse as a way of allowing Makefiles to specify DAGs over arbitrary resources. For instance, a fuse fs exposing the state of a k8s cluster might ease writing operators accessible for simpler minds like mine.

    Or email, why not expose imap through a file system, so your RAG app (like gpt4all) can just access everything directly?

  • kapnap 5 days ago

    Off topic but whenever I see a blog with some 90s/2000s vibes, I always go to their first page of posts. Never disappoints to sneak a peek into that time capsule - including gwolf.org!

  • eru 4 days ago

    I wrote a little project to expose a bare git repository in fuse. Basically, you just have your .git folder, and fuse exposes every single commit (and branch etc) as its own folder at the same time, without actually having to check out everything as regular files in a regular filesystem.

    It's quite nice, and really shows that git internally is already half a file system. It's also quite simple, because everything is read-only.

  • alkh 5 days ago

    I've recently discovered sshfs and learned about needing to have FUSE as a dependency for OS X, which piqued my interest. The code looks very clean and easy to understand, so thanks for that! Is there any guide/course you would recommend for the introduction to FUSE? It looks like all you have to do is provide implementations of certain functions your filesystem will use, but it's hard without knowing the details (e.g. I wouldn't know I had to implement readdir without your code, and so on).

    • pkaye 5 days ago

      I've used sshfs in the past and I know the original authors stopped maintaining it though others took over. I did find the network error handling wasn't the greatest. Like it would unmount the fuse mount due to network error and I'd be writing files to the local mount directory silently until space filled up. Perhaps it's a Linux-specific issue or I've used the wrong options though.

      • alkh 5 days ago

        To be honest, I knew about the speed limitations of sshfs already, so I typically use rsync to work with large files. This way, I wouldn't write the data locally even if the connection fails. I've checked the GitHub repo and it looks like there are a number of issues related to network timeouts that haven't been addressed for a long time [1]. However, I mostly used it on OS X, so my experience might be different from yours. Thanks for the info as well, I was under the impression sshfs was under active development (:

        [1]: https://github.com/libfuse/sshfs/issues/77

      • beeboobaa3 5 days ago

        > I'd be writing files to the local mount directory silently until space filled up

        that's why you `chattr +i` the mountpoint
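For scripts that write into such a mount, a complementary guard (not a replacement for the `chattr +i` trick) is to check that the directory is actually a mount point before writing. A minimal sketch, assuming a hypothetical helper named `write_if_mounted`; `os.path.ismount` is the standard-library check. There is still a small window between the check and the write, and the check itself may block if the remote end hangs.

```python
import os


def write_if_mounted(mountpoint: str, relative_name: str, data: bytes) -> str:
    """Write data under mountpoint only if something is currently mounted there."""
    if not os.path.ismount(mountpoint):
        # Refuse to fill up the local disk behind a vanished mount.
        raise RuntimeError(f"{mountpoint} is not a mount point; refusing to write")
    target = os.path.join(mountpoint, relative_name)
    with open(target, "wb") as handle:
        handle.write(data)
    return target
```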

    • jiggunjer 5 days ago

      The API documentation can be referenced at https://github.com/libfuse/python-fuse

      It doesn't seem to be a complex interface.

  • sweeter 5 days ago

    I did something similar and it was a really fun project! You can easily make a Google Drive FUSE fs, or something simple like an in-memory fs, an encrypted fs, etc... It's very interesting and a lot simpler than one would imagine. You basically fulfill an interface and FUSE isn't really aware of the implementation. It's more of a "contract" that X function returns a given result. You can implement a FUSE fs for a ton of cool stuff.
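To make the "contract" idea concrete, here is a library-free sketch of a read-only in-memory filesystem. The class name `DictFS` is hypothetical, and the method names (`getattr`, `readdir`, `read`) only mirror the usual FUSE callbacks; the exact signatures and return types differ between Python bindings, so this is an illustration of the shape of the interface rather than code that mounts anything.

```python
import errno
import stat


class DictFS:
    """Read-only filesystem over a nested dict: dict values are directories, bytes values are files."""

    def __init__(self, tree: dict):
        self.tree = tree

    def _lookup(self, path: str):
        # Walk the tree one path component at a time; "/" resolves to the root dict.
        node = self.tree
        for part in path.strip("/").split("/"):
            if not part:
                continue
            if not isinstance(node, dict) or part not in node:
                raise OSError(errno.ENOENT, "No such file or directory", path)
            node = node[part]
        return node

    def getattr(self, path: str) -> dict:
        node = self._lookup(path)
        if isinstance(node, dict):
            return {"st_mode": stat.S_IFDIR | 0o555, "st_nlink": 2, "st_size": 0}
        return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1, "st_size": len(node)}

    def readdir(self, path: str) -> list:
        node = self._lookup(path)
        if not isinstance(node, dict):
            raise OSError(errno.ENOTDIR, "Not a directory", path)
        return [".", ".."] + sorted(node)

    def read(self, path: str, size: int, offset: int) -> bytes:
        node = self._lookup(path)
        if isinstance(node, dict):
            raise OSError(errno.EISDIR, "Is a directory", path)
        return node[offset:offset + size]


fs = DictFS({"hello.txt": b"Hello, world!\n", "docs": {"readme": b"hi"}})
print(fs.readdir("/"))
print(fs.read("/hello.txt", 5, 0))
```

The caller never sees how the data is stored; swapping the dict for an API client or an archive reader would leave the three methods' behaviour, and so the "contract", unchanged.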

  • elric 5 days ago

    I attended a talk at Devoxx yesterday which showcased how new Java language features can be used to implement a Fuse filesystem. Basically as simple as generating Java bindings for Fuse using jextract on fuse.h, and then implementing a couple of method calls using the new Foreign Function & Memory API, which is set to replace JNI.

    I will link to the video when it comes online, should be later today.

  • rnd0 5 days ago

    A user-space filesystem running in an interpreted language? Is that as bad as I think it is?

    • dsp_person 5 days ago

      Imagine you want to glue together something to use with a syncing tool. ~100 lines of python can turn data from one format into a presented set of files. The application doesn't have to be a general purpose filesystem.

      e.g. https://github.com/UNFmontreal/zfs_fuse_snapshot/blob/master...

    • heavyset_go 5 days ago

      No, you will spend the majority of your time waiting on IO.

    • vineyardmike 5 days ago

      They’re a teacher. It’s learning material.

      • askvictor 5 days ago

        In an operating systems class I was once a lab assistant for, we implemented the key parts of an operating system in Python: scheduler, filesystem, memory management. I think it ended up being more confusing than not, but I appreciated where it could go.