Bpftune uses BPF to auto-tune Linux systems

(github.com)

110 points | by BSDobelix 5 hours ago

26 comments

  • gausswho 2 hours ago

    With this tool I am wary that I'll encounter system issues that are dramatically more difficult to diagnose and troubleshoot because I'll have drifted from a standard distro configuration. And in ways I'm unaware of. Is this a reasonable hesitation?

    • pbhjpbhj an hour ago

      >"bpftune logs to syslog so /var/log/messages will contain details of any tuning carried out." (from OP GitHub readme)

      The rmem example seems to allay fears that it will make changes one can't reverse.

      • admax88qqq an hour ago

        It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes even was the problem and if so which one.

    • trelliscoded 33 minutes ago

      If your staging doesn’t do capacity checks in excess of what production sees, yes.

    • sgarland an hour ago

      Yes, it is. IMO, except for learning (which should not be done in prod), you shouldn’t make changes that you don’t understand.

      The tool seems to mostly tweak various networking settings. You could set up a test instance with monitoring, throw load at it, and change the parameters the tool modifies (one at a time!) to see how it reacts.

      • nine_k an hour ago

        I'd run such a tool on prod in "advice mode". It should suggest the tweaks, explaining the reasoning behind them, and listing the actions necessary to implement them.

        Then humans would decide if they want to implement that as is, partly, modified, or not at all.
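
One way to address the drift worry in this thread is to keep your own baseline: snapshot the sysctls you care about before a tuner runs, then diff against the live values when something breaks. The Python below is a minimal sketch under that assumption. It is not part of bpftune, and `read_sysctl`, `snapshot`, and `diff_settings` are hypothetical helpers. It relies only on the standard mapping of a sysctl name such as `net.ipv4.tcp_rmem` to the file /proc/sys/net/ipv4/tcp_rmem.

```python
# Hypothetical helpers for tracking sysctl drift; not part of bpftune.
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def read_sysctl(name: str, root: Path = PROC_SYS) -> str:
    """Read one sysctl, e.g. 'net.ipv4.tcp_rmem' -> /proc/sys/net/ipv4/tcp_rmem.

    Assumes the name has no literal dots beyond the separators.
    """
    path = root / name.replace(".", "/")
    # Normalise whitespace: tcp_rmem separates its three fields with tabs.
    return " ".join(path.read_text().split())


def snapshot(names: list, root: Path = PROC_SYS) -> dict:
    """Capture the current values of the given sysctls as a baseline."""
    return {name: read_sysctl(name, root) for name in names}


def diff_settings(baseline: dict, current: dict) -> list:
    """Return (name, old, new) for every sysctl whose value differs, sorted by name."""
    changes = []
    for name in sorted(set(baseline) | set(current)):
        old, new = baseline.get(name), current.get(name)
        if old != new:
            changes.append((name, old, new))
    return changes


if __name__ == "__main__":
    # Hypothetical before/after values, used instead of a live /proc/sys read.
    before = {"net.ipv4.tcp_rmem": "4096 131072 6291456", "net.core.somaxconn": "4096"}
    after = {"net.ipv4.tcp_rmem": "4096 131072 8388608", "net.core.somaxconn": "4096"}
    for name, old, new in diff_settings(before, after):
        print(f"{name}: {old} -> {new}")
```

The diff lists each changed setting with its old and new value, which gives a starting point for the "which one?" question and a list to revert one at a time. bpftune's syslog entries, quoted above, are still the place to look for what it changed and why.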

  • bloopernova 4 hours ago

    Fascinating!

    I'd like to hear from people who are running this. Is it effective? Worth the setup time?

  • usr1106 4 hours ago

    Interesting. But if tuning parameters to their best values were easy, shouldn't the kernel just do that in the first place?

    • RandomThoughts3 4 hours ago

      I would reverse the question: if it can be done by a BPF module, why should it be in the kernel?

      Distributions turning it on by default is another story. Maybe it deserves to be shipped and turned on all the time, but that's not the same thing as being part of the kernel.

      • jiehong 3 hours ago

        Indeed!

        The kernel might already be too monolithic.

        This kernel parameters optimisation reminds me of PGO compilation in programs.

        Yet, perhaps the kernel could come with multiple default config files, each being a good base for different workloads: server, embedded, laptop, mobile, database, router, etc.

    • sgarland an hour ago

      I’d rather the kernel present a good-enough but extremely stable set of configs. If I’m using a distro like Arch or Gentoo, then sure, maybe run wild (though both of those would probably assume I’m tuning them anyway), but CentOS, Debian, et al.? Stable and boring. If you change something, you’d better know what it is, and why you’re doing it.

    • onetoo 4 hours ago

      This doesn't necessarily find the best parameters, and it doesn't necessarily do it easily. From my reading, it will converge on a local optimum, and it may take some time to do that.

      In theory, I don't see why the kernel couldn't have a parameter-auto-tune similar to this. In practice, I think the kernel has to work in so many different domains, it'd be impossible to land on a "globally good enough" set of tuning heuristics.

      I'm far from a kernel developer, so I'm ready to be corrected here.

      IMO if we ever see something like this deployed widely, it will be because a popular distribution decided to install it by default.

    • nitinreddy88 4 hours ago

      It depends on the workload. This tool generates a recommended config for that specific machine's workload. App nodes can have completely different recommendations than database nodes, and it will be completely different again for a workstation.

      • usr1106 4 hours ago

        Sure, but the kernel could just do the same. Of course the kernel is already too big. Is BPF the right level to make it more modular? Just thinking, I don't think I have the answer.
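
onetoo's point that an auto-tuner converges on a local optimum, not necessarily the best setting, is a general property of greedy search. The sketch below is a generic hill climber over one integer parameter with a made-up objective that has two peaks. It illustrates the concept only and is not bpftune's algorithm; `score` and `hill_climb` are hypothetical.

```python
# Hypothetical objective and greedy search; a generic illustration, not bpftune's method.
def score(x: int) -> int:
    """Made-up performance score with a small peak at x=20 and a taller one at x=80."""
    return max(10 - abs(x - 20), 0) + max(30 - abs(x - 80), 0)


def hill_climb(start: int, step: int = 1, lo: int = 0, hi: int = 100) -> int:
    """Move to the better neighbour until neither neighbour strictly improves the score."""
    x = start
    while True:
        neighbours = [n for n in (x - step, x + step) if lo <= n <= hi]
        best = max(neighbours, key=score)
        if score(best) <= score(x):
            return x  # local optimum (or a flat plateau): stop here
        x = best


if __name__ == "__main__":
    for start in (10, 25, 40, 60):
        end = hill_climb(start)
        print(f"start={start} -> x={end}, score={score(end)}")
```

Started at 10 or 25, the climber stops at 20 even though 80 scores three times higher; started at 60, it reaches 80; started at 40, on the flat stretch between the peaks, it does not move at all. A real tuner has a noisier objective and many interacting parameters, but the same dependence on where it starts can apply.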

  • mrbluecoat 2 hours ago

    > bpftune is designed to be zero configuration; there are no options

    On behalf of every junior administrator, overworked IT admin, and security-concerned "cattle" wrangler, thank you.

    Having to learn a thousand+ knobs & dials means most will never be touched. I for one welcome automated assistance in this area, even if the results are imperfect.

    • sgarland an hour ago

      I think it’s still important to know what those dials and knobs do, otherwise (as the currently top-voted comment says) when things break, you’ll be lost.

  • gmuslera 3 hours ago

    Two words: “feedback loop”.

    That was the first idea that came to mind when thinking about what could go wrong, not because of the Linux kernel, BPF, or this program, but simply because of how it is intended to work. There might be no risk of that happening, there may be controls around that, or if they happen they might only converge to a stable state, but it is still something to keep in mind.

    • marcosdumay 2 hours ago

      > or if they happen they might only converge to a stable state

      That one will always be dependent on the usage patterns. So the auto-tuner can't guarantee it.

      Also, I imagine the risk of the feedback turning positive is related to the system load (not CPU, but the usage of the resources you are optimizing). If so, it will make your computer less able to manage load. But this can still be useful for optimizing for latency.
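
The feedback-loop concern can be made concrete with the simplest possible proportional controller. This is a textbook illustration, not a model of bpftune. Each step moves a setting x by a fraction `gain` of the gap to a target, so the error e = target - x evolves as e[n+1] = (1 - gain) * e[n]. With 0 < gain < 2 the error shrinks and the loop settles. With gain > 2 each correction overshoots by more than the last. With gain < 0 (positive feedback, pushing away from the target) it runs away. The function `simulate` and all numbers are hypothetical.

```python
# Hypothetical illustration of a proportional feedback loop; not bpftune's algorithm.
def simulate(gain: float, target: float = 100.0, start: float = 0.0, steps: int = 20) -> list:
    """Apply x <- x + gain * (target - x) repeatedly and return every value of x."""
    xs = [start]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + gain * (target - x))
    return xs


if __name__ == "__main__":
    # gain < 0 pushes away from the target (positive feedback);
    # 0 < gain < 2 settles; gain > 2 overcorrects more on every step.
    for gain in (-0.5, 0.5, 1.5, 2.5):
        final_error = abs(100.0 - simulate(gain)[-1])
        print(f"gain={gain}: |error| after 20 steps = {final_error:.3g}")
```

Gains of 0.5 and 1.5 both drive the error toward zero (1.5 oscillates around the target on the way), while -0.5 and 2.5 make it grow by a factor of 1.5 per step. A real tuner faces a harder version of this, because the response to each change depends on the usage pattern, so the safe range of step sizes is not known in advance.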

  • nevon 3 hours ago

    I wonder how effective this would be in multi-tenant environments like shared k8s clusters. On the one hand, each application running will have a different purpose and will move around between nodes over time, but on the other hand there are likely broad similarities between most applications.

  • BSDobelix 3 hours ago

    BTW one can use it out of the box with CachyOS.

    After installation -> CachyOS Hello -> Apps/Tweaks

  • robinhoodexe 4 hours ago

    Is tuning the TCP buffer size, for instance, worth it?

    • viraptor 4 hours ago

      It depends. At home - probably not. On a fleet of 2000 machines where you want to keep network utilisation close to 100% with maximal throughput, and non-optimal settings translate to a non-trivial value in $ - yes.

      • londons_explore 2 hours ago

        TCP parameters are a classic example of where an autotuner might bite you in the ass...

        Imagine your tuner keeps making the congestion control more aggressive, filling network links up to 99.99% to get more data through...

        But then any other users of the network see super high latency and packet loss and fail because the tuner isn't aware of anything it isn't specifically measuring - and it's just been told to make this one application run as fast as possible.
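
For the question of when TCP buffer tuning pays off, a common back-of-the-envelope check is the bandwidth-delay product (BDP): to keep a link full, a single flow needs roughly bandwidth × round-trip time bytes in flight, and the receive buffer limits how much the receiver can advertise. The sketch below compares the BDP with the max field of a tcp_rmem-style "min default max" string (in bytes). It is an advice-only check, not what bpftune does, and it does nothing about the congestion-control fairness problem described above. `bdp_bytes`, `parse_tcp_rmem`, `advise`, and the example link speed and RTTs are hypothetical.

```python
# Hypothetical advice-only check based on the bandwidth-delay product; not bpftune's logic.
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product: roughly the bytes in flight needed to keep the link full."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


def parse_tcp_rmem(value: str) -> tuple:
    """Split a 'min default max' string (as in /proc/sys/net/ipv4/tcp_rmem) into ints."""
    low, default, high = (int(field) for field in value.split())
    return low, default, high


def advise(rmem: str, bandwidth_bits_per_s: float, rtt_s: float) -> str:
    """Report whether the max receive buffer can cover one flow's BDP; changes nothing."""
    _, _, high = parse_tcp_rmem(rmem)
    need = bdp_bytes(bandwidth_bits_per_s, rtt_s)
    if high >= need:
        return f"ok: max {high} B covers BDP {need} B"
    return f"consider raising max from {high} B to at least {need} B"


if __name__ == "__main__":
    # Hypothetical 10 Gbit/s link: 1 ms RTT (same rack) vs 50 ms RTT (cross-region).
    current = "4096 131072 6291456"
    print(advise(current, 10e9, 0.001))
    print(advise(current, 10e9, 0.050))
```

With these made-up numbers, the 1 ms case (BDP 1.25 MB) is already covered by a 6 MiB max, while the 50 ms case would need about 62.5 MB, the kind of long, fat path where buffer tuning can matter at fleet scale. Linux also autotunes receive buffers within these bounds when net.ipv4.tcp_moderate_rcvbuf is enabled, so the max acts as a ceiling rather than a fixed allocation.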

  • bastloing 3 hours ago

    It's great how it grew out of simple packet filtering into tracing and monitoring. It's one of those great tools most people should know. Been using it for years.