QEMU with VirtIO GPU Vulkan Support

(gist.github.com)

239 points | by GalaxySnail 7 months ago

53 comments

  • jamesu 7 months ago

    It's nice to see support for Vulkan in qemu actually getting somewhere. Being able to run modern accelerated workloads inside a VM (without dealing with SR-IOV) is pretty cool and definitely has some use cases.

  • jakogut 7 months ago

    Features like this are why I prefer using QEMU directly rather than an abstraction like libvirt on top of QEMU.

    Graphical interfaces like virt-manager are nice at first, but I don't need an abstraction on top of multiple hypervisors to make them all look superficially the same, because they're not. Eventually the abstraction breaks down and gets in the way.

    I need the ability to use the full capability of QEMU. I'll write a shell script to manage the complexity of the arguments. At least I don't have to deal with XML, validation, and struggling to enable options that only one specific emulator supports — options libvirt doesn't expose because they're not common to all of the backends.
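
    Such a wrapper script might look roughly like this (a minimal sketch; the machine type, disk path, and device options are illustrative, not taken from any comment here):

```shell
#!/bin/sh
# Hypothetical QEMU wrapper: all the arguments live in one readable
# place instead of libvirt XML. Paths and option values are illustrative.
set -eu

DISK="${1:-vm.qcow2}"

exec qemu-system-x86_64 \
    -machine q35,accel=kvm \
    -cpu host \
    -smp 4 \
    -m 8G \
    -drive file="$DISK",if=virtio,format=qcow2 \
    -nic user,model=virtio-net-pci \
    -display gtk,gl=on \
    -device virtio-vga-gl
```

    Changing one option is a one-line diff in the script, with no schema validation in the way.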

    • exceptione 7 months ago

      How do you deal with networks?

      I like that libvirt integrates with firewalld. libvirt via virt-manager also provides quick options for dns.

      My fear is that this would be a lot of wrangling with qemu before I get there. I am not fond of virt-manager (the UI is clunky), but for setting up a machine it is really helpful.

      • dijit 7 months ago

        Depends on the kind of network you want.

        Personally I'm very lazy, so I just make a virtual bridge and force QEMU to use it for everything, putting all my VMs on my local network.

        I totally understand that not everyone can do this, which is why I asked the question. I'd be interested in exploring how you would prefer the network topology to look.

        Having a virtual network on a machine would mean running a dns/dhcp server (I think dnsmasq can actually do both by itself) for ease of use. I think I could give you a 5-line bash script that does basically what you want, depending on what that is.

        The normal "internal" network topology ends up giving you an outbound NAT to the local network (to, eventually, get onto the internet) which, I personally really dislike.
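
        A virtual network of the kind described (a bridge plus dnsmasq doing both DHCP and DNS) could be sketched like this — interface names and addresses are assumptions, and QEMU's bridge helper additionally needs the bridge whitelisted in /etc/qemu/bridge.conf:

```shell
# Sketch: host-internal bridge with dnsmasq serving DHCP and DNS.
# Names and addresses are illustrative; run as root.
ip link add br0 type bridge
ip addr add 192.168.50.1/24 dev br0
ip link set br0 up

# dnsmasq can handle both DHCP and DNS on the bridge by itself.
dnsmasq --interface=br0 --bind-interfaces \
        --dhcp-range=192.168.50.10,192.168.50.100,12h

# Then attach VMs to the bridge, e.g.:
#   qemu-system-x86_64 ... -nic bridge,br=br0,model=virtio-net-pci
```

        Enslaving the physical NIC to the bridge instead (the lazy setup above) puts the VMs directly on the local network with no NAT.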

        • exceptione 7 months ago

          > I'd be interested in exploring how you would prefer the network topology to look.

          I tried to highly restrict my virtual machine with just an allow list (works via firewalld), and at the same time allowing the vm to query the (physical) LAN for dns-sd.

          Tbh, I could not get the latter to work directly. I ended up letting my host function as a dns-sd reflector.

          > virtual bridge

          Does that work with wlan? libvirt creates a bridge, but with or without NAT it could not let the vm participate like a normal LAN client. I thought it was a limitation of wireless lan bridging.

          • dbolgheroni 7 months ago

            It's possible to create a custom network for libvirt, but you have to add a static route in the router for the other hosts in your LAN to see the VMs.

            Using virsh, you can dump the default network with net-dumpxml, which is the default bridge libvirt creates, modify it and create another network. Add the modified file with net-create (non-persistent) or net-define.

            This way the VMs can participate in the LAN and, at the same time, the LAN can see your VMs. Works with wifi and doesn't depend on workarounds for bridging wifi and ethernet. Debian has a wiki entry on how to bridge with a wireless nic [0] but I don't think it's worth the trouble.

            [0] https://wiki.debian.org/BridgeNetworkConnections#Bridging_wi...
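
            The virsh workflow described above, roughly (network and file names are illustrative):

```shell
# Dump the default network definition, edit a copy, and register it.
virsh net-dumpxml default > mynet.xml
# ...edit mynet.xml: change <name>, drop <uuid>, adjust <bridge name=...>
#    and the IP/DHCP ranges...
virsh net-define mynet.xml      # persistent (use net-create for transient)
virsh net-start mynet
virsh net-autostart mynet
```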

            • exceptione 7 months ago

              Thanks, now I remember I got stuck there because the router in question does not allow for custom routes.

              But why do you duplicate the default bridge? Wouldn't adding a route in the router + default bridge be enough for this setup to work?

              • dbolgheroni 7 months ago

                You can just use the default bridge, but still have to add a static route in the router.

    • iforgotpassword 7 months ago

      I use libvirt for qemu, because I got tired of rewriting my command line every two days because the options changed yet again.

      • stracer 7 months ago

        Yeah, why do they change options so often? They should keep some backward compatibility; qemu is not a new project.

  • throwaway48476 7 months ago

    This isn't SR-IOV, which is a hardware feature for virtualizing GPUs. The problem is the OEMs that gate this feature for enterprise products. Few people buy them, so the state of the software ecosystem for virtual GPU is terrible.

    • mysteria 7 months ago

      Intel used to have GVT-g hardware virtualization on their integrated GPUs from Broadwell up. I haven't tried it myself but know people who used and liked it then. All good things come to an end though, and Intel scrapped it for Rocket Lake.

      I would've gone and bought Intel ARC dGPUs for my Proxmox cluster if they supported hardware virtualization on their consumer line.

      https://wiki.archlinux.org/title/Intel_GVT-g

      • SirGiggles 7 months ago

        12th gen and newer had some form of SR-IOV support in the i915 driver, but I'm not sure whether or not Intel fully upstreamed that.

        Here's a project that, iirc, backported and made a DKMS for from Intel's tree: https://github.com/strongtz/i915-sriov-dkms

        I also recall from that time that Intel had SR-IOV code for the iGPU (and I think their dGPUs) in the new Xe driver.

      • jeroenhd 7 months ago

        My experience with GVT-g is that it mostly served as a kernel panic generator. A good idea, but the software experience just isn't stable enough.

        • throwaway48476 7 months ago

          Software takes time to mature and if almost 0 people use the feature it never will.

    • 0xcde4c3db 7 months ago

      You don't even necessarily get it with enterprise products; last time I checked, Nvidia requires additional CAL-type licenses installed on a "certified" server from the "Nvidia Partner Network", while AMD and Intel limit it to very specific GPU product lines targeted at VDI (i.e. virtualizing your employees' "desktops" in a server room a la X/Citrix terminals).

  • rafaelmn 7 months ago

    So this seems to be about enabling a Linux VM to use Vulkan on a Linux host with Vulkan support?

  • crest 7 months ago

    At that point just run the code inside a chroot with a full /dev and call it good enough. No common GPU driver, firmware or hardware was designed to securely run really untrusted code from multiple tenants.

    • zamadatix 7 months ago

      The "Linux hosts Linux" case does seem the least interesting for that reason. I hope one day this results in actually usable acceleration when hosting a Windows VM.

    • mappu 7 months ago

      WebGL / WebGPU are a somewhat safe subset. Or at least safe enough that Google will keep funding multi-million pwn2own bounties for Chrome with WebGL / WebGPU enabled.

      • sim7c00 7 months ago

        Big bounties say nothing about security.

  • C-x_C-f 7 months ago

    Ignorant question: how's this different from qemu-virgl? I've been using the latter (installed from homebrew) for the last few years, passing -device virtio-vga.

    • SirGiggles 7 months ago

      Virtio-GPU Venus is similar to Virgl, except it passes through Vulkan commands rather than OpenGL.
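
      For comparison, both modes hang off the same device — a sketch assuming a QEMU build new enough to carry the Venus option (roughly 9.2+); exact property names can vary by version:

```shell
# Virgl: guest OpenGL commands forwarded to the host
qemu-system-x86_64 ... -display gtk,gl=on -device virtio-vga-gl

# Venus: guest Vulkan commands forwarded to the host
qemu-system-x86_64 ... -display gtk,gl=on \
    -device virtio-vga-gl,hostmem=4G,blob=true,venus=true
```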

  • xrd 7 months ago

    Does this mean you can run cuda applications inside a qemu VM? The equivalent to --gpus=all for docker, but now in an isolated VM? Is this permitting sharing of the GPU inside a VM?

    • SirGiggles 7 months ago

      I think this would depend on Virtio-GPU Native Context which, if I recall correctly from the qemu-devel mailing list, is the next natural progression from Virtio-GPU Vulkan

      Edit: Can't substantiate further, but this is what Huang Rui, the prior steward of the Venus patchset, said: https://lore.kernel.org/all/20240411102002.240536-1-dmitry.o...

      Edit 2: For further clarity, Virtio-GPU Native Context would permit running the native GPU drivers (with some modifications, minimal is what I remember being claimed) inside a VM

    • throwaway48476 7 months ago

      It's going to be significantly slower than native performance. Same as VirGL.

  • doctorpangloss 7 months ago

    Does this mean graphics workloads using Vulkan can be isolated and share most GPUs securely?

    • stracer 7 months ago

      If a malicious program has access to the GPU directly or via some buggy interface, the whole system is at risk. There is no "safe" GPU virtualization like there is with CPUs.

    • kcb 7 months ago

      Don't think there's anything particularly secure about it.

  • shmerl 7 months ago

    Looking forward to KDE Plasma implementing Vulkan rendering; then it would run in qemu/kvm with GPU acceleration over Vulkan rather than OpenGL.

    • rescbr 7 months ago

      You can use Zink (https://docs.mesa3d.org/drivers/zink.html) to translate OpenGL to Vulkan.

      I have even used it in Windows to make a legacy proprietary OpenGL application work properly with recent Windows versions + a mobile (now unsupported) AMD GPU.
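
      Routing an OpenGL application through Zink is a one-liner with Mesa's documented loader override (the application here is just an example):

```shell
# Force Mesa to use the Zink (OpenGL-on-Vulkan) driver for this process.
MESA_LOADER_DRIVER_OVERRIDE=zink glxgears
```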

      • shmerl 7 months ago

        I use Zink for some games that rely on OpenGL since it works better with Mangohud as a Vulkan layer. For example all games that need scummvm or dosbox.

  • enoeht 7 months ago

    Does one still need an extra discrete Vulkan GPU for this, and another for running the OS?

    • iforgotpassword 7 months ago

      You just need any GPU with Vulkan support in the host system, which is very likely to be the case nowadays (except maybe in servers).
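
      A quick way to check (vulkaninfo ships in the vulkan-tools package on most distros):

```shell
# Prints the host's Vulkan devices and driver versions, if any.
vulkaninfo --summary
```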

  • nubinetwork 7 months ago

    Someone wake me up when libvirt/virt-manager supports it, because I can't get the regular virtio gpu acceleration working either... something something spice doesn't support it...


  • cwbriscoe 7 months ago

    Unfortunately my distro is at linux version 6.8. Looking forward to trying it out someday.

    • eptcyka 7 months ago

      Unfortunately, ZFS doesn't support anything stable beyond 6.6.

      • SirGiggles 7 months ago

        What do you mean by stable? 2.2.7 supports the 6.12 kernel if I'm not mistaken

        • hamandcheese 7 months ago

          Of course, 2.2.7, which was released checks notes 1 hour ago. So I think GP was correct at the time of their post.

          https://github.com/openzfs/zfs/releases/tag/zfs-2.2.7

          • SirGiggles 7 months ago

            Then look back to 2.2.6; it supported up to 6.10. A far cry from supporting only up to 6.6, so I'm not seeing where they were going with their initial statement until they define what they mean by stable.

            https://github.com/openzfs/zfs/releases/tag/zfs-2.2.6

            Edit: changed sentence to make more sense

            Edit 2: And if we are to interpret stable as in Linux LTS, then that would be 6.12 which is supported by 2.2.7 as you said

            • hamandcheese 7 months ago

              Linux kernel 6.10 is EOL.

              Non-LTS kernels very frequently go EOL before OpenZFS supports them, or there is only a very brief window that there is support for a non-EOL kernel.

              In practice, it's hard to use a non-LTS kernel with openzfs for any significant duration.

              • SirGiggles 7 months ago

                That's a fair point and I don't disagree. I guess my main point of contention was the implication that either a) ZFS wasn't stable on anything non-LTS or b) the Linux kernels themselves were unstable outside of a LTS.

                What stable means in this case is subject to individual use cases. In my case, I don't find having to wait a bit for ZFS to catch up despite being on an EOL kernel to be catastrophic, but after having some time to think, I can see why someone would need an LTS kernel.

                • hamandcheese 7 months ago

                  I think we are on the same page. To clarify: if your goal is to be on stable ZFS AND non-EOL Linux kernel, then LTS kernel is usually the only option. There may be windows where there are non-LTS-non-EOL kernels supported, but non-LTS kernels go EOL very quickly, so those windows are fleeting.

                  This impacts distributions like NixOS in particular, which have a strict policy of removing EOL kernels.

                  • SirGiggles 7 months ago

                    I wasn't aware NixOS prunes EOL kernels, thanks for letting me know; this throws a bit of a wrench/damper in my personal machine plans.

                    • hamandcheese 7 months ago

                      Woah woah woah don't let me dissuade you from NixOS. I am still a happy NixOS+ZFS user, and my fingers are crossed that I'll soon get to upgrade to kernel 6.12 :)

                      • SirGiggles 7 months ago

                        No worries on that front, I expect that fun fact to be just a minor setback but I'm still pretty dead set on making my personal infrastructure declarative, reproducible, and anti-hysteresis.

                      • prmoustache 7 months ago

                        Honestly I wouldn't even try running ZFS on anything but a distro that ships it, like Ubuntu or its variants, or a distro with long term support like AlmaLinux 9.

    • gpm 7 months ago

      Switch distros?

      • cwbriscoe 7 months ago

        Well 6.13 is bleeding edge, it just started its RC cycle. I can wait until it is mainline.