Fascinating project. Based on section 3.9, it seems the output is in the form of a bitmap. So I assume you have to do a full memory copy to the GPU to display the image in the end. With Skia moving to WebGPU[0] and with WebGPU supporting compute shaders, I feel that 2D graphics is slowly becoming a solved problem in terms of portability and performance. Of course there are cases where you would want a CPU renderer. Interestingly, the web is sort of one of them because you have to compile shaders at runtime on page load. I wonder if it could make sense in theory to have multiple stages to this, sort of like how JS JITs work, where you would start with a CPU renderer while the GPU compiles its shaders. Another benefit, as the author mentions, is binary size. WebGPU (via Dawn at least) is rather large.
[0] https://blog.chromium.org/2025/07/introducing-skia-graphite-...
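To sketch what that staging could look like: compile the GPU pipeline on a background thread and render frames on the CPU until it's ready. This is a minimal illustration assuming hypothetical CpuRenderer/GpuRenderer types, not any real library's API:

```rust
// Staged-renderer sketch: start on the CPU, promote to the GPU once its
// shaders finish compiling. All types here are hypothetical placeholders.
use std::sync::mpsc;
use std::thread;

struct Scene; // placeholder scene description

trait Renderer {
    fn render(&mut self, scene: &Scene, target: &mut [u8]);
}

struct CpuRenderer;
impl Renderer for CpuRenderer {
    fn render(&mut self, _scene: &Scene, _target: &mut [u8]) {
        // software rasterization path
    }
}

struct GpuRenderer;
impl Renderer for GpuRenderer {
    fn render(&mut self, _scene: &Scene, _target: &mut [u8]) {
        // GPU path, usable only after pipeline/shader compilation
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Kick off (slow) shader compilation in the background.
    thread::spawn(move || {
        let gpu = GpuRenderer; // pretend pipeline creation happens here
        tx.send(gpu).ok();
    });

    let mut cpu = CpuRenderer;
    let mut gpu: Option<GpuRenderer> = None;
    let scene = Scene;
    let mut frame = vec![0u8; 4 * 640 * 480];

    for _ in 0..3 {
        // Promote to the GPU renderer as soon as it becomes available.
        if gpu.is_none() {
            gpu = rx.try_recv().ok();
        }
        match gpu.as_mut() {
            Some(g) => g.render(&scene, &mut frame),
            None => cpu.render(&scene, &mut frame),
        }
    }
}
```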
The output of this renderer is a bitmap, so you have to do an upload to the GPU if that's what your environment is. As part of the larger work, we also have Vello Hybrid which does the geometry on CPU but the pixel painting on GPU.
We have definitely thought about falling back to the CPU renderer while the shaders are being compiled (shader compilation is a problem) but haven't implemented it.
In any interactive environment you have to upload to the GPU on each frame to output to a display, right? Or maybe integrated SoCs can skip that? Of course you only need to upload the dirty rects, but in the worst case the full image.
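For the dirty-rect case, here's a hedged sketch of a partial upload using wgpu's Queue::write_texture (type names as of wgpu ~0.20; newer versions renamed some of them, and `queue`, `texture`, and `frame` are assumed to exist). write_texture lets the source layout point at a sub-rect of a tightly packed full-frame buffer:

```rust
// Upload only the dirty rect of an RGBA8 full-frame buffer to a texture.
fn upload_dirty_rect(
    queue: &wgpu::Queue,
    texture: &wgpu::Texture,
    frame: &[u8],     // full-frame RGBA8 pixels, tightly packed
    frame_width: u32, // stride of `frame` in pixels
    (x, y, w, h): (u32, u32, u32, u32), // dirty rect
) {
    let stride = 4 * frame_width as u64;
    queue.write_texture(
        wgpu::ImageCopyTexture {
            texture,
            mip_level: 0,
            origin: wgpu::Origin3d { x, y, z: 0 },
            aspect: wgpu::TextureAspect::All,
        },
        frame,
        wgpu::ImageDataLayout {
            // Start reading at the top-left pixel of the dirty rect...
            offset: y as u64 * stride + x as u64 * 4,
            // ...and step by the full-frame stride between rows.
            bytes_per_row: Some(4 * frame_width),
            rows_per_image: None,
        },
        wgpu::Extent3d { width: w, height: h, depth_or_array_layers: 1 },
    );
}
```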
>geometry on CPU but the pixel painting on GPU
Wow. Is this akin to running just the vertex shader on the CPU?
Also check out Blaze: https://gasiulis.name/parallel-rasterization-on-cpu/
Thanks for the pointer, we were not actually aware of this, and the claimed benchmark numbers look really impressive.
The demo is astonishing.
This looks interesting; recently I wrote some code for rendering high precision N-body paths with millions of vertices[0], and I wonder if a GPU implementation of this RLE representation would work well and maintain simplicity.
[0] https://www.youtube.com/watch?v=rmyA9AE3hzM
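Not knowing the paper's exact data layout, here is a minimal sketch of what a per-scanline RLE coverage representation might look like, just to make the discussion concrete:

```rust
// Illustrative RLE coverage mask: one Vec of runs per scanline.
// This is an assumption for discussion, not the renderer's actual layout.

/// One horizontal run of constant coverage on a scanline.
struct Run {
    x: u16,       // starting column
    len: u16,     // run length in pixels
    coverage: u8, // 0..=255 antialiasing coverage
}

struct RleMask {
    rows: Vec<Vec<Run>>,
}

impl RleMask {
    /// Composite a solid RGBA color into an RGBA8 buffer using the runs.
    fn fill(&self, buf: &mut [u8], stride: usize, rgba: [u8; 4]) {
        for (y, row) in self.rows.iter().enumerate() {
            for run in row {
                for x in run.x..run.x + run.len {
                    let i = y * stride + x as usize * 4;
                    for c in 0..4 {
                        // Source-over, with run coverage used as alpha.
                        let src = rgba[c] as u32 * run.coverage as u32 / 255;
                        let dst = buf[i + c] as u32 * (255 - run.coverage as u32) / 255;
                        buf[i + c] = (src + dst) as u8;
                    }
                }
            }
        }
    }
}
```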
Side question. Is there some kind of benchmark to test the correctness of renderers?
This was the original goal of the Cornell box (https://en.wikipedia.org/wiki/Cornell_box), i.e. carefully measure the radiosity of a simple, real-world scene and then see how closely you can come to simulating it.
For realtime rendering, a common thing to do is to benchmark against a known-good offline renderer (e.g. Arnold, Octane).
Correctness of what exactly? It's a "render" of a reality-like environment, so all of them make some tradeoff somewhere and won't be 100% "correct", at least compared to reality :)
Bézier curves can generate degenerate geometry when flattened, and stroke geometry has to handle edge cases. See for instance the illustration on the last page of the Polar Stroking paper: https://arxiv.org/pdf/2007.00308
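As a hedged illustration (a naive sketch, not the Polar Stroking algorithm), here's a fixed-step quadratic Bézier flattener with a guard for one such degenerate case: zero-length segments, which can give downstream stroking code NaN normals:

```rust
// Naive quadratic Bézier flattening with a degenerate-segment guard.
#[derive(Clone, Copy, PartialEq)]
struct Point { x: f64, y: f64 }

fn lerp(a: Point, b: Point, t: f64) -> Point {
    Point { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t }
}

/// Flatten p0..p2 into `n` line segments, skipping zero-length output.
fn flatten_quad(p0: Point, p1: Point, p2: Point, n: usize, out: &mut Vec<Point>) {
    let mut last = p0;
    out.push(p0);
    for i in 1..=n {
        let t = i as f64 / n as f64;
        // De Casteljau evaluation of the quadratic at parameter t.
        let q = lerp(lerp(p0, p1, t), lerp(p1, p2, t), t);
        // Guard: drop segments below an epsilon. A curve with coincident
        // control points would otherwise emit zero-length edges, from
        // which a stroker cannot compute a direction.
        if (q.x - last.x).hypot(q.y - last.y) > 1e-9 {
            out.push(q);
            last = q;
        }
    }
}
```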
There are also things like interpreting (conflating) coverage as alpha in analytical antialiasing methods, which leads to visible hairline cracks.
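To make that concrete: two opaque shapes share an edge that splits a pixel 50/50. If each one's 50% coverage is treated as 50% alpha and composited source-over, the background leaks through the seam:

```rust
// Worked example of the coverage-as-alpha conflation artifact.
fn main() {
    let bg = 0.0_f64;    // background intensity
    let shape = 1.0_f64; // both shapes are fully opaque white
    let coverage = 0.5;  // each covers half of the boundary pixel

    // First shape over background, then second shape over the result.
    let after_first = shape * coverage + bg * (1.0 - coverage);            // 0.5
    let after_second = shape * coverage + after_first * (1.0 - coverage);  // 0.75

    // Correct result: the two halves tile the pixel exactly, so the pixel
    // should be 1.0. The 0.25 shortfall is the visible hairline crack.
    println!("composited = {after_second}, expected = 1.0");
}
```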
Correctness with respect to the benchmark. A slow reference renderer could produce the target image, and renderers need to achieve either exact or close reproduction of the reference. Otherwise, you could just make substantial approximations and claim a performance victory.
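For instance, a minimal sketch of such a check, scoring the output against the reference by RMSE over RGBA8 pixels (the threshold here is just an assumption, not a standard):

```rust
// Compare a renderer's output buffer against a reference image by RMSE.
fn rmse(output: &[u8], reference: &[u8]) -> f64 {
    assert_eq!(output.len(), reference.len());
    let sum: f64 = output
        .iter()
        .zip(reference)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum();
    (sum / output.len() as f64).sqrt()
}

fn close_enough(output: &[u8], reference: &[u8]) -> bool {
    rmse(output, reference) < 1.0 // e.g. within ~1 intensity level on average
}
```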