Fascinating project. Based on section 3.9, it seems the output is in the form of a bitmap. So I assume you have to do a full memory copy to the GPU to display the image in the end. With Skia moving to WebGPU[0] and with WebGPU supporting compute shaders, I feel that 2D graphics is slowly becoming a solved problem in terms of portability and performance. Of course there are cases where you would want a CPU renderer. Interestingly, the web is sort of one of them because you have to compile shaders at runtime on page load. I wonder if it could make sense in theory to have multiple stages to this, sort of like how JS JITs work, where you would start with a CPU renderer while the GPU compiles its shaders. Another benefit, as the author mentions, is binary size. WebGPU (via Dawn at least) is rather large.
[0] https://blog.chromium.org/2025/07/introducing-skia-graphite-...
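To sketch what that staging could look like: compile the GPU pipeline on a background thread and render frames on the CPU until it's ready. This is a minimal illustration assuming hypothetical CpuRenderer/GpuRenderer types, not any real library's API:

```rust
// Staged-renderer sketch: start on the CPU, promote to the GPU once its
// shaders finish compiling. All types here are hypothetical placeholders.
use std::sync::mpsc;
use std::thread;

struct Scene; // placeholder scene description

trait Renderer {
    fn render(&mut self, scene: &Scene, target: &mut [u8]);
}

struct CpuRenderer;
impl Renderer for CpuRenderer {
    fn render(&mut self, _scene: &Scene, _target: &mut [u8]) {
        // software rasterization path
    }
}

struct GpuRenderer;
impl Renderer for GpuRenderer {
    fn render(&mut self, _scene: &Scene, _target: &mut [u8]) {
        // GPU path, usable only after pipeline/shader compilation
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Kick off (slow) shader compilation in the background.
    thread::spawn(move || {
        let gpu = GpuRenderer; // pretend pipeline creation happens here
        tx.send(gpu).ok();
    });

    let mut cpu = CpuRenderer;
    let mut gpu: Option<GpuRenderer> = None;
    let scene = Scene;
    let mut frame = vec![0u8; 4 * 640 * 480];

    for _ in 0..3 {
        // Promote to the GPU renderer as soon as it becomes available.
        if gpu.is_none() {
            gpu = rx.try_recv().ok();
        }
        match gpu.as_mut() {
            Some(g) => g.render(&scene, &mut frame),
            None => cpu.render(&scene, &mut frame),
        }
    }
}
```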
The output of this renderer is a bitmap, so you have to do an upload to the GPU if that's what your environment is. As part of the larger work, we also have Vello Hybrid which does the geometry on CPU but the pixel painting on GPU.
We have definitely thought about falling back to the CPU renderer while the shaders are being compiled (shader compilation is a problem) but haven't implemented it.
In any interactive environment you have to upload to the GPU on each frame to output to a display, right? Or maybe integrated SoCs can skip that? Of course you only need to upload the dirty rects, but in the worst case the full image.
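For the dirty-rect case, here's a hedged sketch of a partial upload using wgpu's Queue::write_texture (type names as of wgpu ~0.20; newer versions renamed some of them, and `queue`, `texture`, and `frame` are assumed to exist). write_texture lets the source layout point at a sub-rect of a tightly packed full-frame buffer:

```rust
// Upload only the dirty rect of an RGBA8 full-frame buffer to a texture.
fn upload_dirty_rect(
    queue: &wgpu::Queue,
    texture: &wgpu::Texture,
    frame: &[u8],     // full-frame RGBA8 pixels, tightly packed
    frame_width: u32, // stride of `frame` in pixels
    (x, y, w, h): (u32, u32, u32, u32), // dirty rect
) {
    let stride = 4 * frame_width as u64;
    queue.write_texture(
        wgpu::ImageCopyTexture {
            texture,
            mip_level: 0,
            origin: wgpu::Origin3d { x, y, z: 0 },
            aspect: wgpu::TextureAspect::All,
        },
        frame,
        wgpu::ImageDataLayout {
            // Start reading at the top-left pixel of the dirty rect...
            offset: y as u64 * stride + x as u64 * 4,
            // ...and step by the full-frame stride between rows.
            bytes_per_row: Some(4 * frame_width),
            rows_per_image: None,
        },
        wgpu::Extent3d { width: w, height: h, depth_or_array_layers: 1 },
    );
}
```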
>geometry on CPU but the pixel painting on GPU
Wow. Is this akin to running just the vertex shader on the CPU?
Also check out Blaze: https://gasiulis.name/parallel-rasterization-on-cpu/
Thanks for the pointer, we were not actually aware of this, and the claimed benchmark numbers look really impressive.
The demo is astonishing.
This looks interesting; recently I wrote some code for rendering high precision N-body paths with millions of vertices[0], and I wonder if a GPU implementation of this RLE representation would work well and maintain simplicity.
[0] https://www.youtube.com/watch?v=rmyA9AE3hzM
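Not knowing the paper's exact data layout, here is a minimal sketch of what a per-scanline RLE coverage representation might look like, just to make the discussion concrete:

```rust
// Illustrative RLE coverage mask: one Vec of runs per scanline.
// This is an assumption for discussion, not the renderer's actual layout.

/// One horizontal run of constant coverage on a scanline.
struct Run {
    x: u16,       // starting column
    len: u16,     // run length in pixels
    coverage: u8, // 0..=255 antialiasing coverage
}

struct RleMask {
    rows: Vec<Vec<Run>>,
}

impl RleMask {
    /// Composite a solid RGBA color into an RGBA8 buffer using the runs.
    fn fill(&self, buf: &mut [u8], stride: usize, rgba: [u8; 4]) {
        for (y, row) in self.rows.iter().enumerate() {
            for run in row {
                for x in run.x..run.x + run.len {
                    let i = y * stride + x as usize * 4;
                    for c in 0..4 {
                        // Source-over, with run coverage used as alpha.
                        let src = rgba[c] as u32 * run.coverage as u32 / 255;
                        let dst = buf[i + c] as u32 * (255 - run.coverage as u32) / 255;
                        buf[i + c] = (src + dst) as u8;
                    }
                }
            }
        }
    }
}
```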
Side question. Is there some kind of benchmark to test the correctness of renderers?
This was the original goal of the Cornell box (https://en.wikipedia.org/wiki/Cornell_box), i.e. carefully measure the radiosity of a simple, real-world scene and then see how closely you can come to simulating it.
For realtime rendering, a common thing to do is to benchmark against a known-good offline renderer (e.g. Arnold, Octane).
Correctness of what exactly? It's a "render" of a reality-like environment, so all of them make some tradeoff somewhere and won't be 100% "correct", at least compared to reality :)
Bézier curves can generate degenerate geometry when flattened, and stroke geometry has to handle edge cases. See for instance the illustration on the last page of the Polar Stroking paper: https://arxiv.org/pdf/2007.00308
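As a hedged illustration (a naive sketch, not the Polar Stroking algorithm), here's a fixed-step quadratic Bézier flattener with a guard for one such degenerate case: zero-length segments, which can give downstream stroking code NaN normals:

```rust
// Naive quadratic Bézier flattening with a degenerate-segment guard.
#[derive(Clone, Copy, PartialEq)]
struct Point { x: f64, y: f64 }

fn lerp(a: Point, b: Point, t: f64) -> Point {
    Point { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t }
}

/// Flatten p0..p2 into `n` line segments, skipping zero-length output.
fn flatten_quad(p0: Point, p1: Point, p2: Point, n: usize, out: &mut Vec<Point>) {
    let mut last = p0;
    out.push(p0);
    for i in 1..=n {
        let t = i as f64 / n as f64;
        // De Casteljau evaluation of the quadratic at parameter t.
        let q = lerp(lerp(p0, p1, t), lerp(p1, p2, t), t);
        // Guard: drop segments below an epsilon. A curve with coincident
        // control points would otherwise emit zero-length edges, from
        // which a stroker cannot compute a direction.
        if (q.x - last.x).hypot(q.y - last.y) > 1e-9 {
            out.push(q);
            last = q;
        }
    }
}
```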
There are also things like interpreting (conflating) coverage as alpha in analytical antialiasing methods, which leads to visible hairline cracks.
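To make that concrete: two opaque shapes share an edge that splits a pixel 50/50. If each one's 50% coverage is treated as 50% alpha and composited source-over, the background leaks through the seam:

```rust
// Worked example of the coverage-as-alpha conflation artifact.
fn main() {
    let bg = 0.0_f64;    // background intensity
    let shape = 1.0_f64; // both shapes are fully opaque white
    let coverage = 0.5;  // each covers half of the boundary pixel

    // First shape over background, then second shape over the result.
    let after_first = shape * coverage + bg * (1.0 - coverage);            // 0.5
    let after_second = shape * coverage + after_first * (1.0 - coverage);  // 0.75

    // Correct result: the two halves tile the pixel exactly, so the pixel
    // should be 1.0. The 0.25 shortfall is the visible hairline crack.
    println!("composited = {after_second}, expected = 1.0");
}
```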
Correctness with respect to the benchmark. A slow reference renderer could produce the target image, and renderers need to achieve either exact or close reproduction of the reference. Otherwise, you could just make substantial approximations and claim a performance victory.
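For instance, a minimal sketch of such a check, scoring the output against the reference by RMSE over RGBA8 pixels (the threshold here is just an assumption, not a standard):

```rust
// Compare a renderer's output buffer against a reference image by RMSE.
fn rmse(output: &[u8], reference: &[u8]) -> f64 {
    assert_eq!(output.len(), reference.len());
    let sum: f64 = output
        .iter()
        .zip(reference)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum();
    (sum / output.len() as f64).sqrt()
}

fn close_enough(output: &[u8], reference: &[u8]) -> bool {
    rmse(output, reference) < 1.0 // e.g. within ~1 intensity level on average
}
```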