Don’t polar coordinates still take n numbers for an n-dim vector (n-1 angles + 1 radius)? If so, I understand that angles can be quantized better, but when the radius r is big, isn’t the error large for coarsely quantized angles? What am I missing?
r is a single value per vector. You don't have to quantize it, you can keep it and quantize the billion+ other coordinates of the vector.
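To make the radius question concrete, here is a toy 2-D sketch (not PolarQuant's actual scheme; `quantize_angle` and `reconstruction_error` are names made up for illustration). It keeps r exact and rounds only the angle to 2**bits evenly spaced levels. Under those assumptions the absolute error does grow with r, as the question suspects, but the error relative to the vector's length stays the same.

```python
import math

def quantize_angle(theta, bits):
    """Round theta to the nearest of 2**bits evenly spaced levels on the circle."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return (round(theta / step) % levels) * step

def reconstruction_error(r, theta, bits):
    """Absolute and relative error when only the angle is quantized (r kept exact)."""
    theta_q = quantize_angle(theta, bits)
    x, y = r * math.cos(theta), r * math.sin(theta)
    xq, yq = r * math.cos(theta_q), r * math.sin(theta_q)
    abs_err = math.hypot(x - xq, y - yq)
    return abs_err, abs_err / r

for r in (1.0, 10.0, 1000.0):
    abs_err, rel_err = reconstruction_error(r, theta=1.0, bits=4)
    print(f"r={r:7.1f}  abs_err={abs_err:.4f}  rel_err={rel_err:.6f}")
```

The relative error depends only on the angle step, which is the same behaviour you get from any quantizer that stores a per-vector scale next to low-bit codes.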
This is the worst lay-people explanation of an AI component I have seen in a long time. It doesn't even seem AI generated.
I think it is, though:
“TurboQuant, QJL, and PolarQuant are more than just practical engineering solutions; they’re fundamental algorithmic contributions backed by strong theoretical proofs. These methods don't just work well in real-world applications; they are provably efficient and operate near theoretical lower bounds.”
Maybe they quantized the model parameters a bit too much...
I did not understand what PolarQuant is.
Is it something like pattern-based compression, where the algorithm finds repeating patterns and creates an index of those common symbols or numbers?
1. Efficient recursive transform of KV embeddings into polar coordinates
2. Quantize the resulting angles without the need for explicit normalization. The key insight that saves memory: the angles follow a known distribution with an analytical form.
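The two steps above can be sketched roughly as follows. This is my reading of "recursive polar transform", not the authors' implementation: coordinates are paired into (radius, angle), then the radii are paired again, until a single radius is left. The function names are hypothetical, the vector length is assumed to be a power of two, and a plain uniform quantizer stands in for whatever distribution-aware codebook the method actually uses.

```python
import math

def polar_forward(x):
    """Return (radius, angle_levels) for a vector whose length is a power of two."""
    assert len(x) > 0 and len(x) & (len(x) - 1) == 0
    values, levels = list(x), []
    while len(values) > 1:
        pairs = list(zip(values[0::2], values[1::2]))
        levels.append([math.atan2(b, a) for a, b in pairs])  # one angle per pair
        values = [math.hypot(a, b) for a, b in pairs]        # radii feed the next level
    return values[0], levels

def polar_inverse(radius, levels):
    """Rebuild the vector from the final radius and the per-level angles."""
    values = [radius]
    for angles in reversed(levels):
        expanded = []
        for r, t in zip(values, angles):
            expanded.extend((r * math.cos(t), r * math.sin(t)))
        values = expanded
    return values

def quantize(levels, bits):
    """Uniformly round every angle to a multiple of 2*pi / 2**bits (a stand-in codebook)."""
    step = 2 * math.pi / (2 ** bits)
    return [[round(t / step) * step for t in angles] for angles in levels]

x = [0.5, -1.2, 3.0, 0.1, -0.7, 2.2, 1.1, -0.3]
r, levels = polar_forward(x)
x_hat = polar_inverse(r, quantize(levels, bits=6))
print(r, sum(len(a) for a in levels))
print([round(v, 3) for v in x_hat])
```

For an n-dim vector this yields n-1 angles plus one radius, so the count is the same as in Cartesian form. Because only the angles are rounded, the reconstructed vector keeps exactly the stored length r, which is one way to see why no separate normalization step is needed in this sketch.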
Reminds me vaguely of the Burrows-Wheeler transform in bzip2.
https://mesuvash.github.io/blog/2026/turboquant-interactive/ has a little visualisation
I like the visualization, but I don’t understand the grid quantization. If every point is on the unit circle, aren’t all the grid cells near the center unused?
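A quick way to check that intuition, independent of what the linked page actually draws: lay a k x k grid over [-1, 1]^2 and count how many cells contain at least one point of the unit circle. The helper `used_cells` and the sampling density are assumptions made for this sketch.

```python
import math

def used_cells(k, samples=100_000):
    """Count cells of a k x k grid over [-1, 1]^2 that contain unit-circle points."""
    cells = set()
    for i in range(samples):
        t = 2 * math.pi * i / samples
        # Map x and y from [-1, 1] to a cell index in 0..k-1 (clamp the upper edge).
        col = min(int((math.cos(t) + 1) / 2 * k), k - 1)
        row = min(int((math.sin(t) + 1) / 2 * k), k - 1)
        cells.add((row, col))
    return len(cells)

for k in (4, 8, 16):
    print(f"grid {k}x{k}: {used_cells(k)} of {k * k} cells used")
```

The number of used cells grows roughly in proportion to k while the total grows as k squared, so a Cartesian grid spends most of its codes on cells that unit-norm points never reach. Quantizing the angle directly avoids that waste.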