This is neat, but splats are not really meant to be edited in this way.
Splats are sort of like byte code: they are the compiled and optimized representation of reflected light as semi-transparent Gaussians.
Or you can think of them as the PDF equivalent of a Google or Word Doc. All the logic is gone, and you just have final optimized results.
Generally when you edit PDFs, the results are not great and you cannot make major edits because the layout won't reflow, etc.
So while this is cool, I don't think it will take off unless there is another innovation in terms of either using AI to "reflow" the lighting and surfaces after an edit, or inferring more directly the underlying representations (true surface properties and the light sources.)
Hi Ben! I would argue that it is very useful for splats to be edited in this way. I couldn't have built this application without SuperSplat for isolating, cleaning, transforming and optimizing/compressing the PLY:
https://playcanv.as/e/p/cLkf99ZV/
Integrating AI is an interesting topic and something that certainly has potential.
I 100% agree with:
- cleaning up noisy Gaussian splats is useful. There are often stragglers floating around in space that need to get deleted.
- compression/optimizing them is useful.
This being a cleanup and compression tool makes sense, but I guess I don't call that an "editor."
I guess I was more arguing against the idea that this is a viable "editor" where one can combine and manipulate Gaussian splats in more radical ways. The current technological approach doesn't make this a feasible use case.
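For readers wondering what the straggler cleanup mentioned above amounts to, here is a minimal sketch. It is not how SuperSplat implements it: the function name `remove_stragglers`, the `radius` and `min_neighbors` parameters, and the brute-force neighbor count are all assumptions made for illustration.

```python
import math

def remove_stragglers(centers, radius=0.5, min_neighbors=2):
    """Keep only splat centers that have enough neighbors within `radius`.

    Hypothetical helper: brute force O(n^2), fine for a sketch but far too
    slow for a real scene with millions of splats.
    """
    kept = []
    for i, p in enumerate(centers):
        neighbors = sum(
            1
            for j, q in enumerate(centers)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept

# A tight cluster near the origin plus one floater far away.
cluster = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
floater = (5.0, 5.0, 5.0)
cleaned = remove_stragglers(cluster + [floater])
```

A real tool would more likely use a spatial index (a grid or k-d tree) and let the user pick the threshold interactively, but the idea is the same: isolated splats are cheap to detect and safe to delete.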
Coming very soon is:
- Copy & Paste: e.g. delete a tree and fill the hole with a copied patch of grass
- Color Adjustments: tinting, brightness, etc.
If these aren't editing ops, I don't know what is. :) Sure, you _could_ go back and recapture photogrammetry or rerun training, but that's super costly in terms of time. SuperSplat lets you make simple edits quickly and easily.
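As a rough illustration of why a brightness tweak is a cheap per-splat operation: in the common 3DGS PLY layout the base color is stored as a degree-0 spherical harmonic coefficient (often named `f_dc_0..2`) and decoded as `0.5 + C0 * f_dc`. The sketch below assumes that convention. The function names are hypothetical and this is not SuperSplat's code.

```python
SH_C0 = 0.28209479177387814  # degree-0 real spherical harmonic constant

def dc_to_rgb(f_dc):
    """Decode degree-0 SH coefficients into base RGB (assumed convention)."""
    return tuple(0.5 + SH_C0 * c for c in f_dc)

def rgb_to_dc(rgb):
    """Inverse of dc_to_rgb."""
    return tuple((c - 0.5) / SH_C0 for c in rgb)

def adjust_brightness(f_dc, factor):
    """Scale the decoded base color, clamp to [0, 1], and re-encode it."""
    rgb = dc_to_rgb(f_dc)
    scaled = tuple(min(1.0, max(0.0, c * factor)) for c in rgb)
    return rgb_to_dc(scaled)

mid_gray = rgb_to_dc((0.5, 0.5, 0.5))   # encodes to (0.0, 0.0, 0.0)
brighter = adjust_brightness(mid_gray, 1.2)
brighter_rgb = dc_to_rgb(brighter)      # approximately (0.6, 0.6, 0.6)
```

Note that this only touches the view-independent base color. Any higher-order coefficients, and any lighting baked into neighboring splats, are left as they were, which is exactly the limitation discussed in this thread.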
In theory, if you delete something you have to recompute global illumination and remove cast shadows in the immediate environment of the removed object, but that information is baked into the Gaussian splats. I think that's the kind of limitation the parent comment is talking about.
To be as accurate as possible, yes, you need to consider lighting/shadows. But trust me, in many circumstances, you can copy+paste Gaussians and it looks 'good enough'. It depends on the scene and the edit you want to make.
And the use case!
Not that different to doing the equivalent edit in Photoshop, I'd argue. Quite often 'local' edits are good enough.
Take a picture or movie. Now edit it with image or movie editing software. You have the same problem.
Yet this comment tree thinks it’s a novel observation that makes the tool useless.
Wow, the fade-in animation is most excellent! Mind sharing how you created it?
This app was made in the PlayCanvas editor. The models were imported and a custom GLSL shader written to handle the particle animations.
I don't like the "byte code" analogy for Gaussian splats. If they were like that, then we could apply compiler optimization and those sorts of math techniques to them. But they are probability density functions with transforms, so the math tools we have to work with them are similar to those in signal processing -- resampling, quantizing, estimating, etc.
In that model, we don't compile them, we train them; we don't run them, we sample/rasterize them.
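To make "we sample/rasterize them" concrete, here is a minimal sketch of the two steps for a single pixel: evaluate each splat's Gaussian falloff at the pixel, then blend front to back. It uses isotropic 2D Gaussians and made-up splat values for brevity. Real renderers project anisotropic 3D Gaussians to the screen and sort them by depth first, and the dictionary keys here are hypothetical.

```python
import math

def gaussian_weight(pixel, center, sigma):
    """Unnormalized isotropic 2D Gaussian falloff, 1.0 at the center."""
    dx = pixel[0] - center[0]
    dy = pixel[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def composite(pixel, splats):
    """Front-to-back alpha blending of splats already sorted near to far.

    Each splat is a dict with hypothetical keys: center, sigma, opacity, color.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for s in splats:
        alpha = s["opacity"] * gaussian_weight(pixel, s["center"], s["sigma"])
        for k in range(3):
            color[k] += transmittance * alpha * s["color"][k]
        transmittance *= 1.0 - alpha
    return color, transmittance

splats = [
    {"center": (0.0, 0.0), "sigma": 1.0, "opacity": 0.5, "color": (1.0, 0.0, 0.0)},
    {"center": (0.0, 0.0), "sigma": 1.0, "opacity": 1.0, "color": (0.0, 0.0, 1.0)},
]
color, remaining = composite((0.0, 0.0), splats)
```

At the shared center the half-transparent red splat contributes half the pixel and the opaque blue splat behind it fills the rest, which is why deleting or moving one splat changes what shows through from the ones behind it.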
This link came up on HN before and was a great refresher/expander on the math of Gaussians that allows all this [1].
Since Gaussians can be estimated, neural networks can model/generate them. Researchers are using this for 4D work and mesh extraction. The NNs run at a lower frame rate, informing the 3DGS running at interactive rates.
You are right that it is ephemeral and really a weird trick of the eye and we need new ways to edit/create it. Vectors/pixels have had a lot more time to grow tooling. People are working on it, just the toolbox is different. Very cool stuff will be coming up, I bet!
[1] https://news.ycombinator.com/item?id=41912160 I've also re-learned Fourier transforms to appreciate similar concepts.
I don't really think this is true. Gaussian splats certainly came from a context where an opaque representation is expected and normal, but they ended up being an entirely comprehensible format. They're not as simple to operate on as an SDF or voxel representation, but I think they're on par with triangle mesh geometry. A transformed fuzzy sphere is about as complex as a triangle, and spherical harmonic colors, while more conceptually difficult than textures, have fewer moving parts.
Relightable Gaussian splats - https://junxuan-li.github.io/urgca-website/
I guess in theory what you say could be correct, however in practice this tool has been very helpful for client work of editing, cleaning, cropping and even slight modification of Gaussian splats. I could see a similar argument for raster images in general -- they are hard to edit as you're modifying individual pixels and it's not efficient, but we've seen tools grow from MS Paint to modern Photoshop to become very useful. I think the same could be said here -- it's just early and we're at the "bytecode" level as you say.
That's a ridiculous take. Generated splats almost always have garbage parts that need to be truncated. An editor is absolutely needed for that.
Is it an “editor” at that point? Wouldn’t it be more appropriate to call it a “cleaning tool” or something similar, given that you can’t really use it to create something from scratch, just for “touch ups”?
I'm pretty sure that's the definition of "editing". If anything I feel that you're arguing not to call it a "creator".
Another ridiculous take.
I'm not really sure what you mean. Think of SuperSplat as the Photoshop of Gaussian splats?
- SuperSplat dev :)
There are already approaches to infer bidirectional reflectance distribution functions (BRDFs, the "true surface properties") and lights: https://nju-3dv.github.io/projects/Relightable3DGaussian/
I found it interesting. I'd heard of Gaussian splats but hadn't really appreciated how they worked, and this let me play with a model. So I'm not saying it's necessarily useful, but it is instructive.
great metaphor, thanks!
Seems like a 54-day-old spam account, judging from the post history? A bit difficult to tell.
super low effort comments for sure.
I could imagine this as a clean-up tool for splats. In any case, beautiful interface and the sample model made me smile. Thanks for sharing.
There's an app for Quest 3 called Gracia, which allows you to see these in 3D space:
- https://www.meta.com/en-gb/experiences/gracia/25784099001234...
- https://www.gracia.ai
What's the performance like?
Performance is surprisingly great, but of course it’s not as sharp/high res as you would get on a PC.
It's neat to explore how Gaussian splats represent different optical phenomena, like how the reflections seen in the eyes are represented as splats suspended within the object, with the surface of the eye being semi-transparent.
Did anyone build a text-to-splat 3D generation model? Seems like it would be pretty straightforward? Should make it really easy to generate assets for video games.
EDIT: yep - https://gsgen3d.github.io/
Aren't Gaussian splats incompatible with most common game development styles? No shadows and no rigging/animation are the main issues - or maybe I'm misunderstanding / behind on the research - please correct me.
You are right that many features in game engines cannot be used yet, especially relighting and reflections. But there are cases where game engines (like Unreal Engine 5) are used, for example in virtual production with LED screens, where a photorealistic background is needed (3D Gaussians look more realistic and are cheaper to produce than a comparable scene made of polygons).
SuperSplat is actually built on a game engine (PlayCanvas, for the web).
This looks like a promising tool to (also) generate fully synthetic 4D multi-view sequences, which can then be used to generate 3DGS scenes. Their pipeline also supports animated objects, camera zoom and pan, and they benchmark their results with 3DGS. Unfortunately the code is not yet published; can't wait to try it. https://gen-x-d.github.io/
Is there a reading guide to the maths behind Gaussian splats? All the resources I could find either assumed lots of knowledge (including what a "3D Gaussian" even is), or were written for a complete layperson (and probably include some AI grift).
The blog posts from Aras are a really good starting point: https://aras-p.info/blog/2023/09/05/Gaussian-Splatting-is-pr...
Even this article suffers from the same issues? Just in the first few paragraphs: what are spherical harmonics? I read the Wikipedia article but it all went way above my head. There really needs to be a maths-for-smart-but-not-math-phd-people resource.
Ahhhhh, I think I get Spherical Harmonics! I'll try to explain in simpler words, roughly, assuming I got them correctly (which I am not 100% sure about)... I don't guarantee it'll be ELI5 though, so it may or may not work for you...
Let's start from a single guitar string. If you pluck it, it makes a sound, because the string vibrates. In sound processing (a.k.a. "signal processing"), it is said we can express any complex vibration of a string (or a sound wave) as a sum of increasingly compressed ("higher frequency") sin/cos waves (called "higher harmonics"), each of them multiplied by its "contribution" (some frequencies, a.k.a. harmonics, are more present, others less). (This sum is called a "Fourier series" IIRC/IIUC, and the tool for finding the contributions is the "Fourier transform"; it is similar in spirit to a "Taylor polynomial", which sums powers instead of waves.) Notably, the MP3 file format roughly takes this sum and cuts it off at some point, assuming that if we keep only a bunch of the strongest harmonics and cut away the remaining waves that contribute less, the difference won't be audible. Also, a guitar string has a very tiny amplitude of those vibrations compared to its length, so they are barely visible. If you take a friend and start waving a piece of rope between you, you can get bigger waves, making the amplitudes much more visible.
Now, a guitar string is a 1-dimensional wave. If we go to 2D, we get the membrane of a drum. When you hit a drum to make a sound, it will start vibrating. In the same way, the shape the membrane takes in those vibrations can be expressed as a sum of "simpler" 2D vibrations - presumably mathy/physicsy people call them "circular harmonics" or something. Again, on a drum the vibrations aren't really visible to the naked eye, but if you instead took a floppy rubber circle loosely stretched on a metal ring and started shaking it, you'd probably get bigger waves. Interestingly, IIUC, a JPEG image is basically "MP3 but in the 2D case".
Now, back to Gaussian splats and spherical harmonics - I assume that "spherical harmonics" are the same thing but done to a balloon. If you pump up a ("perfectly spherical") balloon and then hit it, presumably the vibrations of its surface can also be expressed as a sum of increasingly wrinkled ("higher harmonics") sphere-like shapes, each one multiplied by its factor/contribution/strength/presence in the actual vibration. Again, on a balloon the deformations from the ideal shape are super small; but if you imagine some really floppy balloon-like sphere floating in the air, you could imagine the wrinkles being much deeper.
I assume in the case of Gaussian splats, apart from storing factors of each of the spherical harmonics contributing to the final "distorted blobby balloon shape", you also probably store a color of this contribution. This way, from some angles the dominating color of the blobby balloon would look more green, from others more yellow, etc.
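The "sum of harmonics, then cut it off" idea can be tried numerically. The sketch below approximates a square wave with its first few sine harmonics and measures how the error shrinks as more are kept. It only illustrates truncating a Fourier series; real MP3 encoding is considerably more involved (a different transform plus a psychoacoustic model), and the function names are just for illustration.

```python
import math

def square_wave(x):
    """Target signal: +1 on the first half of each period, -1 on the second."""
    return 1.0 if math.sin(x) >= 0 else -1.0

def partial_sum(x, n_harmonics):
    """Fourier series of the square wave, truncated to n_harmonics odd terms."""
    total = 0.0
    for i in range(n_harmonics):
        k = 2 * i + 1  # only odd harmonics contribute to a square wave
        total += math.sin(k * x) / k
    return 4.0 / math.pi * total

def mean_abs_error(n_harmonics, samples=1000):
    """Average absolute difference over one period."""
    err = 0.0
    for s in range(samples):
        x = 2.0 * math.pi * (s + 0.5) / samples
        err += abs(square_wave(x) - partial_sum(x, n_harmonics))
    return err / samples

# Error when keeping 1, 5 and 25 harmonics: each step is noticeably smaller.
errors = [mean_abs_error(n) for n in (1, 5, 25)]
```

The trade-off is the same one a lossy codec makes: fewer stored contributions means a smaller file and a rougher approximation.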
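One clarification that may help here: in the usual 3D Gaussian splatting setup the spherical harmonics do not deform the splat's shape (that stays an ellipsoid); they describe how its color varies with viewing direction. A minimal sketch up to degree 1 follows. The two constants are the standard real spherical harmonic normalization factors; the signs on the degree-1 terms and the `+ 0.5` shift are an assumed convention modeled on the reference implementation, and the coefficient values are made up.

```python
import math

SH_C0 = 0.28209479177387814  # degree-0 constant, 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 constant, sqrt(3 / (4 * pi))

def sh_color(coeffs, direction):
    """Evaluate one color channel from 4 SH coefficients (degrees 0 and 1).

    `direction` is the viewing direction; it is normalized here. The signs
    on the degree-1 terms are an assumed convention.
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    value = SH_C0 * coeffs[0]
    value += -SH_C1 * y * coeffs[1] + SH_C1 * z * coeffs[2] - SH_C1 * x * coeffs[3]
    return max(0.0, value + 0.5)  # shift and clamp, per the assumed convention

# Made-up coefficients: a mid-gray base that brightens when viewed along +z.
channel = [0.0, 0.0, 0.5, 0.0]
along_plus_z = sh_color(channel, (0.0, 0.0, 1.0))
along_minus_z = sh_color(channel, (0.0, 0.0, -1.0))
along_x = sh_color(channel, (1.0, 0.0, 0.0))
```

With three such channels per splat you get exactly the effect described above: the same blob looks greener from one side and yellower from another.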
Interestingly and coincidentally, a similar thing happens in an atom. The various "contributions" to the "blobby balloon" shape are called "electron energy levels" (or "orbitals") IIRC (https://en.wikipedia.org/wiki/Atomic_orbital). And the actual "blobby balloon" shape is probably called an "electron cloud" IIRC. I'm super grateful you pointed me in the direction of trying to understand Spherical Harmonics, because when I saw those shapes of atomic orbitals in the past, they always seemed weird to me, and confusing. Now it seems I understand where they came from, that's super exciting!
Found a decent video series about this on YT, showing vibrations of a plucked string, etc.: https://youtube.com/playlist?list=PLpBx-1imHuxISNflNHo0Qr4mQ...
Eheh, found one more video - building up the shape of the surface of the Earth from a sum/superimposition of an increasing number of spherical harmonics - https://youtu.be/dDQTHFeJf5M - again roughly what an MP3 or JPEG algorithm does, depending on how much "fidelity" you choose, i.e. how many more precise harmonics you keep :)
Ah, and also - IIUC, in some other domains of math, those "harmonics" can also be said to be "eigenfunctions" (see https://en.wikipedia.org/wiki/Eigenvalue), and in somewhat more familiar territory, they could be called an "orthogonal basis", meaning that a sum of them can represent any shape in some space - in a similar way as the orthogonal vectors of a Cartesian coordinate system (i.e. the "1"s on the XY axes in 2D, or on XYZ in 3D - or your green/blue/red arrows in Blender) can represent any point in that coordinate system.
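The orthogonality part is easy to check numerically in the 1D case: the integral of the product of two different sine harmonics over a full period is zero, just as the dot product of two different coordinate axes is zero. A small sketch using a midpoint-rule integral (the function name `inner_product` is just for illustration):

```python
import math

def inner_product(m, n, samples=10000):
    """Midpoint-rule approximation of the integral of sin(mx)*sin(nx) on [0, 2*pi]."""
    step = 2.0 * math.pi / samples
    total = 0.0
    for s in range(samples):
        x = (s + 0.5) * step
        total += math.sin(m * x) * math.sin(n * x)
    return total * step

different = inner_product(2, 3)  # close to 0: the harmonics are orthogonal
same = inner_product(3, 3)       # close to pi: a harmonic's "length" squared
```

Because the harmonics don't overlap in this sense, each contribution can be found independently of the others, which is what makes keeping or dropping individual harmonics so convenient.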
I wish all these guides used better notation. They all use the scariest possible Greek symbols (pi epsilon feta etc) that are hard to make sense of without a degree in anthropology, instead of nicely named variables like programmers use.
I do hope they don't use brined white cheese as a symbol.
Oh, they do. They most definitely do. And salads. ∪= Feta hasn't made it into Unicode yet, so I'm substituting generic cheese for it, but just you wait.
Not a reading guide, but Computerphile have a good introductory video on Gaussian Splats: https://youtu.be/VkIJbpdTujE?si=8hoMbMx6tKuMZo2S
Individualkex also has a couple videos on the high level ideas: https://youtu.be/GQXDjzNWuPc?si=zlAN7dO9STGATKad
Impressive! Does anyone know if this is open source? Or perhaps can be run locally as a server?
Not sure if I'm missing something, but the submission title says "open-source" and in the tool's help menu there's a link to the repo (https://github.com/playcanvas/supersplat), the tool runs in your browser, there's no server involved besides a Web-Server hosting the files.
I found its repository [1] after searching Google. The license is MIT.
[1] https://github.com/playcanvas/supersplat
I remember a time when it was considered impolite to ask a question without googling first. Is it still the case?
> I remember a time when it was considered impolite to ask a question without googling first. Is it still the case?
Yes
As I recall it, the asshole move was to reply to a question with a LMGTFY link.
Those were the times when you could actually rely on Google to give you the right results for such a query near the top, and more importantly, give you the same results it would give to the person you're telling to Google it.
Kagi allows you to share a URL to a specific search results page. Maybe it's time to revive the idea...
It's not.
First, Kagi might give the same results today, but what about tomorrow or a year from now? Will Kagi still exist a year from now or will Kagi links all be broken?
A better idea is to follow this HN guideline here and everywhere:
> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
The kind, non-snarky response to something that could be searched for is to simply answer the question.
> First, Kagi might give the same results today, but what about tomorrow or a year from now?
When you share from Kagi you share the actual results, not the search that led to them. I believe they don't change over time. If they ever disappear, the next search engine will be just as good for that same question.
> The kind, non-snarky response to something that could be searched for is to simply answer the question.
A few times, sure. There are things that cross the line in my mind though. You do occasionally run into questions which took longer to write than it takes to check. I think it's ok to discourage them slightly while still providing the obvious answer. LMGTFY worked great for those because you know what is and always will be the first answer for them.
The most annoying thing is when you google some question or problem, you find a forum where someone is asking your exact same question and people just tell them to google it. :D
This is cool!
Any tips for an app to use on iOS to capture the necessary .ply data?
Scaniverse is a great app by Niantic that can do this on-device, but it isn't very customizable and can't export its raw scanning data (exported .plys do not have the data this editor requires).
PolyCam I think is the most popular?
Polycam or Scaniverse
It would be interesting to have a 3d version of a mesh warp / puppet warp
That's awesome. What programming language is mostly used?
Is there a repo on Git? If so, please provide a link.
https://github.com/playcanvas/supersplat/