Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?
Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.
These normally are ideal variables for photogrammetry but none of the various common applications and websites do a very good job creating a mesh out of it that isn't super low poly and/or full of holes.
I've been casually scanning huggingface for relevant models to try out but haven't really found anything.
For this exact use case I used instant-ngp[0] recently and was really pleased with the results. There's an article[1] explaining how to prepare your data.
You can never be sure what someone's real intent is. They might mean "something meshlike". Personally I usually reply by asking for more info (I always have the XY Problem in my mind) but that is time consuming and some people assume you're being pedantic (I am however correct more often than not - people have posed the wrong question or haven't given critical parts of the context).
Yeah some very impressive stuff with splats going on. But I haven't seen much about going from splats to high quality 3D meshes. I've tried one or two with pretty poor results.
No. For small objects, it is typical to use a turntable to rotate the object; there are a number of commercial and DIY turntables with an automated motion system that can trigger the shutter after a specified degree of rotation.
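As a rough sketch of the trigger schedule such a turntable would follow: assuming evenly spaced shots over one full revolution, 48 photos works out to one shot every 7.5 degrees. The function name below is made up for illustration and does not correspond to any particular turntable's control software.

```python
def trigger_angles(n_photos: int) -> list[float]:
    """Evenly spaced platform angles (degrees) at which to fire the shutter.

    Assumes one full revolution with the first shot at angle 0.
    Hypothetical helper, not taken from any real turntable controller.
    """
    if n_photos <= 0:
        raise ValueError("n_photos must be positive")
    return [i * 360.0 / n_photos for i in range(n_photos)]


angles = trigger_angles(48)
print(angles[1] - angles[0])  # step between consecutive shots, in degrees
```

More shots means smaller steps and more overlap between neighbouring photos, at the cost of more images to process.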
The OC mentioned "static lighting". If they meant static, while the platform was spinning, then the lighting would be inconsistent, because the object would change lighting with each photo. You would have to fix the lighting to the platform to spin with the object, while taking the pictures to get consistent lighting.
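To put a number on that, here's a toy sketch assuming a perfectly matte (Lambertian) surface, a single fixed directional light, and a platform spinning about the vertical axis. All names are illustrative. The brightness of one surface patch is the cosine of the angle between its normal and the light, so it changes as the patch rotates away from the light.

```python
import math


def rotate_about_vertical(v, degrees):
    """Rotate a 3D vector about the y (vertical) axis by the given angle."""
    a = math.radians(degrees)
    x, y, z = v
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def lambert_brightness(normal, light_dir):
    """Lambertian shading for unit vectors: dot product, clamped at zero."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)


light = (0.0, 0.0, 1.0)   # fixed light, does not move with the platform
normal = (0.0, 0.0, 1.0)  # surface patch that faces the light at 0 degrees

for angle in (0, 30, 60, 90):
    turned = rotate_about_vertical(normal, angle)
    print(angle, round(lambert_brightness(turned, light), 3))
```

If the light were mounted on the platform instead, it would rotate together with the normal and the brightness of the patch would stay constant from photo to photo.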
Photogrammetry generally assumes a fully static scene. If there are static parts of the scene which the camera can see and also rotating parts, the algorithm may struggle to properly match features between images.
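One way to picture the turntable case: if the camera sees nothing but the rotating object, each platform step looks the same as the camera orbiting a stationary object in the opposite direction. A minimal sketch of that equivalence, assuming a z-up frame with the turntable axis through the origin (names are illustrative):

```python
import math


def camera_in_object_frame(camera_pos, platform_degrees):
    """Position of the fixed camera as seen from the rotating object.

    The object turns by +platform_degrees about the z axis, so in the
    object's own frame the camera appears rotated by -platform_degrees.
    """
    a = math.radians(-platform_degrees)
    x, y, z = camera_pos
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


camera = (1.0, 0.0, 0.3)  # fixed tripod position, arbitrary units
for step in range(4):
    print(step * 90, camera_in_object_frame(camera, step * 90))
```

Any visible static background contradicts this virtual orbit, because its features stay put while the object's features move, which is where the matching can go wrong.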
COLMAP + CloudCompare with a good CUDA GPU (more VRAM is better) will give reasonable results for large textured objects like buildings. Glass, water, mirrors, and glossy surfaces need to be coated before scanning; a dry spray of Dr. Scholl's foot deodorant seems to work fine for our object scans.
There are now more advanced options than Gaussian splatting, and these can achieve normal playback speeds rather than hours of filtering. I'll drop a citation if I recall the recent paper and example code. However, note this style of 3D scene recovery tends to be heavily 3D location dependent.
>These normally are ideal variables for photogrammetry
Actually no. My friend learned this the hard way during a photogrammetry project: he rented a photo studio, made sure the background was perfectly black, and took the photos, but the photogrammetry program (Meshroom, I think) struggled to reconstruct the mesh. I did some research and learned that it uses features in the background to help work out the camera positions it needs to build the mesh. So he redid his tests outside with "messy" backgrounds and it worked much, much better.
This was a few years ago so I don't know if things are different now.
I’m not an expert, only dabbled in photogrammetry, but it seems to me that the crux of that problem is identifying common pixels across images in order to sort of triangulate a point in the 3D space. It doesn’t sound like something an LLM would be good at.
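A minimal sketch of that triangulation step, assuming the camera centers and the viewing ray through the matched pixel are already known for two images (in a real pipeline those come from calibration and pose estimation, which this skips). It uses the midpoint method: find the closest points on the two rays and average them. Function and variable names are made up for illustration.

```python
def _dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(p * q for p, q in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two rays c + t*d (d need not be unit length)."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel, cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two cameras one unit either side of the origin, both looking at (0, 0, 5).
point = triangulate_midpoint((-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                             (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0))
print(point)
```

The hard part in practice is the step before this one: deciding which pixels in different photos are the same surface point, which is classic feature matching rather than anything language-model shaped.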
Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.
I can only speak for myself, but a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books, that is, not at all. "Cost to zero" implies drinking directly from the AI firehose with no human in the loop (those cost money) and entertainment produced in that manner is still dire, even in the relatively mature field of pure text generation.
I think the biggest issue with stable diffusion based approaches has always been poor compositional ability (putting stuff where you want), and compounding anatomical/spatial errors that gave the images an offputting vibe.
All these problems are trivially solvable (solved) using traditional 3d meshes and techniques.
The issue with composition is only a problem when you rely on a pure text prompt, but has been solved for quite a while by ControlNets or img2img. What was lacking was the integration with existing art tools, but even that is getting solved, e.g. Krita[1] has a pretty nice AI plugin.
3D can be a useful intermediate when editing the 2D image, e.g. Krea has support for that[2]. But I don't think the rest of the traditional 3D pipeline is of much use here, AI image generation already produces images at a quality that traditional rendering just can't keep up with, neither in terms of speed, quality or flexibility.
Wow, those look impressive. But I think we are saying the same thing - stable diffusion can make pretty pics, but needs a lot of handholding context. I too have played around with ComfyUI, and while there are a LOT of techniques that allow you to manipulate the image, I have always felt like you were fighting SD.
In the videos you've attached, both tools (esp. the first) look impressive, but in the first example, you can clearly see that the model regenerates the street around the chameleon, when the artist changes it for no good reason.
In the second example you can see there's a bunch of AI tools under the hood, and they don't work together particularly well, with the car constantly changing as the image changes.
I think a lot of mileage can be extracted from SD as it stands (I could think of a bunch of improvements to what was demonstrated here, by applying existing techniques), but the fundamental issue remains: Stable Diffusion was made to generate whole images at once, unlike transformers, which output a single token.
Not sure what the image equivalent of a token is, but I'm sure it'd be feasible to train a model to fill holes - which'd be created by Segment Anything or something similar, and it would react better to local edits.
But not consistent state. The pipeline still needs to exist because most games require objects and environments to stay consistent across play sessions. That means generating from a 3D skeleton, at the very least, if not relegating genAI to production, not runtime.
I have tried the model, and I agree with you on the point. A product was uploaded for a test, the output catches the product quite well, but the text on the generated 3D model is unreadable.
IMO current generation models are capable of creating significantly better than "slop" quality content. You need only look at NotebookLM output. As models continue to improve, this will only get better. Look at the rate of improvement of video generation models in the last 12-24 months. It's obvious to me we're rapidly approaching acceptable or even excellent quality on-demand generated content.
I feel like you're conflating quality with fidelity. Video generation models have better fidelity than they did a year ago, but they are no closer to producing any kind of compelling content without a human directing them, and the latter is what you would actually need to make the "infinite entertainment machine" happen.
The fidelity of a video generation model is comparable to an LLM's ability to nail spelling and grammar - it's a start, but there's more to being an author than that.
I feel like text models are already at a sufficiently entertaining and useful quality as you define it. It's definitely possible we never get there for video or 3D modalities, but I think there are strong enough economic incentives such that big tech will dump tens of billions of dollars into achieving it.
I don't know why you think that's the case regarding text models. If that was the case, there would be articles on here that are just created by only generative AI and nobody would know the difference. It's pretty obvious that's not happening yet, not the least of which because I know what kinds of slop state-of-the-art generative models still produce when you give them open-ended prompts.
Ironic how this comment exemplifies the issue - broad claims about "slop" output but no specific examples or engagement with current architectures. Real discussions here usually reference benchmarks or implementation details.
You're sort of ignoring the issue? If the generated content was good and interesting enough on its own, we would already have AI publishing houses pushing out entire trilogies, and each of those would be top sellers.
Generative content right now is OK. OK isn't really the goal, or what anyone wants.
I feel like this is missing the point of GenAI. I read fewer books than I did a year ago, primarily because Claude will provide dynamic content that is exactly tailored for me. I don't read many instructional books any more, because I can tell Claude what I already know about a topic and what I'd like to know and it'll create a personalised learning plan. If I don't understand something, it can re-phrase things or find different metaphors until I do. I don't read self-help books written for a general audience, because I can get personalised advice based on my specific circumstances and preferences.
The idea of a "book" is really just an artifact of a particular means of production and distribution. LLM-generated text is a categorically different thing from a book, in the same way as a bardic poem or hypertext.
First it was AI articles, raising it to entire successful book trilogies seems like a much bigger leap. Even considering the largest context windows they wouldn't directly fit and there is much less data to train context of that size on fiction than the data out there for essays and articles.
I don't think it is there yet for articles either.
My point with the Claude generated comment was maybe it could get pretty close to something like an hn comment.
NotebookLM is still slop. I recommend feeding it your resume and any other online information about you. It's kind of fun to hear the hosts butter you up, but since you know the subject well you will quickly notice that it is not faithful to the source material. It's just plausibly misleading.
I can only speak for myself, but a large and growing proportion of the text I read every day is LLM output. If Claude and Deepseek produce slop, then it's a far higher calibre of slop than most human writers could aspire to.
What kind of text are you reading? Do you work in LLM development? Or are you just noticing that many news sites are using LLMs more and more?
I've noticed obvious LLM output on low quality news sites, but I don't tend to read them anyway. Maybe all the comments I read are from LLMs and I just don't realise?
Perplexity is now my default search engine. If I'm doing research, I use LLMs to summarise documents or scan through them to find relevant excerpts. If I'm doing general background reading on something, I'll ask an LLM for an explainer; likewise if I've read about one particular thing and want to understand the broader context around it. If I'm thinking through a problem, I'll bat the idea around with Claude or Deepseek, asking them to provide alternative perspectives.
It's quite difficult to analogise because LLMs are so profoundly novel, but the best I can do is that it's like having an infinitely patient and extremely knowledgeable assistant. That assistant isn't omniscient or infallible, but it's extremely useful because it tends to provide the information that I want, presented in a way that's particularly relevant to me. That requires a certain amount of rapport-building - understanding the characteristics of various models, learning to ask good questions, guiding the model towards my preferences with customised system prompts - but the effort pays off handsomely.
Star Trek's Holodeck is actually a good case study here (especially with the recent series, Lower Decks, going as far as making two episodes that are interactive movies on a holodeck, going quite deep into how that could work in practice both in terms of producing and experiencing them).
One observation derived here is that infinite procedural content at your fingertip doesn't necessarily kill all meaning, if you bring the meaning with you. The two major use cases[0] for the holodeck are:
- Multiplayer scenarios in which you and your friends enjoy some experience in a program. The meaning is sourced from your friendship and roleplay; the program may be arbitrary output of an RNG in the global sense, but it's the same for you and your friends, so shared experience (and its importance as a social object) in your group is retained.
- Single-player simulations that are highly specific. The meaning here comes from whatever is the reason you're simulating that particular experience, and its connection to the real world. Like idk., a flight simulator of a random space fighter flying over random world shooting at random shit would quickly get boring, but if I can get the simulator to give me a highly accurate cockpit of an F/A-18 Hornet, flying over real terrain and shooting at realistic enemies in realistic (even if fictional) storyline - now that would be deeply meaningful to me, because 1) F/A-18 Hornet is a real plane that I would otherwise never experience flying, and 2) I have a crush on this particular fighter because F/A-18 Hornet 3.0 is one of the first videogames I ever played in my life as a kid.
Now, to make Metaverse less like bullshit and more like Star Trek, we'd need to make sure the world generation is actually available to the users. No asset stores, no app marketplace bullshit. We live in a multimodal LLM era - we already have all the components to do it like Star Trek did it: "Computer, create a medieval fantasy village, in style of England around year 1400, set next to a forest, with tall mountains visible in the distance", then walk around that world and tweak the defaults from there.
--
[0] - Ignoring the third use case that's occasionally implied on the show, and that's really obvious given it's the same one the Internet is for - and I'm not talking about cat pictures.
Not all procedurally generated things are slop, and not all slop is made via procedural generation.
And popularity has nothing to do with private, subjective quality evaluations of the individual (aka, what someone calls slop might be picasso to another), but with objective, public evaluations of the product via purchases.
I was thinking about this, and the definition I came up with for slop is 'aspirational and highly detailed content that resolves its details in an uninteresting or nonsensical way'.
For example, an AI picture of a bush is not slop, because we don't expect much from a picture of a bush (not aspirational).
A hand-drawn picture of a knight in armor by an enthusiastic, but not very skilled artist is not slop either - it has tons of details that resolve in an interesting way, and what it lacks in details, it allows the viewers to fill in for themselves.
A 'realistic' knight generated by AI is slop - it contains no imaginative detail, and allows very little room for personal interpretation, and it's not rewarding to view.
Slop doesn't need to be AI - creatively bankrupt overproduced garbage counts as slop in my mind as well.
> 'aspirational and highly detailed content that resolves its details in an uninteresting or nonsensical way'.
This is a great definition. All the AI text I read is somehow missing the "meat" you find in good writing. All the right parts are there, but the core idea that makes me interested is just missing.
It's pretty much the same thing Linkedin has been full of for years. No one can bear to say anything controversial, so it's all just empty platitudes and junk.
Procgen has nothing to do with AI in terms of slop, for a good reason: procedural generation algorithms are heavily tuned by authors, exactly to avoid the “dull, unoriginal and repetitive” aspect that AI produces.
Object permanence and a communications channel is enough for this. Give children (who get along with each other) a pile of sticks and leave them alone for half an hour, and there's half a chance their game will ignore the sticks. Most children wouldn't want to have their play mediated by the computer in the way you describe, because the ergonomics are so poor.
I'm reminded of that guy who bought an AI enabled toy for his daughter and got increasingly exasperated as she kept turning it off and treating it as a normal toy.
That thread has a lot of good observations in it. I was probably wrong in framing the problem as "ergonomics".
> Dr. Michelle (@MichelleSaidel): I think because it takes away control from the child. Play is how children work through emotions, impulses and conflicts and well as try out new behaviors. I would think if would be super irritating to have the toy shape and control your play- like a totally dominating playmate!
> Alex Volkov (Thursd/AI) (@altryne): It did feel dominating! she wanted to make it clothes, and it was like, "meanwhile, here's another thing we can do" lacking context of what she's already doing
> The Short Straw (@short_straw): The real question you should ask yourself is why you felt compelled to turn it back on each time she turned it off.
> Angelo Angelli JD (@AngelliAngelo): Kids are pretty decent bullshit detectors and a lot of AI is bullshit.
> Foxhercules (@Foxena): […] I would like to point out again that the only things I sent this child were articulated 3d prints. beyond being able to move their arms, legs and tails, these things were made out of extruded plastic and are not exactly marvels of engineering. […] My takeaway from this is that, this is what children need. they don't need fancy with tons of bells and whistles with play on any sort of rails. And there's not a thing that AI can do to replace a Child's imagination NOR SHOULD IT.
The majority of American children have an active Roblox account. Those who don't are likely to play Minecraft or Fortnite. Play mediated by the computer in this way is already one of the most popular forms of play. Kids are going to go absolutely nuts for this and if you think otherwise, you really need to talk to some children.
I think you’re being short sighted. Imagine feeding in your favorite TV shows to a generative AI and being able to walk around in the world and talk to characters or explore it with other people.
Yes, because if someone has a tool that creates "something incredible", then everyone will be able to generate "something incredible" and then it all becomes not incredible.
It's like having god-mode in a game, it all becomes boring very quickly when you can have whatever you want.
If you follow that reasoning, anything that improves or anything that makes creation easier, produces slop.
Personally I'm not in favor of calling AI output slop, just because it is AI generated. You might then as well say that any electronic music is slop and any food prepared with help of machinery is crap. It might be crap or not, the automatedness is irrelevant.
The outputs of AI that I see today in the form of text, images or video don't look like slop to me.
> everyone will be able to generate "something incredible" and then it all becomes not incredible.
no, that's just your standard moving up.
There is an absolute scale for which you can measure, and ai is approaching a point where it is an acceptable level.
Imagine if you applied your argument to quality of life - it used to be that nobody had access to easy, cheap clean drinking water. Now everybody has access to it. Is it not an incredible achievement, rather than it not being incredible just because it is common?
That quote from the movie "The Incredibles", where the villain claims that if everybody is super, then nobody is, was the gist of your argument. And it is a childish one imho.
It is equally childish to compare the engineering of our modern water and plumbing systems with the automated generation of virtual textured polygons.
People don't get tired of good clean water because we NEED it to survive.
But oh, another virtual world entirely thought up by a machine? Throw it on the pile. We're going to get bored of it, and it will quickly become not incredible.
plenty of people in the world still drink crappy water, and they survive.
You don't _need_ it, you want it, because it's much more comfortable.
But when something becomes a "need" as you described it, you think of it differently. Just like how you don't _need_ electricity to survive, but it's so ingrained that you now think of it as a need.
> We're going to get bored of it, and it will quickly become not incredible.
exactly, but i have already said this in my original post - your standards just moved up.
If I could talk to something at the level of Neuro-sama (https://www.twitch.tv/vedal987) I'd be very entertained and it's essentially a matter of time. Hell, I'd love to have something like this as an assistant application as well and I'm not a Cortana/Google Assistant/etc user.
> a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books
Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that? Do you know anyone in your immediate vicinity who can?
Probably your answer is "yes, obviously!" to all the above.
My point: deep learning works and the era of slop ended ages ago except that some people are still living in the past or with some cartoon image of the state of the art.
> "Cost to zero" implies drinking directly from the AI firehose with no human in the loop
No. It means the marginal cost of production tends towards 0. If you can think it, then you can make it instantly and iterate a billion times to refine your idea with as much effort as it took to generate a single concept.
Your fixation on "content without a human directing them" is bizarre and counterproductive. Why is "no human in the loop" a prerequisite for productivity? Your fixation on that is confounding your reasoning.
> Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that?
So while I generally agree with you, I think this was a bad example to use: a lot of these are slop, with the kind of AI sheen we've come to glaze over. I'd say less than 20% are actually artistically impressive / engaging / thought-provoking.
There's still plenty of slop in there, and it would be a better gallery if there was a way to filter out anime girls. But it's definitely higher than 20% interesting to me.
The closest similar community of human made art is this:
Although unfortunately they've decided to allow AI art there too so it makes comparison harder. Also, I couldn't figure out how to get the equivalent list (top/year). But I'd say I find around the same amount interesting. Most human made art is slop too.
I think you fundamentally misunderstand what people use "slop" to describe.
> Most human made art is slop too.
I'm assuming you're using the term "slop" to describe low-quality, unpolished works, or works where the artist has been too ambitious with their skill level.
Let me put it this way:
Every piece of art that is made, is a series of decisions. The artist uses their lived experience, their tastes and their values to create something that's meaningful to them. Art doesn't need to have a high-level of technical expertise to be meaningful to others. It's fundamentally about communication from artists to their audience. To this point, I don't believe there's such a thing as "bad art" (all works have something to say about the artist!).
In contrast, when you prompt an image generator, you're handing over the majority of the decisions to the algorithm. You can put in your subject matter, poses, even add styles, but how much is really being communicated here? Undoubtedly it would require a high level of technical skill to render similarly by hand, but that's missing the forest for the trees- what is the image saying? There's a reason why most "good" AI-generated images generally have a lot of human curation and editing.
As a side note, here's a human-made piece that I appreciate a lot. https://i.imgur.com/AZiiZj1.jpeg
The longer you explore it, the more the story unfolds, it's quite lovely. On the other hand, when I focus on the details in AI-generated works, there's not much else to see.
> I think you fundamentally misunderstand what people use "slop" to describe.
I don't think I do, actually. It's not a term with a technical definition, but in simple terms it means art that is obviously AI, because it has the sheen, weird hands, inconsistencies, weird framing or thematic elements that are hard to describe without an art degree but which we instinctively know is wrong, or is just plain bad.
I used the term slop to describe bad human art too, but I meant something subtly different. It's a term that has been used to describe bad work of all kinds from humans since long before there was AI.
In this case, it's art from humans who are learning what makes good art. You say there's no bad art, and it's a valid viewpoint, but I'd say bad art is when the artist has a clear goal in their mind, but they lack the skills to realize it. Nonetheless, they share it for feedback and approval anyway, and by doing that on a site like DeviantArt they learn and grow as artists. But meanwhile, to me or anyone else who is visiting that site to find "good", meaningful art made by skilled artists, this is slop. Human slop, not AI slop.
> here's a human-made piece that I appreciate a lot
I like your art. I'm glad you made it. What I like most is that it's fun to look at and think about which is what you say you intended. I hope I get to see more of your art.
> To this point, I don't believe there's such a thing as "bad art" (all works have something to say about the artist!).
As a classically trained oil painter, I know for sure there is bad art especially because I've made more than enough bad art for one lifetime.
Bad art begins with a lack of craftsmanship and is exemplified by a poor use of materials/media and forms, or a lack of knowledge of those forms (e.g. poor anatomical knowledge, misunderstanding the laws of perspective), or an overly literal representation of forms (a photograph is better at being literal, for example).
> Here's an example of some "slop" from the AI Art Turing Test […] But it's very clearly AI-generated. Can you figure out why?
It's only "clearly AI-generated" because we know that AI is capable of generating art. If you saw this without that context you wouldn't immediately say "AI!" Instead, you'd give it a normal critique that you'd give a student or colleague: I'd say:
- there's too much repetition of large forms.
- there's an unpleasant hierarchy of values and not enough separation of values.
- The portrait of the human is the focus of the image yet it has been lost in the other forms.
- The composition can improve with more breathing room in the foreground or background which are too busy.
- Here look at this Frazetta!
However, my rudimentary list could just as easily be turned into prompts to be used to refine the image and experiment with variations. And, perhaps you'd consider that to be a human making decisions?
> I like your art. I'm glad you made it. What I like most is that it's fun to look at and think about which is what you say you intended. I hope I get to see more of your art.
> There's still plenty of slop in there, and it would be a better gallery […]
Thanks for sharing your better AI gallery. It's awesome to see.
Your reply clarifies my point even better: I shared a gallery, you evaluated it and shared an even better gallery! Undoubtedly someone else will look at yours today or next year, and say, as you said, "You missed a slop! Here's a better gallery".
My point fundamentally is about basic capability of the average and even above average person. As a classically trained amateur painter, I frequently ask myself: "Can I paint a nude figure better than what you've called slop?" As a mathematician I ask: "Can I reason better than this model?"
It is a fixation based on the desire that they themselves shouldn't be rendered economically useless in the future. The reasoning then comes about post facto from that desire, rather than from any base principle of logic.
Most, if not all, that are somewhat against the advent of AI are like the above in some way or another.
> Now show me the AI write something that's actually good on purpose
The average human can't even write a 3000 word short story that is good "on purpose" even if they tried.
I know because I've participated in many writing workshops.
The real question is: can you?
> AI can write an argument that's bad on purpose
Are you able to recognise good writing? How do I know? For all I know you're the most incompetent reader and writer on the planet. But your skills are irrelevant.
What's relevant is that deep learning is more skilled than the average person. If you're not aware of this you're either a luddite or confused about the state of the art.
The 'strawmanning your opponent' technique is a non-argument, and is effortless to pull off. Surrounding your argument with tons of purple prose (which Claude is good at) does not change that.
Writing a good argument requires 3 things: be logical, be compelling and likeable, and have a solid reputation. It does not require purple prose.
As for good writing, I'm pretty sure Brandon Sanderson's Mistborn trilogy qualifies, which was written with a rather small vocabulary and pedestrian prose, yet is universally praised.
Tbf, I do think Claude Sonnet and SD are impressive, and I think they can aid humans in producing compelling content, but they are not on the level of amateur fiction writers.
Besides, surpassing most humans in an area where most humans are unskilled is not a feat, not even AI companies flex on that.
> Writing a good argument requires 3 things: be logical, be compelling and likeable, and have a solid reputation. It does not require purple prose.
That's a common misconception that young writers have. Their prose is first purple and overwrought, then they overcorrect and try to be Hemingway, then they master the craft and discover that form follows function.
As such, the "purpleness" of prose is not an indictment of any sort except if the style doesn't serve the substance. So yes, purple prose is sometimes required and can be used correctly, just ask James Joyce or Hitchens or remember that first sentence in Lolita, for example.
Furthermore, almost every piece of writing you've probably enjoyed went through an editor or several professional editors. You'd be shocked to read early or even late drafts.
(Also, having a "solid reputation" has f' all to do with whether you can construct a good argument. Wanting that as a prerequisite is what the cool kids used to call "appeal to authority". Anyway ...)
But wtf are we even talking about now?
> Besides, surpassing most humans in an area where most humans are unskilled is not a feat, not even AI companies flex on that.
I don't care what "AI companies flex". What I care about, as a programmer, and as an artist, and as a writer who won a tiny prize in my even tinier and insignificant niche on the planet, is what tools we can build for the average person and what tools I have access to.
If I have a robot that is 50% stronger than me or 10x better read than the average human or 20% better than the average mathematician, that's a huge victory. So yes, surpassing the average human is a feat.
But it's not merely the average human who has been surpassed: the average mathematician (skilled in mathematics) and the average artists (skilled in art) and the average writer, have all been surpassed. That is my testable claim. Play with the tools, and see for yourself.
> the fact that you are seriously asking this question says a lot about your taste.
Non sequitur. My sense of taste, or lack of it, is irrelevant.
Questions about "taste" don't matter when the average person doesn't have the craft to produce what they claim they are competent to judge, especially when we're talking about such low-hanging fruit as: "write a short story", "write an essay", "analyse this math problem", "draw an anatomically accurate portrait or nude figure", "paint this still life", "sketch this landscape".
Are you able to make the distinction between taste and craftsmanship?
Then after you are done signalling whatever it is you think you're signalling by vaguely gesturing at your undoubtedly superior sense of taste, perhaps we can talk like adults about what I asked?
Frankly I think you cannot get past your own delusion about AI and no argument will change your mind. No one can make you appreciate art properly and I can only hope one day you will.
> No one can make you appreciate art properly and I can only hope one day you will.
Lmao.
Refer to my other comment for more context, for whatever that is worth (talking with strangers who are eager to judge everyone but themselves is always weird but unavoidable online): https://news.ycombinator.com/item?id=42790853
I think it has its place.
For 'background filler' I think it makes a lot of sense; stuff which you don't need to care about, but whose absence can make something feel less real.
To me, this takes the place of / augments procedural generation stuff. NPC crowds in which none of the participants are needed for the plot, but in which each can have unique clothing / appearance / lines, are not "needed" for a game, but can flesh it out when done thoughtfully.
Recall the lambasting Cyberpunk 2077 got for its NPCs that cycled through a seemingly very limited number of appearances, to the point that you'd see clones right next to each other. This would solve that sort of problem, for example.
Jeez I'd love to know what Apple's R&D debt on Vision Pro is, based on current sales to date. I really really hope they continue to push for a headset that's within reach of average people but the hole must be so deep at this point I wouldn't be surprised if they cut their losses.
As Carmack pointed out the problem with AR/VR right now - it's not the hardware, it's the software. Until the "visicalc" must have killer app shows up to move the hardware, there is little incentive for general users to make the investment.
> As Carmack pointed out the problem with AR/VR right now - it's not the hardware, it's the software.
The third option is people's expectations for AR/VR itself: it could be a highly niche and expensive industry that is unlikely to grow to the general population.
AR needs a bragging app... something like the dharma/content you create in virtual space growing out of your footsteps in the real world. It would be visible on a cellphone, but feel more native with AR goggles.
Ouch; License: EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA
TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
Can you elaborate on how any sort of backdoor could be hidden in the model weights?
It's a technical possibility to hide something in the code, but that would be a bit silly since there's not that much of it here. It's not technically possible to hide a backdoor in a set of numbers that are solely used as the operands to trivial mathematical operations, so I'm very curious about what sort of hidden backdoor you think is here.
When you run their demo locally, there are two places that trigger a warning that the code loads the weights unsafely. To learn more about this issue, search "pytorch model load safety issues" on Google.
I'm sure there's warnings about possibly loading code alongside the weights since they distribute the weights as pickled data, but:
1. It's trivial to go look at what's being loaded.
2. Any code that's in the distributed pickled data is not weights. The GP suggests that there are backdoors hidden in the weights which is nonsensical unless the code runs eval on the weights or something similar, which would make anyone looking at the code immediately realise it was doing something dodgy.
As an example of (1), here's all the GLOBALs in the pickled code:
GLOBAL 'collections OrderedDict'
GLOBAL 'torch HalfStorage'
GLOBAL 'torch._utils _rebuild_tensor_v2'
None of these could be used for anything malicious as far as I know.
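For anyone who wants to repeat that check, here is a minimal sketch using only Python's standard `pickletools` module. It walks the opcode stream without executing it and collects the `GLOBAL` imports. The function name `list_pickle_globals` is made up for illustration, and the sketch assumes you already have the raw pickle bytes in hand (newer PyTorch checkpoints are zip archives with the pickle stored inside, so it would have to be extracted first). Newer pickle protocols use `STACK_GLOBAL`, which carries no inline argument, so the sketch only flags those for manual inspection.

```python
import pickle
import pickletools
from collections import OrderedDict


def list_pickle_globals(data: bytes) -> list[str]:
    """Return the imports a pickle would perform, without unpickling it."""
    found = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # arg is "module name", e.g. "collections OrderedDict"
            found.append(arg)
        elif opcode.name == "STACK_GLOBAL":
            # Module and name were pushed earlier as strings; flag for review.
            found.append(f"<STACK_GLOBAL at byte {pos}>")
    return found


# Demo on a harmless pickle; protocol 3 still emits plain GLOBAL opcodes.
demo = pickle.dumps(OrderedDict(a=1), protocol=3)
print(list_pickle_globals(demo))
```

If anything other than container types and tensor-rebuilding helpers shows up, that is the cue to look closer. PyTorch's `torch.load` also accepts `weights_only=True`, which restricts what the unpickler is allowed to import.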
"In production" in this case is a stand-in for "in any environment with access to sensitive stuff" which might just include GPUs, if what the attacker wanted was crypto processing grunt. Besides, if you're providing 3D asset generation as a service (which I can imagine most deployments of this sort of thing will be, at least for now) then it absolutely is running in production. The purpose of that production environment is entirely to run asset generation.
Interesting. One of the diagrams suggests that the mesh is generated from the marching cubes algorithm but the geometry of the meshes shown above are clearly not generated in this way.
To me, the bird mesh actually does look like marching cubes output. Note the abundance of almost square triangle pairs on the front and sides. Also note that marching cubes doesn't necessarily create stairstep-like artifacts; it can generate a smooth looking mesh given signed distance field input by slightly adjusting the locations of vertices based on the relative magnitude of the field at the surrounding lattice points.
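The vertex adjustment described above can be sketched as a linear interpolation along each lattice edge that the surface crosses. This is a generic illustration of that one step, not code from the Hunyuan3D repository; the function name and argument layout are assumptions made for the example.

```python
def interpolate_zero_crossing(p0, p1, d0, d1):
    """Place a vertex on the edge p0 -> p1 where the field crosses zero.

    d0 and d1 are the signed distances sampled at p0 and p1; they must not
    have the same sign, otherwise the surface does not cross this edge.
    """
    if d0 * d1 > 0:
        raise ValueError("no sign change along this edge")
    if d0 == d1:  # both samples are exactly zero: take the midpoint
        t = 0.5
    else:
        t = d0 / (d0 - d1)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


# The field is -1 at the first corner and +3 at the second, so the vertex
# lands a quarter of the way along the edge instead of at the midpoint.
vertex = interpolate_zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -1.0, 3.0)
print(vertex)
```

Always placing the vertex at the edge midpoint instead is what produces the blocky, stairstepped look.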
If they are using MC, does that mean they are actually generating SDFs? If so it would be nice if you could output the SDF rather than the triangle mesh.
For the AI un-initiated; is this something you could feasibly run at home? eg on a 4090? (How can I tell how "big" the model is from the github or huggingface page?)
I tried using Hunyuan3D-2 on a 4090 GPU. The Windows install encountered build errors, but it worked better on WSL Ubuntu. I first tried it with CUDA 11.3 but got a build error. Switching to CUDA 12.4 worked better. I ran it with their demo image but it reported that the mesh was too big. I removed the mesh size check and it ran fine on the 4090. It is a bit slow on my i9 14k with 128G of memory.
(I previously tried the stability 3d models: https://stability.ai/stable-3d and this seems similar in quality and speed)
The hunyuan3d-dit-v2-0 model is 4.93 GB. ComfyUI is on their roadmap, might be best to wait for that, although it doesn't look complicated to use in their example code.
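On the "how big is it" question, a rough back-of-the-envelope sketch: the checkpoint size divided by the bytes per parameter gives an approximate parameter count. The precision of this particular file is not stated here, so both fp16 and fp32 are shown as assumptions, and "4.93 GB" is read as 10^9 bytes. The helper name is made up for the example.

```python
def approx_param_count(file_bytes: int, bytes_per_param: int) -> int:
    """Rough parameter count, ignoring metadata and non-weight tensors."""
    return file_bytes // bytes_per_param


CHECKPOINT_BYTES = 4_930_000_000  # "4.93 GB" from the comment, read as 10^9 bytes

for label, width in (("fp16", 2), ("fp32", 4)):
    params = approx_param_count(CHECKPOINT_BYTES, width)
    print(f"{label}: ~{params / 1e9:.2f}B parameters")
```

As a rule of thumb the weights have to fit in VRAM with headroom left for activations, so a file of this size is comfortably within the 24 GB of a 4090, which matches the report above. Other components of the pipeline add to the total.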
As with any generative model, trust but verify. Try it yourself. Frankly, as a generative researcher myself, there's a lot of reason to not trust what you see in papers and pages.
I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/).
The full prompts exist but are truncated so you can just inspect the element and grab the text.
Here's what I got
Leaf
PNG: https://0x0.st/8HDL.png
GLB: https://0x0.st/8HD9.glb
Guitar
PNG: https://0x0.st/8HDf.png other view: https://0x0.st/8HDO.png
GLB: https://0x0.st/8HDV.glb
Google Translate of Guitar:
Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
PNG: https://0x0.st/8HDt.png and https://0x0.st/8HDv.png
Note: Weird thing on top of guitar. But at least this time the strings aren't fusing into sound hole.
I haven't tested my own prompts or the google translation of the Chinese prompts because I'm getting an over usage error (I'll edit comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.
But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. Real world has higher variance and let's get an idea how hard it is to use. So let's try some simpler things :)
Prompt: A guitar
PNG: https://0x0.st/8HDg.png
Note: Not bad! Definitely overfit but does that matter here? A bit too thick for an electric guitar but too thin for acoustic.
Prompt: A Monstera leaf
PNG: https://0x0.st/8HD6.png
https://0x0.st/8HDl.png
https://0x0.st/8HDU.png
Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things.
It's definitely a leaf and monstera like but a bit of a mutant.
Prompt: Mario from Super Mario Bros
PNG: https://0x0.st/8Hkq.png
Note: Now I'm VERY suspicious....
Prompt: Luigi from Super Mario Bros
PNG: https://0x0.st/8Hkc.png
https://0x0.st/8HkT.png
https://0x0.st/8HkA.png
Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario.
Where is the tie coming from? The suspender buttons are all messed up.
Really went uncanny valley here. So this suggests we're really brittle.
Prompt: Peach from Super Mario Bros
PNG: https://0x0.st/8Hku.png
https://0x0.st/8HkM.png
Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
Prompt: Toad from Super Mario Bros
PNG: https://0x0.st/8Hke.png
https://0x0.st/8Hk_.png
https://0x0.st/8HkL.png
Note: Lord have mercy on this toad, I think it is a mutated Squirtle.
Paper can be found here (the arxiv badge on the page leads to a pdf in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293
(If you want to share images like I did all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)
[0] Overfit is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then certainly there's potential legal ramifications...
Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.
Oops, ran out of edit time when I was posting my last two
Prompt: A hawk flying in the sky
PNG: https://0x0.st/8Hkw.png
https://0x0.st/8Hkx.png
https://0x0.st/8Hk3.png
Note: This looks like it would need more work. I tried a few birds and generic too. They all seem to have similar form.
Prompt: A hawk with the head of a dragon flying in the sky and holding a snake
PNG: https://0x0.st/8HkE.png
https://0x0.st/8Hk6.png
https://0x0.st/8HkI.png
https://0x0.st/8Hkl.png
Note: This one really isn't great. Just a normal hawk head. Not how a bird holds a snake either...
This last one is really key for judging where the tech is at btw. Most of the generations are assets you could download freely from the internet, and you could probably get better ones from some artist on Fiverr or something. But the last example is closer to a realistic use case: something that is relatively reasonable, probably not in the set of easy-to-download assets, and might be something someone wants. It isn't too crazy of an ask. Given the Chimera and how similar a dragon is to a bird in the first place, this should be on the "easier" end. I'm sure you could prompt engineer your way into it, but then we have to have the discussion of what costs more: a prompt engineer or an artist? And do you need a prompt engineer who can repair models? Because these look like they need repairs.
This can make it hard to really tell if there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter. Little errors quickly compound... That said, I do much more on generative imagery than generative 3d objects so grain of salt here.
Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. You really only have a good idea after you've generated hundreds or thousands of samples yourself and are able to look at a lot with high scrutiny.
Yeah, this is absolutely light years off being useful in production.
People just see fancy demos and start crapping on about the future, but just look at stable diffusion. It's been around for how long, and what serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? But consistent style is such an important thing for any half decent game and these generative tools shit the bed on consistency in a way that's difficult to paper over.
I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could even get close to production that I couldn't before is maybe an MTG style card game where gameplay is far more important than graphics, and flashy nice looking static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.
Stable Diffusion and AI in general seems to be big in marketing at least. A friend decided to abandon engineering and move to marketing and the entire social media part of his job is making a rough post, converting it to corporate marketing language via AI and then generating an eye catching piece of AI art to slap on top.
When video generation gets easy he'll probably move to making short eye catching gifs.
When 3D models and AI in general improve I can imagine him for example generating shitty little games to put in banners. I've been using an adblocker for so long I don't know what exists nowadays but I remember there being banners with "shoot 5 ducks" type games where the last duck kill opens the advertisers website. Sounds feasible for an AI to implement reliably. If you can generate different games like that based on the interests of the person seeing the ad you can probably milk some clicks.
> been around for how long, and what serious professional game developers are using it as a core part of their workflow?
Are you in the game industry? If you’re not, how would you even know they have not? As someone with some connections in the industry who may soon get more involved personally, I know at least one mobile gaming studio with quite a bit of funding and momentum that has started using a good deal of AI-generated assets that would have been handcrafted in the past.
Yeah, the big problem I have with my field is that there seem to be stronger incentives to chase benchmarks and make things look good than to actually solve the hard problems. There is a strong preference for "lazy evaluation", which depends too much on assuming high levels of ethical presentation and due diligence. I find it so problematic because this focus actually makes it hard for people who are tackling these problems to publish: it makes the space even noisier (already incredibly noisy by the very nature of the subject), and then it becomes hard to talk about details if they're presumed solved.
I get that we gloss over details, but if there's anywhere you're allowed to be nuanced and argue over details, should it not be academia?
(fwiw, I'm also very supportive of having low bars to publication. If it's void of serious error and plagiarism, it is publishable imo. No one can predict what is important or impactful, so we shouldn't even play that game. Trying to decide if it is "novel" or "good enough for <Venue>" is just idiotic and breeds collusion rings and bad actors)
The first guitar has one of the strings end at the sound hole, and six tuning knobs for five strings.
The second has similar problems: it has tuning knobs with missing winding posts, then five strings becoming four at the bridge. It also has a pickup under the fretboard.
It is pretty good with some easier assets that I suspect there's lots of samples of (and we're comparing to other generative models, not to what humans make. Humans probably still win by a good margin). But when moving out of obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering but that just makes things more complicated to evaluate.
Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?
Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.
These normally are ideal variables for photogrammetry but none of the various common applications and websites do a very good job creating a mesh out of it that isn't super low poly and/or full of holes.
I've been casually scanning huggingface for relevant models to try out but haven't really found anything.
Check out RealityCapture [1]. I think it's what's used to create the Quixel Megascans [2]. (They're both under the Epic corporate umbrella now.)
[1] https://www.capturingreality.com/realitycapture
[2] https://quixel.com/megascans/
For this exact use case I used instant-ngp[0] recently and was really pleased with the results. There's an article[1] explaining how to prepare your data.
[0] https://github.com/NVlabs/instant-ngp
[1] https://github.com/NVlabs/instant-ngp/blob/master/docs/nerf_...
Recently, a lot of development in this area has been in gaussian splatting and from what I have seen, the new methods are super effective.
https://en.wikipedia.org/wiki/Gaussian_splatting
https://www.youtube.com/watch?v=6dPBaV6M9u4
The parent explicitly asked for a mesh.
You can never be sure what someone's real intent is. They might mean "something meshlike". Personally I usually reply by asking for more info (I always have the XY Problem in my mind) but that is time consuming and some people assume you're being pedantic (I am however correct more often than not - people have posed the wrong question or haven't given critical parts of the context)
The second link I posted contains a flow from splats to meshes
Yeah some very impressive stuff with splats going on. But I haven't seen much about going from splats to high quality 3D meshes. I've tried one or two with pretty poor results.
There have been a few papers published on the topic, but it's "early days".
Expect a lot of progress over the next couple of years.
> the object was on a rotating platform
Isn't a static-object-rotating-camera basically a requirement for photogrammetry?
No. For small objects, it is typical to use a turntable to rotate the object; there are a number of commercial and DIY turntables with an automated motion system that can trigger the shutter after a specified degree of rotation.
Why would that make a difference?
The OC mentioned "static lighting". If they meant static, while the platform was spinning, then the lighting would be inconsistent, because the object would change lighting with each photo. You would have to fix the lighting to the platform to spin with the object, while taking the pictures to get consistent lighting.
Photogrammetry generally assumes a fully static scene. If there are static parts of the scene which the camera can see and also rotating parts, the algorithm may struggle to properly match features between images.
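To make the turntable point concrete: if the background is featureless or masked out, a fixed camera looking at an object that rotates by some angle is equivalent to a static object with the camera orbiting by the opposite angle. A minimal sketch of those equivalent camera positions follows, assuming an evenly stepped turntable (48 shots gives 7.5 degrees per step), a vertical rotation axis, and an arbitrary radius. The function name is hypothetical, and this is not the input format of any particular tool.

```python
import math


def turntable_camera_positions(num_shots, radius, height=0.0):
    """Camera centres in the object's frame for a fixed camera and a turntable.

    Rotating the object by +theta about the vertical (z) axis is equivalent
    to moving the camera by -theta around the object.
    """
    step = 2.0 * math.pi / num_shots
    positions = []
    for i in range(num_shots):
        theta = -i * step
        positions.append(
            (radius * math.cos(theta), radius * math.sin(theta), height)
        )
    return positions


cameras = turntable_camera_positions(num_shots=48, radius=1.0)
print(len(cameras), cameras[12])
```

The equivalence breaks as soon as the camera can see static features such as the table edge or studio walls, because those stay put while the object moves. That is the mismatch described above.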
I think it's common to have dots on the rotating disk that the object is placed on.
COLMAP + CloudCompare with a good CUDA GPU (more VRAM is better) will give reasonable results for large textured objects like buildings. Glass/water/mirror/gloss surfaces need to be coated before scanning; dry spray-on Dr. Scholl's foot deodorant seems to work fine for our object scans.
There are now more advanced options than Gaussian splatting, and these can achieve normal playback speeds rather than hours of filtering. I'll drop a citation if I recall the recent paper and example code. However, note this style of 3D scene recovery tends to be heavily 3D location dependent.
Best of luck, =3
>The background is solid black.
>These normally are ideal variables for photogrammetry
Actually no. My friend learned this the hard way during a photogrammetry project: he rented a photo studio, made sure the background was perfectly black, and took the photos, but the photogrammetry program (Meshroom, I think) struggled to reconstruct the mesh. I did some research and learned that it uses features in the background to help position itself when building the mesh. So he redid his tests outside with "messy" backgrounds and it worked much, much better.
This was a few years ago so I don't know if things are different now.
I’m not an expert, only dabbled in photogrammetry, but it seems to me that the crux of that problem is identifying common pixels across images in order to sort of triangulate a point in the 3D space. It doesn’t sound like something an LLM would be good at.
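The triangulation step mentioned above is plain geometry once the same point has been matched in two images. Here is a minimal sketch of the midpoint method, assuming the camera centres and the viewing directions through the matched pixels are already known. Real pipelines estimate those poses too and refine everything jointly; the function names here are made up for the example.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of closest approach between the lines c1 + s*d1 and c2 + t*d2."""
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing directions are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * v for p, v in zip(c1, d1))
    q2 = tuple(p + t * v for p, v in zip(c2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))


# Two cameras one unit either side of the origin, both looking at (0, 0, 5).
point = triangulate_midpoint((-1, 0, 0), (1, 0, 5), (1, 0, 0), (-1, 0, 5))
print(point)
```

The hard part in practice is the matching that comes before this, which is why featureless or glossy surfaces cause holes.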
Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.
I can only speak for myself, but a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books, that is, not at all. "Cost to zero" implies drinking directly from the AI firehose with no human in the loop (those cost money) and entertainment produced in that manner is still dire, even in the relatively mature field of pure text generation.
I think the biggest issue with stable diffusion based approaches has always been poor compositional ability (putting stuff where you want), and compounding anatomical/spatial errors that gave the images an offputting vibe.
All these problems are trivially solvable (solved) using traditional 3d meshes and techniques.
The issue with composition is only a problem when you rely on a pure text prompt, but has been solved for quite a while by ControlNets or img2img. What was lacking was the integration with existing art tools, but even that is getting solved, e.g. Krita[1] has a pretty nice AI plugin.
3D can be a useful intermediate when editing the 2D image, e.g. Krea has support for that[2]. But I don't think the rest of the traditional 3D pipeline is of much use here, AI image generation already produces images at a quality that traditional rendering just can't keep up with, neither in terms of speed, quality or flexibility.
[1] https://www.youtube.com/watch?v=PPxOE9YH57E
[2] https://www.youtube.com/watch?v=0ER5qfoJXd0
Wow, those look impressive. But I think we are saying the same thing - stable diffusion can make pretty pics, but needs a lot of handholding context. I too have played around with ComfyUI, and while there are a LOT of techniques that allow you to manipulate the image, I have always felt like you were fighting SD.
In the videos you've attached, both tools (esp. the first) look impressive, but in the first example, you can clearly see that the model regenerates the street around the chameleon, when the artist changes it for no good reason.
In the second example you can see there's a bunch of AI tools under the hood, and they don't work together particularly well, with the car constantly changing as the image changes.
I think a lot of mileage can be extracted from SD as it stands (I could think of a bunch of improvements to what was demonstrated here by applying existing techniques), but the fundamental issue remains: Stable Diffusion was made to generate whole images at once, unlike transformers, which output a single token at a time.
Not sure what the image equivalent of a token is, but I'm sure it'd be feasible to train a model to fill holes - which'd be created by Segment Anything or something similar, and it would react better to local edits.
But not consistent state. The pipeline still needs to exist because most games require objects and environments to stay consistent across play sessions. That means generating from a 3D skeleton, at the very least, if not relegating genAI to production, not runtime.
I have tried the model, and I agree with you on the point. A product was uploaded for a test, the output catches the product quite well, but the text on the generated 3D model is unreadable.
IMO current generation models are capable of creating significantly better than "slop" quality content. You need only look at NotebookLM output. As models continue to improve, this will only get better. Look at the rate of improvement of video generation models in the last 12-24 months. It's obvious to me we're rapidly approaching acceptable or even excellent quality on-demand generated content.
I feel like you're conflating quality with fidelity. Video generation models have better fidelity than they did a year ago, but they are no closer to producing any kind of compelling content without a human directing them, and the latter is what you would actually need to make the "infinite entertainment machine" happen.
The fidelity of a video generation model is comparable to an LLM's ability to nail spelling and grammar - it's a start, but there's more to being an author than that.
I feel like text models are already at sufficiently entertaining and useful quality as you define it. It's definitely possible we never get there for video or 3D modalities, but I think there are strong enough economic incentives such that big tech will dump tens of billions of dollars into achieving it.
I don't know why you think that's the case regarding text models. If that was the case, there would be articles on here that are just created by only generative AI and nobody would know the difference. It's pretty obvious that's not happening yet, not the least of which because I know what kinds of slop state-of-the-art generative models still produce when you give them open-ended prompts.
Ironic how this comment exemplifies the issue - broad claims about "slop" output but no specific examples or engagement with current architectures. Real discussions here usually reference benchmarks or implementation details.
(from Claude)
You're sort of ignoring the issue? If the generated content was good and interesting enough on its own, we would already have AI publishing houses pushing out entire trilogies, and each of those would be top sellers.
Generative content right now is OK. OK isn't really the goal, or what anyone wants.
I feel like this is missing the point of GenAI. I read fewer books than I did a year ago, primarily because Claude will provide dynamic content that is exactly tailored for me. I don't read many instructional books any more, because I can tell Claude what I already know about a topic and what I'd like to know and it'll create a personalised learning plan. If I don't understand something, it can re-phrase things or find different metaphors until I do. I don't read self-help books written for a general audience, because I can get personalised advice based on my specific circumstances and preferences.
The idea of a "book" is really just an artifact of a particular means of production and distribution. LLM-generated text is a categorically different thing from a book, in the same way as a bardic poem or hypertext.
First it was AI articles, raising it to entire successful book trilogies seems like a much bigger leap. Even considering the largest context windows they wouldn't directly fit and there is much less data to train context of that size on fiction than the data out there for essays and articles.
I don't think it is there yet for articles either.
My point with the Claude generated comment was maybe it could get pretty close to something like an hn comment.
I'm sorry I didn't meet the LLM's expectations, but whether something is subjectively entertaining or not can't be exemplified by objective benchmarks.
NotebookLM is still slop. I recommend feeding it your resume and any other online information about you. It's kind of fun to hear the hosts butter you up, but since you know the subject well you will quickly notice that it is not faithful to the source material. It's just plausibly misleading.
This is exactly why I'm building my app now with the expectation that these assets will be exponentially better in the short term.
I can only speak for myself, but a large and growing proportion of the text I read every day is LLM output. If Claude and Deepseek produce slop, then it's a far higher calibre of slop than most human writers could aspire to.
What kind of text are you reading? Do you work in LLM development? Or are you just noticing that many news sites are using LLMs more and more?
I've noticed obvious LLM output on low quality news sites, but I don't tend to read them anyway. Maybe all the comments I read are from LLMs and I just don't realise?
Perplexity is now my default search engine. If I'm doing research, I use LLMs to summarise documents or scan through them to find relevant excerpts. If I'm doing general background reading on something, I'll ask an LLM for an explainer; likewise if I've read about one particular thing and want to understand the broader context around it. If I'm thinking through a problem, I'll bat the idea around with Claude or Deepseek, asking them to provide alternative perspectives.
It's quite difficult to analogise because LLMs are so profoundly novel, but the best I can do is that it's like having an infinitely patient and extremely knowledgeable assistant. That assistant isn't omniscient or infallible, but it's extremely useful because it tends to provide the information that I want, presented in a way that's particularly relevant to me. That requires a certain amount of rapport-building - understanding the characteristics of various models, learning to ask good questions, guiding the model towards my preferences with customised system prompts - but the effort pays off handsomely.
Screw Metaverse. Let's make a VR holodeck.
Star Trek's Holodeck is actually a good case study here (especially with the recent series, Lower Decks, going as far as making two episodes that are interactive movies on a holodeck, going quite deep into how that could work in practice both in terms of producing and experiencing them).
One observation derived here is that infinite procedural content at your fingertip doesn't necessarily kill all meaning, if you bring the meaning with you. The two major use cases[0] for the holodeck are:
- Multiplayer scenarios in which you and your friends enjoy some experience in a program. The meaning is sourced from your friendship and roleplay; the program may be arbitrary output of an RNG in the global sense, but it's the same for you and your friends, so shared experience (and its importance as a social object) in your group is retained.
- Single-player simulations that are highly specific. The meaning here comes from whatever is the reason you're simulating that particular experience, and its connection to the real world. Like idk., a flight simulator of a random space fighter flying over a random world shooting at random shit would quickly get boring, but if I can get the simulator to give me a highly accurate cockpit of an F/A-18 Hornet, flying over real terrain and shooting at realistic enemies in a realistic (even if fictional) storyline - now that would be deeply meaningful to me, because 1) F/A-18 Hornet is a real plane that I would otherwise never experience flying, and 2) I have a crush on this particular fighter because F/A-18 Hornet 3.0 is one of the first videogames I ever played in my life as a kid.
Now, to make Metaverse less like bullshit and more like Star Trek, we'd need to make sure the world generation is actually available to the users. No asset stores, no app marketplace bullshit. We live in a multimodal LLM era - we already have all the components to do it like Star Trek did it: "Computer, create a medieval fantasy village, in style of England around year 1400, set next to a forest, with tall mountains visible in the distance", then walk around that world and tweak the defaults from there.
--
[0] - Ignoring the third use case that's occasionally implied on the show, and that's really obvious given it's the same one the Internet is for - and I'm not talking about cat pictures.
> I'm not talking about cat pictures
Caitian pictures, on the other hand…
I think they were more than implying what T'Ana got up to with Shaxs.
Minecraft is procedurally generated slop, yet it's insanely popular.
Not all procedurally generated things are slop, and not all slop is made via procedural generation.
And popularity has nothing to do with private, subjective quality evaluations of the individual (aka, what someone calls slop might be picasso to another), but with objective, public evaluations of the product via purchases.
What is your definition of slop?
I was thinking about this, and the definition I came up with for slop is 'aspirational and highly detailed content that resolves its details in an uninteresting or nonsensical way'.
For example, an AI picture of a bush is not slop, because we don't expect much from a picture of a bush (not aspirational).
A hand-drawn picture of a knight in armor by an enthusiastic, but not very skilled artist is not slop either - it has tons of details that resolve in an interesting way, and what it lacks in details, it allows the viewers to fill in for themselves.
A 'realistic' knight generated by AI is slop - it contains no imaginative detail, and allows very little room for personal interpretation, and it's not rewarding to view.
Slop doesn't need to be AI - creatively bankrupt overproduced garbage counts as slop in my mind as well.
> 'aspirational and highly detailed content that resolves its details in an uninteresting or nonsensical way'.
This is a great definition. All the AI text I read is somehow missing the "meat" you find in good writing. All the right parts are there, but the core idea that makes me interested is just missing.
It's pretty much the same thing Linkedin has been full of for years. No one can bear to say anything controversial, so it's all just empty platitudes and junk.
Put simply, AI generation automates "design by committee."
Old men looking at a child with training wheels and sneering that they’ll never ever be able to ride a bike.
Procgen has nothing to do with AI in terms of slop, for a good reason: procedural generation algorithms are heavily tuned by authors, exactly to avoid the “dull, unoriginal and repetitive” aspect that AI produces.
You're too old and jaded [1]. It's for kids inventing infinite worlds to role play and adventure. They're going to have a blast.
[1] Not meant as an insult. Working professionals don't have time for this stuff.
> Working professionals don't have time for this stuff.
Why don't working professionals have time for entertainment?
And are working people not always professionals?
Object permanence and a communications channel is enough for this. Give children (who get along with each other) a pile of sticks and leave them alone for half an hour, and there's half a chance their game will ignore the sticks. Most children wouldn't want to have their play mediated by the computer in the way you describe, because the ergonomics are so poor.
I'm reminded of that guy who bought an AI enabled toy for his daughter and got increasingly exasperated as she kept turning it off and treating it as a normal toy.
https://xcancel.com/altryne/status/1872090523420229780
That thread has a lot of good observations in it. I was probably wrong in framing the problem as "ergonomics".
> Dr. Michelle (@MichelleSaidel): I think because it takes away control from the child. Play is how children work through emotions, impulses and conflicts and well as try out new behaviors. I would think if would be super irritating to have the toy shape and control your play- like a totally dominating playmate!
> Alex Volkov (Thursd/AI) (@altryne): It did feel dominating! she wanted to make it clothes, and it was like, "meanwhile, here's another thing we can do" lacking context of what she's already doing
> The Short Straw (@short_straw): The real question you should ask yourself is why you felt compelled to turn it back on each time she turned it off.
> Angelo Angelli JD (@AngelliAngelo): Kids are pretty decent bullshit detectors and a lot of AI is bullshit.
> Foxhercules (@Foxena): […] I would like to point out again that the only things I sent this child were articulated 3d prints. beyond being able to move their arms, legs and tails, these things were made out of extruded plastic and are not exactly marvels of engineering. […] My takeaway from this is that, this is what children need. they don't need fancy with tons of bells and whistles with play on any sort of rails. And there's not a thing that AI can do to replace a Child's imagination NOR SHOULD IT.
The majority of American children have an active Roblox account. Those who don't are likely to play Minecraft or Fortnite. Play mediated by the computer in this way is already one of the most popular forms of play. Kids are going to go absolutely nuts for this and if you think otherwise, you really need to talk to some children.
It worked for Minecraft.
It was rough at first, and needed plenty of tuning, but the terrain and environments it's capable of certainly have a wide audience.
But as far as pure, unbridled generation goes, yeah; I'm sure there will be plenty of slop made in the coming decade.
I think you’re being short sighted. Imagine feeding in your favorite TV shows to a generative AI and being able to walk around in the world and talk to characters or explore it with other people.
That's still AI slop, in my opinion.
About 4 years ago the best example of generative AI for non-moving images was a fuzzy cartoon of an Avocado Man walking a pet Hedgehog.
If AI videos feel like slop right now, just wait another 4 years and see what happens
Everything will be AI slop to you.
There will never be a point where AI creates something incredible and you are like wow I prefer this AI stuff over human made slop.
Yes, because if someone has a tool that creates "something incredible", then everyone will be able to generate "something incredible" and then it all becomes not incredible.
It's like having god-mode in a game, it all becomes boring very quickly when you can have whatever you want.
If you follow that reasoning, anything that improves or anything that makes creation easier, produces slop.
Personally I'm not in favor of calling AI output slop, just because it is AI generated. You might then as well say that any electronic music is slop and any food prepared with help of machinery is crap. It might be crap or not, the automatedness is irrelevant.
The outputs of AI that I see today in the form of text, images or video don't look like slop to me.
> everyone will be able to generate "something incredible" and then it all becomes not incredible.
no, that's just your standard moving up.
There is an absolute scale for which you can measure, and ai is approaching a point where it is an acceptable level.
Imagine if you applied your argument to quality of life - it used to be that nobody had access to easy, cheap clean drinking water. Now everybody has access to it. Is it not an incredible achievement, rather than it not being incredible just because it is common?
That quote from the movie "The Incredibles", where the villain claims that if everybody is super, then nobody is, was the gist of your argument. And it is a childish one imho.
It is equally childish to compare the engineering of our modern water and plumbing systems with the automated generation of virtual textured polygons.
People don't get tired of good clean water because we NEED it to survive.
But oh, another virtual world entirely thought up by a machine? Throw it on the pile. We're going to get bored of it, and it will quickly become not incredible.
> we NEED it to survive.
plenty of people in the world still drink crappy water, and they survive.
You don't _need_ it, you want it, because it's much more comfortable.
But when something becomes a "need" as you described it, you think of it differently. Just like how you don't _need_ electricity to survive, but it's so ingrained that you now think of it as a need.
> We're going to get bored of it, and it will quickly become not incredible.
exactly, but i have already said this in my original post - your standards just moved up.
Are the ai worlds the dirt or the water here?
The trademark/copyright issues of making that both a reality and an income stream are as yet unsolved.
do you find it interesting talking to NPCs in games?
If I could talk to something at the level of Neuro-sama (https://www.twitch.tv/vedal987) I'd be very entertained and it's essentially a matter of time. Hell, I'd love to have something like this as an assistant application as well and I'm not a Cortana/Google Assistant/etc user.
Talking to NPCs in games is really just reading dialog written by humans.
If you could actually talk to NPCs as in get their thoughts about the world and ask open ended questions, that’d be very interesting.
> a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books
Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that? Do you know anyone in your immediate vicinity who can?
Read this satirical speech by Claude, in French (https://x.com/pmarca/status/1881869448275177764) and in English (https://x.com/pmarca/status/1881869651329913047) and tell me: can you write fiction more entertaining or imaginative than that? Is there someone in your vicinity who can?
Perhaps that's mundane, so is there someone in your vicinity who can reason about a topic in mathematics/physics as well as this: https://x.com/hsu_steve/status/1881696226669916408 ?
Probably your answer is "yes, obviously!" to all the above.
My point: deep learning works and the era of slop ended ages ago except that some people are still living in the past or with some cartoon image of the state of the art.
> "Cost to zero" implies drinking directly from the AI firehose with no human in the loop
No. It means the marginal cost of production tends towards 0. If you can think it, then you can make it instantly and iterate a billion times to refine your idea with as much effort as it took to generate a single concept.
Your fixation on "content without a human directing them" is bizarre and counterproductive. Why is "no human in the loop" a prerequisite for productivity? Your fixation on that is confounding your reasoning.
> Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that?
So while I generally agree with you, I think this was a bad example to use: a lot of these are slop, with the kind of AI sheen we've come to glaze over. I'd say less than 20% are actually artistically impressive / engaging / thought-provoking.
This is a better AI gallery (I sorted all images on the site by top from this year).
https://civitai.com/images
There's still plenty of slop in there, and it would be a better gallery if there was a way to filter out anime girls. But it's definitely higher than 20% interesting to me.
The closest similar community of human made art is this:
https://www.deviantart.com/
Although unfortunately they've decided to allow AI art there too so it makes comparison harder. Also, I couldn't figure out how to get the equivalent list (top/year). But I'd say I find around the same amount interesting. Most human made art is slop too.
I think you fundamentally misunderstand what people use "slop" to describe.
> Most human made art is slop too.
I'm assuming you're using the term "slop" to describe low-quality, unpolished works, or works where the artist has been too ambitious with their skill level.
Let me put it this way:
Every piece of art that is made, is a series of decisions. The artist uses their lived experience, their tastes and their values to create something that's meaningful to them. Art doesn't need to have a high-level of technical expertise to be meaningful to others. It's fundamentally about communication from artists to their audience. To this point, I don't believe there's such a thing as "bad art" (all works have something to say about the artist!).
In contrast, when you prompt an image generator, you're handing over the majority of the decisions to the algorithm. You can put in your subject matter, poses, even add styles, but how much is really being communicated here? Undoubtedly it would require a high level of technical skill to render similarly by hand, but that's missing the forest for the trees- what is the image saying? There's a reason why most "good" AI-generated images generally have a lot of human curation and editing.
Here's an example of some "slop" from the AI Art Turing Test (https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-ar...) from a while back: https://i.imgur.com/RAMFKP1.jpeg There's definitely a high level of technical expertise that a human would require to paint something like this. But it's very clearly AI-generated. Can you figure out why?
---
As a side note, here's a human-made piece that I appreciate a lot. https://i.imgur.com/AZiiZj1.jpeg The longer you explore it, the more the story unfolds, it's quite lovely. On the other hand, when I focus on the details in AI-generated works, there's not much else to see.
> I think you fundamentally misunderstand what people use "slop" to describe.
I don't think I do, actually. It's not a term with a technical definition, but in simple terms it means art that is obviously AI, because it has the sheen, weird hands, inconsistencies, weird framing or thematic elements that are hard to describe without an art degree but which we instinctively know is wrong, or is just plain bad.
I used the term slop to describe bad human art too, but I meant something subtly different. It's a term that has been used to describe bad work of all kinds from humans since long before there was AI.
In this case, it's art from humans who are learning what makes good art. You say there's no bad art, and it's a valid viewpoint, but I'd say bad art is when the artist has a clear goal in their mind, but they lack the skills to realize it. Nonetheless, they share it for feedback and approval anyway, and by doing that on a site like DeviantArt they learn and grow as artists. But meanwhile, to me or anyone else who is visiting that site to find "good", meaningful art made by skilled artists, this is slop. Human slop, not AI slop.
> here's a human-made piece that I appreciate a lot
I like your art. I'm glad you made it. What I like most is that it's fun to look at and think about which is what you say you intended. I hope I get to see more of your art.
> To this point, I don't believe there's such a thing as "bad art" (all works have something to say about the artist!).
As a classically trained oil painter, I know for sure there is bad art especially because I've made more than enough bad art for one lifetime.
Bad art begins with a lack of craftsmanship and is exemplified by a poor use of materials/media and forms, or a lack of knowledge of those forms (e.g. poor anatomical knowledge, misunderstanding the laws of perspective), or an overly literal representation of forms (a photograph is better at being literal, for example).
> Here's an example of some "slop" from the AI Art Turing Test […] But it's very clearly AI-generated. Can you figure out why?
It's only "clearly AI-generated" because we know that AI is capable of generating art. If you saw this without that context you wouldn't immediately say "AI!" Instead, you'd give it the normal critique that you'd give a student or colleague. I'd say:
- there's too much repetition of large forms.
- there's an unpleasant hierarchy of values and not enough separation of values.
- The portrait of the human is the focus of the image yet it has been lost in the other forms.
- The composition can improve with more breathing room in the foreground or background which are too busy.
- Here look at this Frazetta!
However, my rudimentary list could just as easily be turned into prompts to be used to refine the image and experiment with variations. And, perhaps you'd consider that to be a human making decisions?
> I like your art. I'm glad you made it. What I like most is that it's fun to look at and think about which is what you say you intended. I hope I get to see more of your art.
Just to be clear, it's not my art.
> There's still plenty of slop in there, and it would be a better gallery […]
Thanks for sharing your better AI gallery. It's awesome to see.
Your reply clarifies my point even better: I shared a gallery, you evaluated it and shared an even better gallery! Undoubtedly someone else will look at yours today or next year, and say, as you said, "You missed a slop! Here's a better gallery".
My point fundamentally is about basic capability of the average and even above average person. As a classically trained amateur painter, I frequently ask myself: "Can I paint a nude figure better than what you've called slop?" As a mathematician I ask: "Can I reason better than this model?"
> fixation on that is confounding your reasoning.
it is a fixation based on the desire that they themselves shouldn't be rendered economically useless in the future. Then the reasoning comes about post-facto from that desire, rather than from any base principle of logic.
Most, if not all, that are somewhat against the advent of AI are like the above in some way or another.
Wow, AI can write an argument that's bad on purpose! That totally proves the AI is a master writer.
Now show me the AI write something that's actually good on purpose.
> Now show me the AI write something that's actually good on purpose
The average human can't even write a 3000 word short story that is good "on purpose" even if they tried.
I know because I've participated in many writing workshops.
The real question is: can you?
> AI can write an argument that's bad on purpose
Are you able to recognise good writing? How do I know? For all I know you're the most incompetent reader and writer on the planet. But your skills are irrelevant.
What's relevant is that deep learning is more skilled than the average person. If you're not aware of this you're either a luddite or confused about the state of the art.
The 'strawmanning your opponent' technique is a non-argument, and is effortless to pull off. Surrounding your argument with tons of purple prose (which Claude is good at) does not change that.
Writing a good argument requires 3 things: be logical, be compelling and likeable, and have a solid reputation. It does not require purple prose.
As for good writing, I'm pretty sure Brandon Sanderson's Mistborn trilogy qualifies, which was written with a rather small vocabulary and pedestrian prose, yet is universally praised.
Tbf, I do think Claude Sonnet and SD are impressive, and I think they can aid humans in producing compelling content, but they are not on the level of amateur fiction writers.
Besides, surpassing most humans in an area where most humans are unskilled is not a feat, not even AI companies flex on that.
> Writing a good argument requires 3 things: be logical, be compelling and likeable, and have a solid reputation. It does not require purple prose.
That's a common misconception that young writers have. Their prose is first purple and overwrought, then they overcorrect and try to be Hemingway, then they master the craft and discover that form follows function.
As such, the "purpleness" of prose is not an indictment of any sort except if the style doesn't serve the substance. So yes, purple prose is sometimes required and can be used correctly, just ask James Joyce or Hitchens or remember that first sentence in Lolita, for example.
Furthermore, almost every piece of writing you've probably enjoyed went through an editor or several professional editors. You'd be shocked to read early or even late drafts.
(Also, having a "solid reputation" has f' all to do with whether you can construct a good argument. Wanting that as a prerequisite is what the cool kids used to call "appeal to authority". Anyway ...)
But wtf are we even talking about now?
> Besides, surpassing most humans in an area where most humans are unskilled is not a feat, not even AI companies flex on that.
I don't care what "AI companies flex". What I care about, as a programmer, and as an artist, and as a writer who won a tiny prize in my even tinier and insignificant niche on the planet, is what tools we can build for the average person and what tools I have access to.
If I have a robot that is 50% stronger than me or 10x better read than the average human or 20% better than the average mathematician, that's a huge victory. So yes, surpassing the average human is a feat.
But it's not merely the average human who has been surpassed: the average mathematician (skilled in mathematics) and the average artist (skilled in art) and the average writer, have all been surpassed. That is my testable claim. Play with the tools, and see for yourself.
> can you paint better and more imaginatively than that?
the fact that you are seriously asking this question says a lot about your taste.
> the fact that you are seriously asking this question says a lot about your taste.
Non sequitur. My sense of taste, or lack of it, is irrelevant.
Questions about "taste" don't matter when the average person doesn't have the craft to produce what they claim they are competent to judge especially when we're talking about such low hanging fruit as: "write a short story", "write an essay", "analyse this math problem", "draw an anatomically accurate portrait or nude figure", "paint this still life", "sketch this landscape".
Are you able to make the distinction between taste and craftsmanship?
Then after you are done signalling whatever it is you think you're signalling by vaguely gesturing at your undoubtedly superior sense of taste, perhaps we can talk like adults about what I asked?
Frankly i think you cannot get past your own delusion about AI and no argument will change your mind. No one can make you appreciate art properly and I can only hope one day you will.
> No one can make you appreciate art properly and I can only hope one day you will.
Lmao.
Refer to my other comment for more context, for whatever that is worth (talking with strangers who are eager to judge everyone but themselves is always weird but unavoidable online): https://news.ycombinator.com/item?id=42790853
I think it has its place. For 'background filler' I think it makes a lot of sense; stuff which you don't need to care about, but whose absence can make something feel less real.
To me, this takes the place / augments procedural generation stuff. NPC crowds in which none of the participants are needed for the plot, but in which you can have unique clothing / appearance / lines is not "needed" for a game, but can flesh it out when done thoughtfully.
Recall the lambasting Cyberpunk 2077 got for its NPCs that cycled through a seemingly very limited number of appearances, to the point that you'd see clones right next to each other. This would solve that sort of problem, for example.
AR/VR doesn't have a 3D model issue.
It has a 'why would I strap on a headset for stuff I can do without' issue.
I will not start meeting friends just because of the metaverse. I have everything I need already.
And even video calls with WhatsApp are weird as f.
Jeez I'd love to know what Apple's R&D debt on Vision Pro is, based on current sales to date. I really really hope they continue to push for a headset that's within reach of average people but the hole must be so deep at this point I wouldn't be surprised if they cut their losses.
As Carmack pointed out the problem with AR/VR right now - it's not the hardware, it's the software. Until the "visicalc" must have killer app shows up to move the hardware, there is little incentive for general users to make the investment.
> As Carmack pointed out the problem with AR/VR right now - it's not the hardware, it's the software.
The third option is people's expectations for AR/VR itself: it could be a highly niche and expensive industry and unlikely to grow to the general population.
AR needs a bragging app.. something like the dharma/content you create in virt growing out of your footsteps in real - and while visible on a cellphone, feeling more native with AR goggles
Maybe eventually. Based on this quality I don't see this happening any time in the near future.
Wow they really need to work on that first splash image[1]. All the assets there look hideous.
[1] https://github.com/Tencent/Hunyuan3D-2/blob/main/assets/imag...
the assets look good as a starting point, they may even be used as background elements
Ouch; License: EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA
https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file
I assume it's safe to ignore as model weights aren't copyrightable, probably.
you don't know what kind of backdoors are hidden in the model weights
Can you elaborate on how any sort of backdoor could be hidden in the model weights?
It's a technical possibility to hide something in the code, but that would be a bit silly since there's not that much of it here. It's not technically possible to hide a backdoor in a set of numbers that are solely used as the operands to trivial mathematical operations, so I'm very curious about what sort of hidden backdoor you think is here.
When you run their demo locally, there are two places that trigger a warning that the code loads the weights unsafely. To learn more about this issue, search "pytorch model load safety issues" on Google.
I'm sure there's warnings about possibly loading code alongside the weights since they distribute the weights as pickled data, but:
1. It's trivial to go look at what's being loaded.
2. Any code that's in the distributed pickled data is not weights. The GP suggests that there are backdoors hidden in the weights which is nonsensical unless the code runs eval on the weights or something similar, which would make anyone looking at the code immediately realise it was doing something dodgy.
As an example of (1), here's all the GLOBALs in the pickled code:
None of these could be used for anything malicious as far as I know.
I'm trying to think of what kind of adversarial 3D model the weights could produce. Perhaps a 3D goatse?
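For anyone who wants to do the "go look at what's being loaded" check themselves, here is a minimal sketch using only the Python standard library. It walks the pickle opcodes with `pickletools.genops`, which parses the stream without executing it, and collects every module/name pair the pickle would import. The function name `list_globals` is made up for this example, and the handling of protocol 4 `STACK_GLOBAL` is a heuristic (it assumes the module and name are the two most recently pushed strings), so treat the output as a first pass and not a guarantee. For PyTorch checkpoints saved as zip archives, the pickle is usually a member named `data.pkl` inside the archive; that part is an assumption about the file layout, not something shown here.

```python
import pickletools


def list_globals(pickle_bytes):
    """Return (module, name) pairs a pickle would import, without unpickling it."""
    found = []
    recent_strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name" in one string
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name are the last two strings pushed
            if len(recent_strings) >= 2:
                found.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


# Hand-written pickle that would call os.system if loaded; here it is only scanned.
suspicious = b"cos\nsystem\n(S'echo hi'\ntR."
print(list_globals(suspicious))  # [('os', 'system')]
```

Anything in that list outside the usual tensor-rebuilding helpers and plain containers is worth a closer look before loading the file for real.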
I mean... you can just firewall it?
you don't know which prompt activates the backdoor, how can you firewall it if you run the model in production?
3d asset generation is a use case that for most doesn’t need to run in production
"In production" in this case is a stand-in for "in any environment with access to sensitive stuff" which might just include GPUs, if what the attacker wanted was crypto processing grunt. Besides, if you're providing 3D asset generation as a service (which I can imagine most deployments of this sort of thing will be, at least for now) then it absolutely is running in production. The purpose of that production environment is entirely to run asset generation.
Simply sanitize the model outputs, which are the only thing that would escape if you run it in complete isolation.
Your concern is reasonable.
According to DOD, Tencent - which published this model - is a Chinese military company
https://www.bbc.com/news/articles/c9q78wn9g8zo
Is this tied to EU regulations around AI models?
Interesting. One of the diagrams suggests that the mesh is generated from the marching cubes algorithm but the geometry of the meshes shown above is clearly not generated in this way.
To me, the bird mesh actually does look like marching cubes output. Note the abundance of almost square triangle pairs on the front and sides. Also note that marching cubes doesn't necessarily create stairstep-like artifacts; it can generate a smooth looking mesh given signed distance field input by slightly adjusting the locations of vertices based on the relative magnitude of the field at the surrounding lattice points.
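To make the vertex adjustment concrete, here is a small sketch of just that one step, not a full marching cubes implementation. On a lattice edge where the signed distance changes sign, the vertex is placed where the linearly interpolated field crosses zero, so it slides toward whichever endpoint is closer to the surface. The function name and the example values are made up for illustration.

```python
def interpolate_vertex(p0, p1, d0, d1):
    """Place a vertex on edge p0 -> p1 where the linearly interpolated SDF is zero.

    p0, p1: lattice point coordinates (tuples of floats)
    d0, d1: signed distance values at p0 and p1, assumed to have opposite signs
    """
    if d0 == d1:
        # Degenerate edge with no usable gradient: fall back to the midpoint
        t = 0.5
    else:
        t = d0 / (d0 - d1)
        t = min(1.0, max(0.0, t))  # keep the vertex on the edge
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


# Surface is three times closer to p0 than to p1, so the vertex lands near p0.
print(interpolate_vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -0.25, 0.75))
# (0.25, 0.0, 0.0)
```

Always using the edge midpoint (t = 0.5) is what gives the blocky stairstep look; the interpolated position is what makes the same lattice resolution look smooth.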
If they are using MC, does that mean they are actually generating SDFs? If so it would be nice if you could output the SDF rather than the triangle mesh.
The meshes generated by the huggingface demo definitely look like the product of marching cubes.
For the AI un-initiated; is this something you could feasibly run at home? eg on a 4090? (How can I tell how "big" the model is from the github or huggingface page?)
I tried using Hunyuan3D-2 on a 4090 GPU. The Windows install encountered build errors, but it worked better on WSL Ubuntu. I first tried it with CUDA 11.3 but got a build error. Switching to CUDA 12.4 worked better. I ran it with their demo image but it reported that the mesh was too big. I removed the mesh size check and it ran fine on the 4090. It is a bit slow on my i9 14k with 128G of memory.
(I previously tried the stability 3d models: https://stability.ai/stable-3d and this seems similar in quality and speed)
Cool, thanks. I'm kinda interested so hearing it at least runs on a 4090 means I might give it a go one weekend.
The hunyuan3d-dit-v2-0 model is 4.93 GB. ComfyUI is on their roadmap, might be best to wait for that, although it doesn't look complicated to use in their example code.
https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan...
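On the "how can I tell how big the model is" question, a rough back-of-envelope is to divide the checkpoint size by the bytes per parameter. The sketch below does that for the 4.93 GB figure. The storage precision is an assumption (I have not checked which dtype this checkpoint uses), and it treats GB as 10^9 bytes, so read the result as an order-of-magnitude estimate. Actual VRAM use at inference will be higher than the file size because of activations and any other models in the pipeline.

```python
# Bytes used to store one parameter at common precisions
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_params(checkpoint_bytes, dtype):
    """Rough parameter count: file size divided by bytes per parameter."""
    return checkpoint_bytes / BYTES_PER_PARAM[dtype]


size_bytes = 4.93e9  # reported checkpoint size, assuming GB means 10**9 bytes
for dtype in ("float16", "float32"):
    billions = estimate_params(size_bytes, dtype) / 1e9
    print(f"if stored as {dtype}: about {billions:.2f}B parameters")
```

Either way the weights alone fit comfortably in the 24 GB of a 4090, which is consistent with the report above that it runs there.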
Has the word "advanced", gotta be good
Any user-generated content system suffers from what we call “the penis problem”.
As with any generative model, trust but verify. Try it yourself. Frankly, as a generative researcher myself, there's a lot of reason to not trust what you see in papers and pages.
They link a Huggingface page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2
I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts exist but are truncated so you can just inspect the element and grab the text.
I haven't tested my own prompts or the google translation of the Chinese prompts because I'm getting an over usage error (I'll edit comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.
But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. Real world has higher variance and let's get an idea how hard it is to use. So let's try some simpler things :)
Paper can be found here (the arxiv badge on the page leads to a pdf in the repo, which github is slow to render): https://arxiv.org/abs/2411.02293
(If you want to share images like I did all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)
[0] Overfit is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then certainly there's potential legal ramifications...
Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.
Oops, ran out of edit time when I was posting my last two
This last one is really key for judging where the tech is at btw. Most of the generations are assets you could download freely from the internet and you could probably get better ones by some artist on Fiverr or something. But the last example is more our realistic use case. Something that is relatively reasonable, probably not in the set of easy to download assets, and might be something someone wants. It isn't too crazy of an ask given Chimera and how similar a dragon is to a bird in the first place, this should be on the "easier" end. I'm sure you could prompt engineer your way into it but then we have to have the discussion of what costs more a prompt engineer or an artist? And do you need a prompt engineer who can repair models? Because these look like they need repairs.
This can make it hard to really tell if there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter. Little errors quickly compound... That said, I do much more on generative imagery than generative 3d objects so grain of salt here.
Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. You really only have a good idea after you've generated hundreds or thousands of samples yourself and looked at a lot of them with high scrutiny.
Yeah, this is absolutely light years off being useful in production.
People just see fancy demos and start crapping on about the future, but just look at stable diffusion. It's been around for how long, and what serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? But consistent style is such an important thing for any half decent game and these generative tools shit the bed on consistency in a way that's difficult to paper over.
I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could even get close to production that I couldn't before is maybe an MTG style card game where gameplay is far more important than graphics, and flashy nice looking static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.
Stable Diffusion and AI in general seem to be big in marketing at least. A friend decided to abandon engineering and move to marketing, and the entire social media part of his job is making a rough post, converting it to corporate marketing language via AI, and then generating an eye-catching piece of AI art to slap on top.
When video generation gets easy he'll probably move to making short eye-catching gifs.
When 3D models and AI in general improve I can imagine him, for example, generating shitty little games to put in banners. I've been using an adblocker for so long I don't know what exists nowadays, but I remember there being banners with "shoot 5 ducks" type games where the last duck kill opens the advertiser's website. Sounds feasible for an AI to implement reliably. If you can generate different games like that based on the interests of the person seeing the ad, you can probably milk some clicks.
> been around for how long, and what serious professional game developers are using it as a core part of their workflow?
Are you in the game industry? If you're not, how would you even know they haven't? As someone with some connections in the industry who may soon get more involved personally, I know at least one mobile gaming studio with quite a bit of funding and momentum that has started using a good deal of AI-generated assets that would have been handcrafted in the past.
Yeah, the big problem I have with my field is that there seem to be stronger incentives to chase benchmarks and make things look good than to actually solve the hard problems. There is a strong preference for "lazy evaluation", which is too dependent on assuming high levels of ethical presentation and due diligence. I find it so problematic because this focus actually makes it hard for people who are tackling these problems to publish. It makes the space even noisier (already incredibly noisy by the very nature of the subject), and then it becomes hard to talk about details if they're presumed solved.
I get that we gloss over details, but if there's anywhere you're allowed to be nuanced and argue over details, should it not be academia?
(fwiw, I'm also very supportive of having low bars to publication. If it's void of serious error and plagiarism, it is publishable imo. No one can predict what is important or impactful, so we shouldn't even play that game. Trying to decide if it is "novel" or "good enough for <Venue>" is just idiotic and breeds collusion rings and bad actors)
The first guitar has one of the strings end at the sound hole, and six tuning knobs for five strings.
The second has similar problems: it has tuning knobs with missing winding posts, then five strings becoming four at the bridge. It also has a pickup under the fretboard.
Are these considered good capability examples?
I take back a fair amount of what I said.
It is pretty good with some easier assets that I suspect there are lots of samples of (and we're comparing to other generative models, not to what humans make. Humans probably still win by a good margin). But when moving away from obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering, but that just makes things more complicated to evaluate.
Thanks for this. The results are quite impressive, after trying it myself.
Imagine something like this but geared towards 3D printing functional objects.