Non-commercial license: you should not call that "open-weight". Words have meaning.
And people are having a laugh at how censored the model is.
https://old.reddit.com/r/StableDiffusion/comments/1tvtu2u/id...
https://old.reddit.com/r/StableDiffusion/comments/1tvxhzv/id...
The model should perform better with JSON prompts than with natural language prompts. The safety filter may also be triggered incorrectly by natural language prompts; we are aware of this and will make a future checkpoint update to improve it.
See how to prompt the model here: https://github.com/ideogram-oss/ideogram4/blob/main/docs/pro...
Disclaimer: I work at Ideogram.
I made a detailed playground notebook for running the model locally on an Nvidia GPU, along with a detailed article on setup.
https://snehal.ai/ideogram4-local-playground/
What are the best alternatives for template + text rendering? I need Canva-level templating ability with the system matching font dimensions, positions, and the image without background.
I ended up creating an algo using color variance and auto-sizing, an LLM to select the fonts and text and nano banana. The thing I need is nano-banana-level images with Canva+human-level abilities automated.
It would be awesome to have a free LLM that can do that. Running on a 16GB RAM Mac. I have a lot of images and templates I can train on too. Not sure if I overengineered it, would love to have just one LLM that can do everything for free.
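For anyone wondering what a "color variance" step could look like, here is a minimal sketch, not the poster's actual algorithm. It assumes the goal is to find the visually calmest rectangle of an image so text can be placed there. The function names, the grayscale-grid input, and the sample image are all hypothetical.

```python
def variance(values):
    """Population variance of a flat list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def calmest_region(gray, box_h, box_w):
    """Return (row, col) of the top-left corner of the box with the lowest
    pixel variance. `gray` is a list of rows of grayscale values (assumed
    input format). Ties keep the first box found in row-major order."""
    rows, cols = len(gray), len(gray[0])
    best = None
    for r in range(rows - box_h + 1):
        for c in range(cols - box_w + 1):
            pixels = [gray[r + dr][c + dc]
                      for dr in range(box_h) for dc in range(box_w)]
            v = variance(pixels)
            if best is None or v < best[0]:
                best = (v, r, c)
    return best[1], best[2]


# Hypothetical 4x4 image: busy on the left, flat on the right.
gray = [
    [10, 200, 50, 50],
    [220, 30, 50, 50],
    [15, 180, 50, 50],
    [240, 20, 50, 50],
]
print(calmest_region(gray, 2, 2))
```

A brute-force scan like this is quadratic in image size, so a real pipeline would probably downscale first or use integral images; the sketch only shows the selection criterion.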
Nice to see another locally hostable model! It’s going to take me a bit longer to add this model to the GenAI Showdown benchmark [1], since I’ll need to add a bit of customization so it produces highly optimized JSON-structured prompts.
It might be worth noting that fal.ai [2] (a fairly popular router in the generative AI space) doesn’t really mention or emphasize the JSON-structured prompt format, and seems to suggest it works just as well with natural language. It might be worth reaching out to them, at least to clarify this point and make things a bit clearer.
[1] - https://genai-showdown.specr.net
[2] - https://fal.ai/ideogram-4
[1] Nice, let us know if you need a hand
You can call our API to generate structured JSON from text using this endpoint (it is free of charge, but you need a free account to create an API key): https://developer.ideogram.ai/api-reference/api-reference/ma...
[2] Fal runs the text_prompt (natural language) through the magic prompt system, so results should indeed be good there.
Sorry for the belated reply!
This is an impressively good model, even more so because it is locally hostable and comparable to the original NB released in August of last year. Ideogram doesn’t have the fanfare of Black Forest Labs or some of the other big names, but they consistently produce very solid work.
It took me longer than anticipated to produce a solid pass through the benchmark for a number of reasons. The strong emphasis on proper JSON formatting as opposed to the more common natural language makes a huge difference in output quality.
Another issue is that the default workflow provided by ComfyUI tends to produce heavily “overcooked” images because of the high CFG settings. With some workflow adjustments, though, you can get very strong results.
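For context on why a high CFG value can "overcook" an image, here is a minimal sketch of the standard classifier-free guidance formula. It assumes the workflow uses the usual formulation; the actual ComfyUI sampler may differ in detail, and the function name and sample numbers are made up for illustration.

```python
def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance: start from the unconditional
    prediction and move `scale` times the distance toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical model predictions for three latent values.
uncond = [0.10, -0.20, 0.05]
cond = [0.30, -0.10, 0.00]

for scale in (1.0, 3.0, 7.5):
    guided = cfg_combine(uncond, cond, scale)
    print(scale, [round(g, 3) for g in guided])
```

At a scale of 1 the result is just the conditional prediction. Larger scales extrapolate past it, which amplifies the differences between the two predictions and is a common source of oversaturated, high-contrast output.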
Of all the recently released locally hostable models, this one definitely demands the most manual effort. But if you’re willing to put in the work, it can produce fantastic results: it scored 8 out of 15 on the benchmark and was the only local model that managed the Bee comic.
https://genai-showdown.specr.net/?models=fd,hd,kd,qi,f2d,zt,...
Will it work on Apple silicon machines? Maybe in the Draw Things application? Or is it all command line?
The gallery of generated images looks amazing. It's hard (but often possible) to spot inconsistencies in detailed images.
Thank you. I will test this on a Mac mini and come back here.
Exciting!