OpenAI Launches GPT-4o's New Image Generation Into ChatGPT, Showing 'Unbelievably Better' Results


OpenAI is integrating image generation capabilities directly into ChatGPT, allowing users to create images without leaving the chat interface.

The company announced the feature Tuesday as part of its broader push to make AI tools more useful and accessible across different media, and to stay relevant in the AI art scene.

The feature is an evolution of DALL·E 3, OpenAI’s image generator, which launched in September 2023 but fell out of favor among AI enthusiasts who preferred the next generation of models, including Flux, Midjourney v6, SD 3.5, Recraft, and Reve.

Before this release, OpenAI offered two different models on the same platform, with GPT generating text and DALL·E 3 handling image generation.

Now, GPT-4o will do everything on its own, and DALL·E 3 will disappear.

“GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o’s inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration,” OpenAI claimed in an official blog post.

Folding image generation into GPT-4o continues to make good on the company’s plan to make it an “omni” model, trained with multimodal data and capable of handling all tasks. The result is a model that is much more capable, accurate, and intelligent than its predecessors.

"We know we've made you wait, but we think it's really worth it, and we think you're going to love it," Sam Altman, OpenAI's CEO, said in a video showing GPT-4o’s new capabilities. "It's such a huge step forward that the best way to explain it to you is just to show it."

In the video, the company showed off the system's capabilities with several examples, including manga pages explaining the theory of relativity (with inputs in English and Mandarin), custom trading cards based on personal and real photos, commemorative coins combining multiple images with transparent backgrounds, and a very accurate image based on an extraordinarily long and detailed prompt.

The model is slow at generating images, but it seems to be highly accurate. Altman pointed to the significant quality upgrade as worth the longer waiting time.

"Images are much slower than our previous image generation (model), but unbelievably better. We think it's super worth the wait," Altman said during the demonstration. "We also will be able to make it faster over time."

The rollout appears to be happening gradually, and we weren’t able to get our hands on the new model as of press time.

Users can tell which system they're using based on how images appear: Besides the apparent quality gap, DALL·E 3 images pop up fully formed after a loading screen, while the new GPT-4o renders images progressively from top to bottom in real time.

The company emphasized that the technology extends beyond creating fancy images.

"What's really exciting about this release is that now these models can actually visualize what they know and externalize it in a visual way," explained a research scientist at OpenAI, invited by Sam Altman to talk about this new feature.

This capability allows for educational applications, such as detailed scientific diagrams or informational posters with accurately rendered text, and even image editing with subject consistency.

OpenAI has also implemented guardrails to prevent the generation of deepfakes and illegal content, as well as the removal of watermarks.

While the generated images won't have visible watermarks, they will include C2PA metadata to identify them as AI-created. The company is also developing tools to track image provenance.
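
For readers curious what checking for that metadata might look like, here is a minimal Python sketch. It rests on several assumptions that do not come from OpenAI: that the image is delivered as a PNG, and that the C2PA manifest is embedded in a PNG chunk of type `caBX`, as the C2PA specification describes for that format. The function names are hypothetical. The check is a presence heuristic only: it does not validate signatures or prove where an image came from, and metadata can be stripped when a file is re-saved or screenshotted.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[str]:
    """Return the chunk type names found in a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("ascii", errors="replace"))
        pos += 8 + length + 4  # skip header, payload, and CRC
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """Heuristic: True if a 'caBX' chunk (assumed C2PA manifest store) is present."""
    return "caBX" in png_chunk_types(data)
```

For a downloaded file, `has_c2pa_chunk(open("image.png", "rb").read())` would run the same check; the filename is only a placeholder. Actually verifying a manifest requires a dedicated C2PA tool or library rather than a chunk scan like this one.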

The company plans to bring the feature to its API, allowing developers to integrate the technology into their own applications. OpenAI’s Terms of Use also say that users will retain ownership of images they generate, subject to the company's usage policies.

Edited by Sebastian Sinclair and Josh Quittner