
ChatGPT-4o Images are UNREAL...
AI-Generated Summary
Airdroplet AI v0.2
The new native image functionality in ChatGPT-4o is enabling incredibly impressive creations that feel almost unreal. People are using it to transform existing media like movie trailers into wildly different artistic styles, generate highly realistic photos and detailed infographics, and even create marketing materials and comic book pages with surprising accuracy and quality. This capability is sparking a massive trend, particularly with the Studio Ghibli art style, and is seen as a major turning point for image and video generation, offering powerful tools to artists and non-artists alike.
Here are the key things discussed about ChatGPT-4o's image capabilities:
- It's capable of generating images in vastly different artistic styles, with the Studio Ghibli aesthetic currently being very popular.
- One amazing example is a recreation of the Lord of the Rings trailer in Studio Ghibli style. The creator screenshotted each of the original trailer's 102 shots, converted each screenshot to the desired style using OpenAI's native image generation (accessible via ChatGPT and Sora), animated the resulting images with video generation tools like Kling AI and Luma Labs, and then re-edited the clips together. The project took $250 in credits and 9 hours, showing that the workflow, while powerful, can still require effort and cost.
- The creator of the LoTR Ghibli trailer also described providing a text description of the scene alongside the original screenshot when generating the Ghibli version, which helped ensure the composition and elements were maintained in the new style.
- It's noted that you don't have to use existing IP like Studio Ghibli or Lord of the Rings; the model can generate images in entirely made-up styles or based on simple descriptions.
- Another impressive example is a Dune trailer also rendered in Studio Ghibli style, which took only 28 minutes from idea to export, demonstrating the speed at which these creations can be made using tools like Sora and Kling AI.
- Beyond stylistic transformations, the model is also excellent at generating highly realistic photos, such as images of Albert Einstein lifting weights or a sloth working at the DMV, which look surprisingly lifelike.
- It's particularly good at creating detailed and crisp infographics, like guides on "How to Live in New York" or "How to Take Your Cat on a Leash Walk," with minimal textual errors.
- You can seamlessly combine different generated elements or styles, like placing a cartoonish infographic into a hyper-realistic photo of someone holding it.
- The model can generate detailed maps in various styles, from vintage world maps to illustrated travel maps, though some minor text errors (like misspelled ocean names or labels) were observed.
- Anatomy diagrams, like a Wikipedia page for cats or sloth anatomy, can be generated with impressive detail and accuracy for a zero-shot generation, although minor errors like incorrect spellings or elements pointing to the wrong parts can still occur.
- It can create marketing materials and ads in specific styles, like a cartoon ad for a Ridge wallet or a Mad Men-style print ad for a product, demonstrating its versatility for business and creative purposes.
- The model can perform tasks like extracting specific assets from existing images, such as pulling the shark out of a Mr. Beast thumbnail.
- It can also merge faces onto existing images or memes almost flawlessly, blending the new face seamlessly into the original picture's style and theme.
- Creating designs in specific graphical styles, like the skeuomorphic design language of the original iOS with realistic shading and textures, is also possible and looks "flawless" in examples shown.
- Generating comic book panels or illustrated stories, such as explaining complex scientific concepts like T-cells for children, is another powerful capability, producing beautiful artwork although minor text errors might still appear in speech bubbles or captions.
- Converting images from one medium to another, like transforming a Rick and Morty cartoon scene into a "realistic photograph," is possible, though the results can sometimes have a slightly "uncanny valley" feel.
- Celebrities, including Mike Tyson and even Sam Altman, are using the model to create Studio Ghibli versions of famous photos or themselves, highlighting its widespread adoption and the trend's reach.
- Sam Altman noted that the images generated by ChatGPT have been way more popular than expected, even with high initial expectations, and that this demand may delay the rollout of image generation to the free tier. He also mentioned that the initial examples OpenAI shows for new tech are carefully chosen to set a positive tone, citing the Ghibli examples as a success in avoiding "awful deep fake nonsense" as the first viral use case.
- There's debate about the impact on graphic designers; while some feel it's "over" for them, the speaker believes it provides graphic designers with more tools and allows anyone to create visuals without needing expert knowledge in traditional software like Photoshop or Illustrator.
- The speaker feels that the rise of AI tools like this means that in the future, the most important skills will be having good "data" (understanding the information needed) and "taste" (knowing what looks good and what to ask for), rather than technical software proficiency; in the speaker's words, it's the "age of vibe anything."
- A current limitation observed is the difficulty in controlling output dimensions precisely, such as trying to generate an image wide enough to fit a Twitter header perfectly.
- The underlying model is an "omni model" that understands and can express itself in text, image, and voice simultaneously without conversion layers, meaning it integrates logic from text models with image generation capabilities.
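To put the Lord of the Rings trailer figures above in perspective (102 shots, $250 in credits, 9 hours), here is a rough per-shot breakdown. It is only a back-of-the-envelope sketch: the summary does not say how the credits or hours were split between image generation, video generation, and editing, so these are plain averages, and the helper name `per_shot` is made up for illustration.

```python
# Figures reported in the summary for the Lord of the Rings trailer project.
SHOTS = 102
CREDITS_USD = 250.0
HOURS = 9.0


def per_shot(total: float, shots: int) -> float:
    """Average a project-wide total across the number of shots."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return total / shots


# Plain averages; the real per-shot cost and time certainly varied.
cost_per_shot = per_shot(CREDITS_USD, SHOTS)
minutes_per_shot = per_shot(HOURS * 60, SHOTS)

print(f"about ${cost_per_shot:.2f} per shot")
print(f"about {minutes_per_shot:.1f} minutes per shot")
```

On these numbers, each shot averaged roughly two and a half dollars and a little over five minutes of work.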
Overall, the new image capabilities in ChatGPT-4o are incredibly powerful, versatile, and surprisingly accessible, driving creative trends and suggesting significant shifts in how visual content is created.
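For the output-dimension limitation mentioned above, one possible workaround is to generate at whatever size the model returns and crop afterwards. The sketch below only computes the largest centered crop box for a target aspect ratio; it does not call any image API. The 1500x500 header size and the 1536x1024 source size are assumptions used for illustration, not figures from the video, and `center_crop_box` is a hypothetical helper name.

```python
def center_crop_box(width: int, height: int, target_w: int, target_h: int):
    """Return (left, top, right, bottom) for the largest centered crop
    of a width x height image that matches the target_w:target_h ratio."""
    if min(width, height, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")

    # Compare aspect ratios by cross-multiplying to stay in integers.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)

    # Source is taller than (or equal to) the target: keep full width,
    # trim the top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


# Assumed example: a 1536x1024 landscape image cropped to a 3:1 header.
box = center_crop_box(1536, 1024, 1500, 500)
print(box)  # (0, 256, 1536, 768)
```

The resulting box can be passed to any image editor or library crop function. Because cropping discards the top and bottom bands, it helps to ask for a composition that keeps the subject in the middle of the frame.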