AI Video Just Got WAY TOO REAL... (VEO 3)

Channel: Wes Roth · Published: May 21st, 2025 · AI Score: 100

AI Generated Summary

Airdroplet AI v0.2

Here's a summary of the video about Google's Veo 3 AI model:

Google's new Veo 3 AI video model is seriously impressive, especially because it can generate not just video, but also integrated audio like music, sound effects, and even speech directly from text prompts. The video showcases a bunch of different, often wild, prompts to see how well Veo 3 performs, highlighting its strengths in motion, detail, and audio while also pointing out some inconsistencies and glitches.

Here are the key topics and technical details discussed:

  • Veo 3's Big Deal: Integrated Audio: The standout feature of Veo 3 is its ability to generate video with audio elements like music, voices, and sound effects based on the prompt. This is a major step up and makes the videos feel much more complete and dynamic compared to video models that only generate visuals.
  • Testing Methodology: All available AI credits were used to generate multiple versions (usually four) for a variety of prompts. The results shown are not cherry-picked; they represent the range of outputs from these tests, giving a realistic view of the model's performance. (For a rough idea of how this kind of batch generation could be scripted rather than done through Google's UI, see the sketch after this list.)
  • Inflatable Duck Chase: Testing a "dirty off-road buggy racing through mud getting chased by a large, scary looking blow up duck." The results were phenomenal, especially the motion and menace of the duck. Version four was considered the best, even showing the duck gaining on and knocking the buggy off the road, which was surprisingly effective.
  • T-Rex Reflection: Trying to generate "two women slowly raise a mirror so that you can see your own reflection. You are a menacing T-Rex with massive teeth." The model handled reflections well, creating realistic-looking scenes, although quality varied across the different versions. The presenter felt version one was the best overall.
  • The Hacking Octopus & Wet Keyboard: A longer, multi-part prompt about an octopus hacking a computer, hiding when someone enters, and the person asking "Why is my keyboard all wet?". The model did a great job with the multi-step narrative and capturing human expressions reacting to the wet keyboard. However, there were visual glitches like headless octopuses in some versions. A surprising and weird detail was that one generated scene looked uncannily like the presenter's actual keyboard setup.
  • Gorilla vs. 10 Men: Testing a "gorilla fighting 10 men" to see how the AI handles chaotic battle scenes. The results were pretty good, capturing the action and intensity. Version three was potentially the best, despite a slightly silly sound effect at the end.
  • First-Person Forest Run: A prompt for a "first person view of an animal running through a night forest with superhuman speed, eventually emerging to see a human village and people fleeing in terror." Most versions didn't quite capture the requested first-person view or the animal running correctly, but one version did it perfectly and was considered "really good" and by far the closest to the prompt's intent.
  • Eagle Playing Accordion: An absurd prompt asking for an "eagle... playing the accordion." The model generated different interpretations, some with human-like hands or extra limbs, which was weird. The audio felt appropriate, and one version specifically seemed to capture the "struggle" an eagle would have with the instrument.
  • Undead Guitar Solo: Asking for an "undead from Dungeons and Dragons is playing a guitar solo on top of a mountain of skulls. A field of skeleton fans are going wild down below. The moon is bright and red." This prompt highlighted the AI's ability to generate music on the fly to fit the description. The visuals were detailed, capturing the undead look and the scene effectively, even adding some ad-libbing sounds in one version.
  • Yarn Sumo Trash Talk: Generating "two sumos made out of yarn... doing a playful trash talking" with specific lines provided. Despite a typo in the prompt ("Yarm"), the AI understood "yarn" and generated characters that looked like they were made of yarn. The speech fidelity and lifelike gesturing in some versions were impressive, though one version had disturbing visuals and in another it wasn't clear who was speaking. Version one was considered the best overall for this prompt.
  • Wolf Chasing Rabbit: A "first person view of a wolf chasing down a rabbit, jumping over falling trees and branches... View low to the ground." Similar to the forest run, some versions weren't strictly first-person but still captured the speed and feeling of the chase effectively. Version three was particularly liked for capturing the intensity.
  • Walking Brick House: Prompting for a "brick house with people leaning out of windows. It has six mechanical legs and is walking down the street as people stare in awe." Version one was the most realistic, showing people leaning out and looking like a walking building. Other versions looked a bit "off," highlighting the inconsistency in rendering complex, unnatural concepts.
  • Fat Cat on Throne: Asking for an "obnoxiously fat cat sits upon a large golden throne. It looks at you as you approach and says, I see you brought me snacks. I guess I will let you live for meow." The model successfully generated the scene and synthesized the requested speech in three out of four versions, even adding a cat pun. One version captured the attitude but failed to deliver the specific lines. Version one was felt to be the best.
  • Spaceship Approaching Ring World: Attempting a notoriously difficult prompt: "a view from the cabin of a spaceship as it approaches a massive ring world... Signs of a civilization can be seen on the inner part of the ring world." As expected, the model struggled with the 'ring world' concept, which AI models typically find hard. While none were perfect ring worlds (some looked like Saturn's rings), version three was the closest and considered among the best renditions seen of this specific challenging prompt.
  • Revisiting Veo 2 Prompts: Testing prompts previously showcased by Google for Veo 2, like the "first person chasing ice skater" and the "helmet mounted POV tailing a woman on a dirt bike." Veo 3 handled these well, with excellent motion and capturing the requested points of view, also adding great sound effects like the ice skates or dirt bike noises.
  • Roller Coaster POV: A "first person view of a slowly rising roller coaster before it drops rapidly into the night below." The model captured the scene beautifully, including the stars, but consistently failed to include the requested "drop" portion, cutting off right before the climax.
  • Snow Tiger: Generating a "tiger made out of snow, walking in a snowy forest." Some versions captured the 'made of snow' look perfectly and had fantastic sound effects like crunching snow (rated A+), while others looked more like regular tigers in snow or had less fitting sounds. The variation in results was apparent here.
  • Overall Impression: The presenter was very impressed with Veo 3, particularly the quality and integration of the audio (sounds, music, speech, intonations). He felt he ran out of credits too quickly just as he was learning how to prompt it effectively, suggesting that mastering prompting is still key. He believes the model is "very, very good" and possibly represents the "next generation" of AI video models.
  • Actionable Takeaway: There's a clear need to get more credits and continue testing to better understand how to prompt Veo 3 to get the best results, indicating that effective prompting is a skill that needs development even with advanced models.
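
For anyone who wants to script the "several versions per prompt" methodology instead of spending credits in Google's consumer UI, below is a minimal sketch using Google's google-genai Python SDK. The model ID, the number_of_videos value, and whether Veo 3 is reachable through this SDK at all are assumptions; the presenter used Google's own interface, not this API.

```python
# Minimal sketch: batch-generate several variations of one Veo prompt.
# Assumptions: `pip install google-genai`, GEMINI_API_KEY set in the
# environment, and account access to a Veo model. The model ID below is a
# Veo 2 ID from Google's docs; number_of_videos=4 is illustrative and may
# exceed what the API actually allows.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

prompt = (
    "A dirty off-road buggy racing through mud, getting chased by a "
    "large, scary looking blow up duck."
)

# Video generation is a long-running operation: submit, then poll until done.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # swap in a Veo 3 ID if/when exposed here
    prompt=prompt,
    config=types.GenerateVideosConfig(number_of_videos=4),
)
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Save each variation, mirroring the "compare four versions" methodology.
for n, generated in enumerate(operation.response.generated_videos, start=1):
    client.files.download(file=generated.video)
    generated.video.save(f"buggy_duck_v{n}.mp4")
    print(f"saved buggy_duck_v{n}.mp4")
```

The sketch only automates submitting one prompt and collecting its variations; comparing the versions (and judging the integrated audio, Veo 3's headline feature) would still be done by eye and ear, as in the video.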