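Crushing Sora with the widest motion range yet: Google releases Veo 2, its strongest video model, and takes on Hailuo and Kling

OpenAI used to time its launches to upstage Google's new products. Now the boomerang has come back around.

Last night, just ahead of OpenAI's livestream, Google shipped two heavyweight updates: Veo 2, its most advanced video generation model, and Imagen 3, a text-to-image model.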

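Here's a video first:

Tennis, racket swings, basketball shots, running: this is not a sports-meet promo but footage generated by Google's newly released Veo 2! It held up under large, fast motion, and I'm calling it the video model with the widest motion range and the best results of the year!

Put Sora side by side with Veo 2 and the OpenAI subscription I just bought suddenly feels a lot less appealing...

Sora, $20 plan: up to 5 seconds per video, up to 720p

Sora, $200 plan: up to 20 seconds per video, up to 1080p

Veo 2: about 2 minutes per video, up to 4K

Beyond the generation specs, Veo 2 also beats Sora on the benchmark. Four models were compared against it: Sora, Meta Movie Gen, Kling v1.5, and Minimax.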

[Image: benchmark charts. Left: overall satisfaction; right: prompt adherence.]

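The main improvements in Veo 2:

  1. Fidelity: markedly better detail and realism, with fewer artifacts.
  2. Accuracy: a better understanding of the physical world, the ability to follow detailed instructions, and highly accurate rendering of motion.
  3. Camera control: it understands the language of cinematography and can produce a wide range of shot styles, angles, and movements.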

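Speaking of instruction following: in the hands-on Sora test I ran a few days ago, Sora's ability to follow instructions was unbelievably poor. A user on Twitter shared a side-by-side comparison of Veo 2 and Sora on a tomato-slicing prompt.

Prompt: A pair of hands skillfully slicing a ripe tomato on a wooden cutting board.

Veo 2 result:

Sora result:

Folks, just look at Veo 2's shadows, reflections, physics, and visuals. It's almost too real! Can anyone name a second model that handles object interaction this naturally right now?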

Here are the official demo videos:

Prompt: Cinematic shot of a female doctor in a dark yellow hazmat suit, illuminated by the harsh fluorescent light of a laboratory. The camera slowly zooms in on her face, panning gently to emphasize the worry and anxiety etched across her brow. She is hunched over a lab table, peering intently into a microscope, her gloved hands carefully adjusting the focus. The muted color palette of the scene, dominated by the sickly yellow of the suit and the sterile steel of the lab, underscores the gravity of the situation and the weight of the unknown she is facing. The shallow depth of field focuses on the fear in her eyes, reflecting the immense pressure and responsibility she bears.

Be honest: could you tell this video was AI-generated?

If a close-up isn't enough to show what it can do, take a look at this one:

Prompt: The camera floats gently through rows of pastel-painted wooden beehives, buzzing honeybees gliding in and out of frame. The motion settles on the refined farmer standing at the center, his pristine white beekeeping suit gleaming in the golden afternoon light. He lifts a jar of honey, tilting it slightly to catch the light. Behind him, tall sunflowers sway rhythmically in the breeze, their petals glowing in the warm sunlight. The camera tilts upward to reveal a retro farmhouse with mint-green shutters, its walls dappled with shadows from swaying trees. Shot with a 35mm lens on Kodak Portra 400 film, the golden light creates rich textures on the farmer’s gloves, marmalade jar, and weathered wood of the beehives.

Folks, this clip could pass for real documentary footage!

Beyond portraits, simulating the real physical world is no problem either.

Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. Coffee pours in smooth, swirling motion into a crystal-clear cup, filling it with deep brown layers of crema. Scene ends with a camera swoop into a fresh-cut orange, revealing its bright, juicy segments in stunning macro detail.

Prompt: A cinematic, high-action tracking shot follows an incredibly cute dachshund wearing swimming goggles as it leaps into a crystal-clear pool. The camera plunges underwater with the dog, capturing the joyful moment of submersion and the ensuing flurry of paddling with adorable little paws. Sunlight filters through the water, illuminating the dachshund's sleek, wet fur and highlighting the determined expression on its face. The shot is filled with the vibrant blues and greens of the pool water, creating a dynamic and visually stunning sequence that captures the pure joy and energy of the swimming dachshund.

Prompt: A cinematic shot captures a fluffy Cockapoo, perched atop a vibrant pink flamingo float, in a sun-drenched Los Angeles swimming pool. The crystal-clear water sparkles under the bright California sun, reflecting the playful scene. The Cockapoo's fur, a soft blend of white and apricot, is highlighted by the golden sunlight, its floppy ears gently swaying in the breeze. Its happy expression and wagging tail convey pure joy and summer bliss. The vibrant pink flamingo adds a whimsical touch, creating a picture-perfect image of carefree fun in the LA sunshine.

Prompt: A low-angle shot captures a flock of pink flamingos gracefully wading in a lush, tranquil lagoon. The vibrant pink of their plumage contrasts beautifully with the verdant green of the surrounding vegetation and the crystal-clear turquoise water. Sunlight glints off the water's surface, creating shimmering reflections that dance on the flamingos' feathers. The birds' elegant, curved necks are submerged as they walk through the shallow water, their movements creating gentle ripples that spread across the lagoon. The composition emphasizes the serenity and natural beauty of the scene, highlighting the delicate balance of the ecosystem and the inherent grace of these magnificent birds. The soft, diffused light of early morning bathes the entire scene in a warm, ethereal glow.

Beyond realistic scenes, Veo 2 also handles "dreamcore" content reliably:

Prompt: The camera spirals down through an infinite network of glowing threads, pulsating with multicolored light. The setting feels alive, each thread thrumming with faint whispers and bursts of imagery—fractals, mythological beasts, and celestial maps. The courier darts through the maze, their silhouette painted with the kaleidoscopic glow of the fibers. As they weave between strands, their every touch triggers animations—one a glowing phoenix, another a blooming lotus—until they stumble upon a massive, golden thread. It flares, and a holographic figure emerges: a younger version of themselves, surrounded by fiery glyphs. The scene shifts between soft, glowing pastels and brilliant, fiery tones, blending hand-drawn 2D animation with dynamic light effects, captured in fluid, sweeping motion.

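After watching these showcases, I rushed straight to the official site to test whether Veo 2 is really this good.

It turns out the model is currently available only through the VideoFX platform, and the only way in is to apply for the waitlist!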

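Apply here: https://labs.google/fx/zh/tools/video-fx

Given how stunning Veo 2 looks, I plan to use it to grant the wish readers left in the comments on my Sora test: "We want to see Xi Xiaoyao (夕小瑶) do the Qinghai Shake (青海摇)."

I've already applied for the waitlist. As soon as I'm approved, I'll bring you a first-hand test! (It's definitely not because I want to see how the Qinghai Shake turns out myself.)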

[Image: a killer comment from our family group chat, haha]

After watching the official demos above, you may still have one question:

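Wait, isn't it supposed to generate videos up to about 2 minutes long? Why is every official demo only 8 seconds?

That's because VideoFX currently offers only a cut-down version of Veo 2, capped at 720p resolution and 8 seconds in length.

(Even so, that still beats Sora on the $20 plan.)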

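Eli Collins, VP of Product at DeepMind, said that over the coming months they will keep iterating on user feedback, gradually roll out the full version of Veo 2, and integrate it across the Google ecosystem. More updates are expected next year.

Fine, so for now it's a futures contract.

But I do trust Google's delivery speed; it surely won't keep us waiting close to a year the way Sora did.

Released alongside Veo 2 is Imagen 3, a text-to-image model with better detail, richer lighting, and fewer distracting artifacts. For the benchmark scores, just look at the chart.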

[Image: Imagen 3 benchmark scores]

The image detail from this model really is outstanding, and it's available right now. If you want to try it, head to the address below.

Try Imagen 3 here: https://labs.google/fx/tools/image-fx

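The big moves Google is making now stand in sharp contrast to what OpenAI has shipped during its 12 days of livestreams.

Every time Google shows up it drops a bombshell: Gemini 2.0, and now Veo 2, have stolen OpenAI's thunder.

OpenAI, meanwhile, has simply overhyped things: the 12-day livestream teaser got everyone excited, and the updates since have kept pouring cold water on that excitement.

I wonder whether Sam Altman can still sleep at night after seeing this update from Google.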