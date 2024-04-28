

China’s Tech Firms Debut First Text-to-Video AI Model ‘Vidu’ Challenging Sora AI

Arsi Mughal

(CTN News) – ShengShu-AI, a leading Chinese tech firm, alongside Tsinghua University, introduced Vidu, a cutting-edge text-to-video artificial intelligence (AI) model, during the Zhongguancun Forum in Beijing.

This groundbreaking innovation, touted as China’s answer to the renowned Sora, marks yet another milestone in China’s rapid advancements in critical AI domains.

Video Generation with U-ViT

Vidu can generate 16-second 1080p video clips with a single click. It is powered by a proprietary visual model architecture called Universal Vision Transformer (U-ViT).

According to the developers, this architecture combines two widely used AI techniques: diffusion models and Transformers.

Vidu was unveiled roughly two months after the global debut of Sora, the text-to-video model from US-based OpenAI that drew worldwide attention.

Zhu Jun, vice dean of the Institute for Artificial Intelligence at Tsinghua University and chief scientist of ShengShu-AI, said that Vidu is consistent with the team's technical roadmap, which has motivated them to keep pushing their research forward.

Innovative Origins of U-ViT Technology

Notably, the core technology behind U-ViT was first proposed by the team behind it in September 2022, predating DiT (Diffusion Transformer), the model architecture used by Sora.

According to media reports, U-ViT is the world's first visual model architecture to combine the strengths of diffusion models and Transformers.

During a live demonstration at the forum, Vidu showed that it can simulate real-world environments in intricate detail and in keeping with physical laws, producing realistic lighting, shadow effects, and nuanced facial expressions.

Moreover, the model can generate dynamic shots rather than only static imagery.

Another advantage of Vidu is that it was developed in China, which its makers say gives it a strong understanding of Chinese cultural elements.

Media reports suggest that Vidu excels at generating imagery featuring iconic Chinese symbols such as pandas and dragons.

Vidu Key Takeaways:

  1. A New Benchmark in AI: Vidu, a collaborative effort between ShengShu-AI and Tsinghua University, marks a significant advancement in AI-generated video production, effortlessly generating 16-second videos at 1080p resolution.
  2. Competitive Prowess: Touted as China’s answer to OpenAI’s Sora, Vidu positions China as a formidable contender in the global AI landscape.
  3. Cultural Enrichment: Vidu distinguishes itself by seamlessly integrating Chinese cultural elements into its outputs, catering to local user preferences and enhancing cultural relevance.
  4. Technological Breakthrough: By combining diffusion and Transformer techniques in its U-ViT architecture, Vidu aims to raise the level of realism and dynamism in AI-generated video content.

