Stability AI introduces Stable Video Diffusion, pioneering video generation via AI. Despite its potential, limited access and ethical concerns loom over its use. The models, SVD and SVD-XT, exhibit promising quality but face constraints in content creation. The company aims for commercialization amid financial struggles and leadership departures, raising doubts about its stability and long-term success.
Stability AI enters the video generation arena with Stable Video Diffusion, showcasing AI’s capabilities in animating images to create videos. Amid OpenAI’s turmoil, this innovation garners attention, offering open-source models for video creation. However, limitations and ethical concerns in usage arise, signaling potential misuse. Yet, the models exhibit quality outputs, raising optimism for applications like 360-degree object views. Financial challenges and key personnel departures, including Ed Newton-Rex, underline the company’s turbulent journey.
One notable recent development is Stability AI’s announcement of Stable Video Diffusion, an AI model that generates videos by animating existing images. Built on Stability’s existing Stable Diffusion text-to-image model, Stable Video Diffusion is one of the few video generation models available either as open source or commercially.
However, accessibility to Stable Video Diffusion remains limited at present. Described by Stability as a “research preview,” access to the model requires users to adhere to specific terms of use. These terms outline the intended applications of Stable Video Diffusion, such as in educational or creative tools, design, and artistic processes, while also highlighting prohibited uses, including creating factual representations of people or events.
Historically, similar AI research previews, including those by Stability, have often found their way onto the dark web, raising concerns about potential misuse. The absence of a built-in content filter in Stable Video Diffusion heightens worries about potential abuse, similar to incidents observed when Stable Diffusion was utilized to create nonconsensual deepfake content.
Stable Video Diffusion comprises two models: SVD and SVD-XT. SVD converts still images into 576×1024 videos of 14 frames, while SVD-XT, which uses the same architecture, increases the frame count to 25. Both models can generate videos at frame rates between three and 30 frames per second.
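To make the relationship between frame count and frame rate concrete, the following is a minimal sketch of the arithmetic, not anything taken from Stability’s code. The function name `clip_duration_seconds` is a hypothetical helper introduced here for illustration; it simply divides the number of frames by the playback rate, using the 14-frame SVD figure and the three-to-30 fps range quoted above.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length in seconds of num_frames shown at fps.

    Hypothetical helper for illustration only; not part of any Stability AI API.
    """
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# 14 frames is the SVD frame count quoted in the text;
# 3 and 30 fps are the two ends of the stated frame-rate range.
for fps in (3, 30):
    duration = clip_duration_seconds(14, fps)
    print(f"14 frames at {fps} fps: {duration:.2f} s")
```

At the low end of the frame-rate range a 14-frame clip lasts a few seconds, while at the high end the same frames play back in under half a second, which is why the frame rate matters as much as the frame count when judging clip length.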
The models underwent initial training on a dataset comprising millions of videos, followed by further refinement on a smaller set of hundreds of thousands to around a million clips. The origin of these videos remains unclear, potentially posing legal and ethical challenges related to usage rights, especially if copyrighted material was part of the training data.
Despite these concerns, both SVD and SVD-XT produce high-quality four-second clips, as evidenced by selected samples showcased on Stability’s blog. The output quality rivals recent models from industry giants like Meta, Google, and AI startups such as Runway and Pika Labs.
Stability is transparent about the models’ limitations: they cannot generate videos without motion, cannot be controlled by text, cannot render legible text, and do not consistently produce accurate faces.
However, Stability remains optimistic about the models’ adaptability and potential for expansion. The company foresees applications such as generating 360-degree views of objects, and says it plans a “text-to-video” tool that would bring text prompting to the models on the web, with commercialization as the ultimate aim.
Nevertheless, Stability AI faces challenges on multiple fronts. Reports indicate financial struggles, including delayed payment of wages and payroll taxes and a threat from AWS to revoke the company’s access. Although it raised $25 million through a convertible note, bringing its total funding to over $125 million, the company is reportedly seeking to quadruple its valuation amid persistently low revenues and a high burn rate.
The departure of Ed Newton-Rex, VP of audio, dealt another blow to Stability AI. Newton-Rex’s exit, driven by disagreements over the use of copyrighted data in AI model training, underscores the ongoing challenges and ethical considerations within the organization.
Overall, Stability AI’s foray into video generation marks an exciting milestone, showcasing both the potential and the limitations of AI-driven content creation. While Stable Video Diffusion generates high-quality clips, concerns about accessibility and ethical usage persist. The company’s ambitions for commercialization clash with its financial hardships, including delayed wage payments and threats from service providers, and Ed Newton-Rex’s departure over ethical disagreements accentuates its internal challenges. The future success of Stability AI hinges on navigating these hurdles, ensuring ethical data usage, and achieving financial stability as it pursues AI innovation.