Stability AI, a startup that funds various generative AI experiments, has unveiled a new version of its Stable Diffusion system, called Stable Diffusion XL (SDXL). Available in beta through Stability AI's generative art tool, DreamStudio, SDXL significantly improves on the image quality of its predecessor, Stable Diffusion 2.1. According to Tom Mason, Stability AI's CTO, the new model brings "richness" to image generation, with improvements most notable in applications like graphic design and architecture. The company claims SDXL is transformative across several industries.
The new SDXL system appears to be on par with, and perhaps even better than, the latest release of Midjourney's model, responsible for creating memes such as "Balenciaga Pope." While the previous version of Stable Diffusion, like many other text-to-image systems, struggled to recreate certain anatomy, such as hands, SDXL has no such trouble. Although the hands it generates are not always realistic, they are miles ahead of the poor-quality results produced by the earlier Stable Diffusion system.
Stability AI also claims that SDXL features "enhanced image composition and face generation" and does not require long, detailed prompts to create "descriptive imagery," unlike its predecessor. Moreover, SDXL has functionality that extends beyond just text-to-image prompting, including image-to-image prompting (inputting one image to get variations of that image), inpainting (reconstructing missing parts of an image), and outpainting (constructing a seamless extension of an existing image).
While Stability AI marches forward with generative AI art technology, the company has faced legal challenges due to the way it has built and commercialized its tools. A legal case alleges that the company infringed on the rights of millions of artists by developing its tools using web-scraped, copyrighted images. Getty Images has also taken Stability AI to court for reportedly using images from its site without permission to create the original Stable Diffusion.
The open-source release of Stable Diffusion has also become the subject of controversy due to its relatively light usage restrictions. Some communities around the web have tapped it to generate pornographic celebrity deepfakes and graphic depictions of violence. To date, at least one U.S. lawmaker has called for regulation to address the release of models like Stable Diffusion that "don't sufficiently moderate content."
In response to the lawsuits, Stability AI recently pledged to respect artists' requests to remove their art from Stable Diffusion's training dataset, but the pledge does not apply to SDXL, only to the next generation of Stable Diffusion models, code-named "Stable Diffusion 3.0." Artists have opted more than 78 million works of art out of the training dataset to date, according to Spawning, the organization leading the opt-out effort.
Legal challenges aside, Stability AI is under pressure to monetize its sprawling AI efforts, which run the gamut from art and animation to biomedicine and generative audio. Although Stability AI CEO Emad Mostaque has hinted at IPO plans, Semafor recently reported that the company, which raised over $100 million in venture capital last October at a reported valuation of more than $1 billion, "is burning through cash and has been slow to generate revenue."