Stability AI today released Stable Diffusion 3.5, a significant update to its text-to-image generative AI models.

The update aims to improve Stability AI's generative models after its previous major release was widely seen as falling short of expectations. Stable Diffusion 3 was first announced in February, and its first open model, Stable Diffusion 3 Medium, followed in June. Once the frontrunner in the field, Stability AI now faces several strong competitors, including Black Forest Labs' Flux Pro, OpenAI's DALL-E, Ideogram, and Midjourney.

With Stable Diffusion 3.5, Stability AI aims to regain its competitive edge. The release includes highly customizable models that can produce a wide range of styles, in three variants: Stable Diffusion 3.5 Large, an 8-billion-parameter model that Stability AI says offers high image quality and strong prompt adherence; Stable Diffusion 3.5 Large Turbo, a distilled version of Large that generates images faster; and Stable Diffusion 3.5 Medium, a 2.6-billion-parameter model optimized for edge computing.
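To put the parameter counts in perspective, a rough estimate of the memory needed just to hold each model's weights is parameter count multiplied by bytes per parameter. This sketch assumes the listed storage precisions and ignores any text encoders, the image autoencoder, and activation memory, so real-world requirements will be higher.

```python
# Parameter counts as stated for the two base models.
PARAMS = {
    "Stable Diffusion 3.5 Large": 8.0e9,
    "Stable Diffusion 3.5 Medium": 2.6e9,
}

# Assumed storage precisions (illustrative); 16-bit is a common choice for inference.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, count in PARAMS.items():
    estimates = ", ".join(
        f"{precision}: {weight_memory_gb(count, nbytes):.1f} GB"
        for precision, nbytes in BYTES_PER_PARAM.items()
    )
    print(f"{name}: {estimates}")
```

At 16 bits per weight this works out to roughly 16 GB for Large versus 5.2 GB for Medium, which helps explain why the smaller model is the one pitched at edge devices.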

All three models are released under the Stability AI Community License, which permits free non-commercial use, as well as commercial use by organizations with under $1 million in annual revenue. Larger enterprises can obtain an enterprise license from Stability AI. The models are available through Stability AI's API and on Hugging Face.

The June launch of Stable Diffusion 3 Medium ran into problems, and the lessons learned from it shaped the development of Stable Diffusion 3.5.

“We recognized that several model and dataset selections made for the Stable Diffusion Large 8B model were not suitable for the smaller Medium model,” stated Hanno Basse, CTO of Stability AI, in an interview with VentureBeat. “We conducted thorough analyses of these limitations and innovated our architecture and training methods for the Medium model to strike a better balance between model size and output quality.”

How Stability AI is Improving Text-to-Image Generative AI with Stable Diffusion 3.5

For Stable Diffusion 3.5, Stability AI introduced several technical changes to improve both quality and performance.

A key change in this update is the addition of Query-Key (QK) normalization to the transformer blocks. QK normalization normalizes the query and key vectors before the attention scores are computed, which increases stability during training and makes it easier for users to fine-tune and further develop the model.

“While we have utilized QK-normalization in earlier experimentation, this is our inaugural model release featuring this normalization,” Basse explained. “It was a logical choice to prioritize customization for this new model.”
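To illustrate what QK normalization does, here is a minimal sketch in plain Python. It assumes RMS-style normalization of the query and key vectors without learnable scale parameters; the function names are hypothetical, and this is not Stability AI's implementation. Because the normalized vectors have bounded length, the attention logits cannot blow up as raw activations grow, which is the training-stability property the normalization is meant to provide.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, qk_norm=True):
    """Scaled dot-product attention weights of one query over several keys.

    With qk_norm=True, the query and every key are RMS-normalized before
    the dot product, so each logit stays below sqrt(d) no matter how
    large the raw activations become.
    """
    d = len(query)
    if qk_norm:
        query = rms_norm(query)
        keys = [rms_norm(k) for k in keys]
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


query = [1.0, 0.5, -0.5, 0.2]
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
large_query = [1000.0 * x for x in query]

print(attention_weights(query, keys))
print(attention_weights(large_query, keys))                 # nearly identical to the line above
print(attention_weights(large_query, keys, qk_norm=False))  # collapses onto the first key
```

Scaling the query by 1,000 leaves the normalized weights essentially unchanged, while the unnormalized version saturates onto a single key; that kind of saturation is one way attention can destabilize training.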

Stability AI also refined its Multimodal Diffusion Transformer architecture, in a variant called MMDiT-X, specifically for the Medium model. First highlighted in April, when the Stable Diffusion 3 API launched, MMDiT combines a diffusion model with a transformer backbone to improve image quality and support multi-resolution generation.
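The joint-attention idea behind MMDiT, as described in the Stable Diffusion 3 research paper, can be sketched in a few lines: text and image tokens keep separate projection weights, but their sequences are concatenated so a single attention operation lets each modality attend to the other. The toy below uses plain Python lists, tiny hand-picked matrices, and hypothetical function names. It omits normalization, MLP layers, timestep conditioning, and the MMDiT-X changes, so treat it as an illustration, not the production architecture.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(tokens, weights):
    """Apply one modality's own query/key/value projections to its tokens."""
    return (
        [matvec(weights["q"], t) for t in tokens],
        [matvec(weights["k"], t) for t in tokens],
        [matvec(weights["v"], t) for t in tokens],
    )


def joint_attention(text_tokens, image_tokens, text_weights, image_weights):
    """Separate weights per modality, one attention over the joined sequence."""
    tq, tk, tv = project(text_tokens, text_weights)
    iq, ik, iv = project(image_tokens, image_weights)
    queries, keys, values = tq + iq, tk + ik, tv + iv  # concatenate the two sequences
    d = len(queries[0])
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    # Split the joined sequence back into its text and image streams.
    return outputs[: len(text_tokens)], outputs[len(text_tokens):]


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
text_w = {"q": identity, "k": identity, "v": identity}
image_w = {"q": swap, "k": swap, "v": identity}  # different weights for the image stream

text_out, image_out = joint_attention([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]], text_w, image_w)
print(len(text_out), len(image_out))  # 1 2
```

Changing only the text token changes the image outputs, which is the cross-modal interaction that the shared attention step provides.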

Prompt Adherence Makes Stable Diffusion 3.5 Even More Powerful

Stability AI reports that Stable Diffusion 3.5 Large follows prompts more closely than competing models, turning user prompts into images more accurately.

“This improvement is attributed to numerous factors: enhanced dataset curation, more effective captioning, and innovative training methodologies,” Basse said.

Customization Will Get Even Better with ControlNets

Looking ahead, Stability AI plans to add ControlNets support to Stable Diffusion 3.5.

ControlNets are expected to give users finer control for a range of professional applications. Stability AI first offered the technology with SDXL 1.0 in July 2023.

“ControlNets offer spatial control for diverse applications, enabling users to upscale images while preserving overall color schemes or create images that adhere to specific depth patterns,” Basse explained.
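For readers curious how spatial control is typically wired in, the original ControlNet design attaches a trainable copy of a network block to a frozen base model and feeds the copy's output back through zero-initialized layers. The toy sketch below, in plain Python with hypothetical names and a one-layer "block", shows why that initialization matters: before training, the control branch contributes nothing, so the base model's behavior is preserved exactly. It is not Stability AI's ControlNet implementation for Stable Diffusion 3.5, which the article does not describe.

```python
import copy


def linear(params, vec):
    """y = W x + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(params["W"], params["b"])]


def block(params, vec):
    """Toy stand-in for a network block: a linear layer followed by ReLU."""
    return [max(0.0, y) for y in linear(params, vec)]


def zero_layer(dim):
    """A linear layer whose weights and bias start at zero."""
    return {"W": [[0.0] * dim for _ in range(dim)], "b": [0.0] * dim}


def controlled_forward(base, control, zero, x, condition):
    """Frozen base block plus a control branch injected through a zero-initialized layer."""
    base_out = block(base, x)
    # The control branch sees the input plus the spatial condition (e.g. a depth map).
    control_out = block(control, [a + c for a, c in zip(x, condition)])
    return [b + z for b, z in zip(base_out, linear(zero, control_out))]


base_params = {"W": [[0.5, -0.2], [0.1, 0.9]], "b": [0.0, 0.1]}
control_params = copy.deepcopy(base_params)  # trainable copy of the frozen block
zero = zero_layer(2)

x, depth_hint = [1.0, 2.0], [0.3, -0.4]
print(block(base_params, x))
print(controlled_forward(base_params, control_params, zero, x, depth_hint))  # same as above at init
```

During training, only the copied block and the zero-initialized layer would be updated while the base block stays frozen, so the control signal is learned gradually without disturbing what the base model already does.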

Source: venturebeat.com
