Goodbye Photoshop? Google’s New AI Allows Image Editing Through Simple Requests

Photo credit: arstechnica.com

Multimodal Output Unlocks Exciting New Opportunities

Genuine multimodal output in chatbots opens up intriguing new possibilities. For instance, Gemini 2.0 Flash can engage users in interactive graphical games or craft narratives with consistent illustrations, keeping characters and settings continuous across multiple images. The technology is still evolving, but character consistency marks a significant advancement in AI-driven programs. In testing, the model demonstrated remarkable capabilities, including generating alternate views of a provided image.

Creating a multi-image narrative with Gemini 2.0 Flash, part 1.

Google / Benj Edwards

Creating a multi-image narrative with Gemini 2.0 Flash, part 2. Observe the alternate perspective of the initial image.

Google / Benj Edwards

Creating a multi-image narrative with Gemini 2.0 Flash, part 3.

Google / Benj Edwards

Text rendering is another area where the Gemini model aims to excel. According to Google, internal assessments indicate that Gemini 2.0 Flash surpasses other leading models at creating images that incorporate text, which suggests potential for content that blends text and visuals. In initial testing, the rendered text was mostly readable, but the results were not particularly exciting.

An example of text rendering within images produced by Gemini 2.0 Flash.

Credit: Google / Ars Technica

Despite the limitations evident in Gemini 2.0 Flash, the arrival of authentic multimodal image generation signals a pivotal moment in the AI landscape, particularly as the technology has room to evolve further. Looking a decade ahead, one might imagine AI systems capable of generating diverse media in real time: text, images, audio, video, 3D graphics, and even tangible objects. The concept closely resembles a holodeck, minus the matter replication.

However, this optimism should be weighed against current limitations. Google acknowledges that multimodal output is still in its early stages. Gemini 2.0 Flash was designed as a more compact model, optimized for speed and cost-effectiveness, which limits how much of the knowledge available online it can draw on. Instead of drawing from a comprehensive dataset representing the entire Internet, Google trained Gemini 2.0 Flash using a carefully selected dataset, likely supplemented with synthetic data tailored for specific outcomes. Consequently, the model does not encompass the full spectrum of visual knowledge, which is reflected in its performance. Google itself notes that the training data is extensive but not exhaustive or definitive.

This reality underscores that while the quality of generated images may not yet meet expectations, there is considerable potential for improvement as methods for training evolve and computational resources become more affordable. Should the progress mirror what has been observed with diffusion-based image generators like Stable Diffusion, Midjourney, and Flux, the quality of multimodal image output could see rapid enhancements in a relatively short time. We might be on the verge of a transformative era in media production.

Source
arstechnica.com
