
I built marshmallow castles in Google's new AI world generator

Published Jan 30, 2026

Updated May 1, 2026


Google DeepMind Opens Access to Project Genie

Google DeepMind is granting access to Project Genie, an AI tool that creates interactive game worlds from text prompts or images. Starting Thursday, Google AI Ultra subscribers in the U.S. can experiment with this research prototype, which combines Google's latest world model, Genie 3; its image generation model, Nano Banana Pro; and Gemini.

This release, arriving five months after Genie 3's research preview, is part of a broader effort to collect user feedback and training data. DeepMind is racing to develop more capable world models: AI systems that build an internal representation of an environment and use it to predict future outcomes and plan actions. Many AI leaders view world models as a crucial step toward artificial general intelligence (AGI). In the nearer term, DeepMind envisions a market strategy that starts with video games and entertainment before expanding to training embodied agents, or robots, in simulation.

The world model competition is intensifying. Fei-Fei Li's World Labs released its first commercial product, Marble, late last year. AI video generation startup Runway recently launched a world model of its own, and AMI Labs, the startup of former Meta chief scientist Yann LeCun, will also focus on developing world models.

How Project Genie Works

Shlomi Fruchter, a DeepMind research director, said he was excited to open the tool to more people and gather their feedback. DeepMind researchers are upfront about its experimental nature. Its performance is inconsistent: sometimes it generates impressively playable worlds, and other times it produces baffling results.

You begin with a "world sketch" by providing text prompts for both the environment and a main character, whom you can later maneuver through the world in a first- or third-person view. Nano Banana Pro creates an image based on the prompts, which you can, in theory, modify before Genie uses it as the starting point for an interactive world. Modifications mostly worked, but the model occasionally failed; for example, it gave a character purple hair when green was requested.

You can also use real-life photos as a baseline for the model to build a world, though this was hit or miss. Once you're satisfied with the image, Project Genie takes a few seconds to create an explorable world. For inspiration, you can remix existing worlds by building on their prompts, or explore curated worlds in the gallery or through a randomizer tool. You can then download videos of the worlds you explored.

Currently, DeepMind limits each session to 60 seconds of world generation and navigation because of budget and compute constraints. Genie 3 is an auto-regressive model that requires significant dedicated compute, which limits how much access DeepMind can provide. Fruchter explained that the 60-second cap lets DeepMind bring the tool to more users, since each session gets its own dedicated chip. Extending sessions beyond 60 seconds would add little testing value, given the current limits on interaction and environmental dynamism.

Strengths, Limitations, and User Experience

Safety guardrails are active. You cannot generate anything resembling nudity or any world remotely associated with Disney or other copyrighted material. Google received a cease-and-desist letter from Disney last year over AI models generating unauthorized content based on Disney's characters and IP. Attempts to generate worlds such as mermaids exploring underwater fantasy lands or ice queens in wintry castles were also blocked.

Despite these limitations, the demo was impressive. One test recreated a childhood fantasy: a castle in the clouds made of marshmallows, with a chocolate sauce river and candy trees, rendered in claymation style. The model delivered a whimsical world with puffy, tasty-looking spires and turrets.

However, Project Genie still has kinks to work out. The models excel at creating worlds based on artistic prompts, like watercolors, anime style, or classic cartoon aesthetics. But they tend to fail with photorealistic or cinematic worlds, often producing results that look like video games rather than real settings.

Using real photos as input also yielded mixed results. A photo of an office, provided in the hope of an exact recreation, produced a sterile, digital-looking space with similar furnishings arranged in a different layout. When fed a photo of a desk with a stuffed toy on it, Project Genie animated the toy navigating the space, with other objects occasionally reacting as it moved past.

Interactivity still needs work, and DeepMind is working on it: characters sometimes walked right through walls or other solid objects. The model's auto-regressive architecture lets it remember what it has already generated. Returning to previously generated parts of an environment showed that this mostly held up, though in one case a second mug appeared when revisiting a desk scene.

The navigation controls (arrow keys to look around, the spacebar to jump or ascend, and the W-A-S-D keys to move) proved frustrating for non-gamers. Keys were often unresponsive or sent you in the wrong direction, turning simple movement into a chaotic zigzag.

Fruchter acknowledged these shortcomings, reiterating that Project Genie is an experimental prototype. The team aims to enhance realism and improve interaction, including giving users more control over actions and environments. He said that although the team doesn't see it as an end-to-end product for everyday use, it offers a glimpse of something interesting and unique that can't be achieved by other means.
