London-based AI lab Odyssey has launched a research preview of a new model that transforms video into interactive environments, hinting at a potential new storytelling medium.
Initially built for film and game production, the model allows users to influence video scenes in real time using a keyboard, phone, controller – or eventually voice commands. Odyssey calls it an “early version of the Holodeck.”
The model generates a new video frame every 40 milliseconds – roughly 25 frames per second – enabling fast, lifelike responses to user input. The visuals aren’t yet on par with high-end video games; the experience is described as “a glitchy dream – raw, unstable, but undeniably new.”
Unlike traditional CGI or game engines, Odyssey’s system is powered by a “world model”: it generates each frame from the frames and user inputs that preceded it, much as a language model predicts the next word. The result is more organic and less scripted than conventional game logic.
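To make the contrast with scripted engines concrete, here is a minimal sketch of that kind of autoregressive frame loop in Python. Every name in it (`world_model.predict_next_frame`, `get_user_action`, `render`) is a hypothetical stand-in rather than Odyssey’s actual API; the point is the control flow, in which each frame is predicted from a rolling history of frames and actions within a 40 ms budget.

```python
import time

FRAME_INTERVAL = 0.040  # 40 ms per frame, roughly 25 fps

def interactive_loop(world_model, get_user_action, render, history_len=64):
    """Hypothetical autoregressive frame loop (illustrative only).

    Instead of drawing pre-authored assets, each frame is predicted
    from everything seen and done so far, the way a language model
    conditions on prior tokens.
    """
    frame_history, action_history = [], []
    while True:
        start = time.monotonic()
        action_history.append(get_user_action())  # keyboard, phone, controller...
        frame = world_model.predict_next_frame(frame_history, action_history)
        frame_history.append(frame)
        # Keep only a bounded context window so latency stays flat
        # as the session runs on.
        frame_history = frame_history[-history_len:]
        action_history = action_history[-history_len:]
        render(frame)
        # Sleep off whatever remains of the 40 ms frame budget.
        time.sleep(max(0.0, FRAME_INTERVAL - (time.monotonic() - start)))
```

The bounded history window is one simple way a loop like this keeps per-frame cost constant, though it is also why long sessions can drift visually, as discussed below.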
To improve stability and reduce the risk of visual drift over time, Odyssey pre-trains the model on general video data, then fine-tunes it on narrower environments. This sacrifices variety for smoother performance.
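In outline, that two-stage recipe looks something like the sketch below; the method and dataset names are illustrative assumptions, since Odyssey hasn’t published its training code.

```python
def train_world_model(model, general_video_corpus, narrow_environment_clips):
    """Hypothetical two-stage training recipe (illustrative only).

    Stage 1 gives the model broad knowledge of how video evolves;
    stage 2 specialises it on a handful of environments so that long
    interactive rollouts stay coherent and drift less.
    """
    model.fit(general_video_corpus)      # stage 1: broad pre-training
    model.fit(narrow_environment_clips)  # stage 2: narrow fine-tuning
    return model
```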
Each user session currently costs £0.80 to £1.60 per hour to run, served by clusters of Nvidia H100 GPUs in the US and EU – expensive compared with video streaming, but far cheaper than traditional film or game production.
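A quick back-of-the-envelope calculation shows what those prices imply per frame, assuming the 40 ms cadence holds across a full hour:

```python
# Frames in one hour at one frame per 40 ms (about 25 fps).
frames_per_hour = 3600 / 0.040  # 90,000 frames

for hourly_cost_gbp in (0.80, 1.60):
    per_frame = hourly_cost_gbp / frames_per_hour
    print(f"£{hourly_cost_gbp:.2f}/hour -> £{per_frame:.7f} per generated frame")
```

Even at the high end that is well under a hundredth of a penny per generated frame, which is what makes the comparison with film and game production budgets plausible.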
Odyssey believes this AI-powered interactive video could evolve into a new storytelling platform, with potential uses in entertainment, training, education, and virtual travel. While the current release is a proof of concept, it marks a significant step toward more immersive, user-driven digital experiences.