Most AI still “sees” the world the way a poster shows it: as a flat picture. That works for captions and filters, but it breaks down when you need space. If you want an AI to place a chair in your living room, guide a robot down a hallway, or build a game level you can walk through, it has to understand depth, layout, and how objects relate.
That’s where Christoph Lassner comes in, and why his startup, World Labs, has drawn so much attention since its founding in 2024. In simple terms, the company builds 3D technology that rebuilds rooms, people, and scenes so you can move through them, edit them, and use them in real products.
This post covers Lassner’s path, what World Labs says it’s building, and why it matters for games, film, AR, VR, and robotics.
Who is Christoph Lassner, and what has he built so far?
Christoph Lassner is a computer scientist known for work that sits between computer vision (teaching computers to “see”) and computer graphics (teaching computers to “draw”). His specialty is getting from messy 2D inputs, like photos and phone videos, to usable 3D that can render fast enough for interactive tools.
As of early 2026, he is a co-founder of World Labs, alongside Fei-Fei Li, Justin Johnson, and Ben Mildenhall. Before that, he worked across startups and large tech groups on 3D bodies, 3D scenes, and neural rendering. If you’ve ever wondered how a phone scan can approximate a human body, or how a headset demo can show a lifelike scene from a short video, you’re in the territory he’s spent years improving.
He’s also known for releasing practical research tools. One example is the Pulsar renderer (a differentiable renderer that became a sphere-based backend for PyTorch3D). In plain language, that kind of tool helps models learn 3D by letting training “see” how changes in 3D would change the final image.
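To make “differentiable rendering” concrete, here is a minimal, hypothetical sketch — not Pulsar’s actual API — of the core loop: render a scene parameter into an image, compare against a target image, and push analytic gradients back through the renderer to improve the parameter. The 1-D “blob” renderer, the blob positions, and the learning rate are all invented for illustration.

```python
import math

def render(center, width=4.0, n=32):
    """Render a 1-D 'image': a soft Gaussian blob centered at `center`."""
    return [math.exp(-((x - center) ** 2) / (2 * width ** 2)) for x in range(n)]

def loss_and_grad(center, target, width=4.0):
    """Squared-error loss against a target image, plus its analytic
    derivative with respect to the blob center. Chaining the loss
    through the renderer is the 'differentiable' part."""
    img = render(center, width)
    loss = sum((i - t) ** 2 for i, t in zip(img, target))
    # d(pixel)/d(center) = pixel * (x - center) / width^2
    grad = sum(2 * (i - t) * i * (x - center) / width ** 2
               for x, (i, t) in enumerate(zip(img, target)))
    return loss, grad

# Target: a blob at position 20. Start the guess at 10 and let the
# gradients from the rendered image pull it into place.
target = render(20.0)
center = 10.0
for _ in range(200):
    loss, grad = loss_and_grad(center, target)
    center -= 0.5 * grad  # move the blob toward the target
```

Real differentiable renderers like Pulsar do this with millions of spheres on a GPU, but the training signal works the same way: the only supervision is how the final image compares to a reference.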
From Germany to computer vision research, the early path that shaped his 3D focus
Lassner’s early training happened in Germany, with formal study at the University of Augsburg. His diploma thesis work looked at human pose estimation, which means predicting how a person’s body is positioned from an image.
That might sound like a narrow topic, but it’s a direct bridge to 3D. A 2D photo only shows outlines and colors. Pose estimation tries to recover the underlying structure, like where the shoulders and hips sit in space, and how limbs bend. It’s similar to looking at a shadow and guessing the shape that made it.
He later earned a PhD at the University of Tübingen. His research connected with groups such as the Bernstein Center for Computational Neuroscience and the Max Planck Institute for Intelligent Systems. During that period, he worked on problems like estimating 3D pose and shape from single images, one of the hardest settings in vision: you don’t get multiple viewpoints, so the model has to infer depth and body shape from cues like perspective and anatomy.
A key idea from this era is simple: if a system can place a human body correctly in 3D from one image, it has learned a lot about geometry, scale, and occlusion (what’s hidden behind something else).
Key stops before World Labs: Body Labs, Amazon, Meta Reality Labs, and Epic Games
Lassner’s career includes several stops that each added a piece to the 3D puzzle.
At Body Labs (later acquired by Amazon), the focus was on 3D body modeling. The goal was to infer a realistic body shape and pose, not just a stick-figure skeleton. That matters for avatars, apparel fit, animation, and any pipeline that needs consistent human geometry.
At Amazon, he worked on the pose system for Amazon Halo, which created 3D body models from smartphone videos. The key point here is accessibility. Instead of a studio rig full of cameras, the input could be something many people already have: a phone.
At Meta Reality Labs, he led work related to 3D reconstruction and neural rendering, including radiance field-style approaches shown publicly around Meta Connect 2022. Neural rendering means the system learns how a scene should look from different viewpoints, rather than relying only on hand-built meshes and textures.
At Epic Games, his role tied machine learning, vision, and graphics together, with an eye toward real-time results. That “real-time” constraint is what separates a neat demo from something a creator can actually use in a game engine.
World Labs explained in plain English: building AI that understands the 3D world
World Labs is a startup founded in early 2024 by Fei-Fei Li, Justin Johnson, Christoph Lassner, and Ben Mildenhall. Their stated mission centers on spatial intelligence and something they call Large World Models (LWMs).
If large language models predict words, and image models predict pixels, LWMs try to predict and represent spaces. That includes shape, depth, and how a room stays consistent as you move around. The company’s public materials point to systems that can take text, an image, or a video and produce an explorable 3D environment you can navigate.
They have also shown a product called Marble, released as a beta in late 2025. Marble is positioned as a way to create interactive 3D worlds from simple inputs, then edit and expand those worlds. Updates discussed publicly describe improvements like bigger worlds, more visual styles, and better geometry.
World Labs has attracted major investment as well. Public reporting around late 2025 included investment from Cisco, and Autodesk also announced a large investment tied to 3D design and physical AI tooling.
A useful way to think about LWMs is this: instead of generating a single image, the model tries to generate a place that stays coherent when you move.
What “spatial intelligence” means, and why 2D-only AI hits a wall
Spatial intelligence is the ability to reason about size, distance, and layout. People do this without thinking. You know the couch is two steps from the coffee table. You can reach for a light switch in the dark because you remember where it sits.
A 2D-only AI can describe what’s in a photo, but it may not understand where things are in a stable, navigable way. It can also struggle with occlusion. If a chair blocks part of a desk, a flat model might guess wrong about the desk’s shape because it can’t “hold” a 3D mental map.
That limitation matters because interaction depends on geometry. A robot needs clearance to turn a corner. An AR app needs to place an object on a real surface at the right scale. A game tool needs a room that doesn’t fall apart when the camera moves.
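The occlusion problem has a classic geometric answer: a depth buffer, which records the nearest surface at each pixel. The toy sketch below (scene layout and names invented for illustration) shows what a 3D-aware system tracks that a flat model doesn’t: the desk still exists behind the chair even where the chair wins the pixels.

```python
def visible_surfaces(scene, width=8):
    """Tiny 1-D z-buffer: for each pixel column, keep the nearest surface.
    `scene` maps a name to (start_px, end_px, depth_m)."""
    depth = [float("inf")] * width
    label = [None] * width
    for name, (start, end, z) in scene.items():
        for px in range(start, end):
            if z < depth[px]:  # the nearer object wins the pixel
                depth[px] = z
                label[px] = name
    return label

scene = {
    "desk":  (0, 8, 3.0),   # spans the whole view, 3 m away
    "chair": (2, 5, 1.5),   # partially in front of the desk
}
# The chair occludes pixels 2-4, but the scene description still
# "knows" the desk continues behind it -- that hidden structure is
# what a 2D-only model has no place to store.
```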
Spatial intelligence, as World Labs uses the term, pushes toward models that treat the world as something you can explore, not just something you can label.
The big idea: generating and editing 3D rooms and scenes from photos or video
World Labs’ public positioning around Marble highlights a clear input and output loop:
- Input: a single image, a panorama, a video clip, or even a text prompt.
- Output: a 3D environment that you can look around in, move through, and edit.
The simplest example is a single photo becoming a space you can “step into.” You might start with one view of a living room. The system estimates depth, fills in hidden areas, and builds geometry that stays consistent as you change the camera angle. In some descriptions, it uses fast 3D representations such as Gaussian splats to keep motion smooth.
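The “step into a photo” idea usually starts from estimated depth per pixel. Once you have depth, each pixel can be unprojected into a 3D point by inverting the pinhole camera model. A toy sketch, with a made-up 2×2 depth map and focal length:

```python
def unproject(depth_map, focal=2.0):
    """Turn a per-pixel depth map into a 3D point cloud.
    depth_map[v][u] is the Z-depth (in meters) of pixel (u, v)."""
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            # invert the pinhole model: pixel (u, v) at depth z
            points.append((u * z / focal, v * z / focal, z))
    return points

depth = [[2.0, 2.0],
         [4.0, 4.0]]  # bottom row is farther away, like a receding floor
cloud = unproject(depth)
```

Systems like those described around Marble go much further — filling in hidden regions and fitting fast representations such as Gaussian splats — but a stable point cloud like this is the geometric starting point.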
This approach has obvious creative uses. It can also support simulation, where you need many varied environments for testing robots or training AI agents. Still, current materials also imply limits, with the strongest results often shown in room-like scenes and sets.
How modern 3D tech works, and where Lassner’s work fits in
To follow the story, it helps to separate 3D into two jobs: rebuilding a scene and drawing it from new viewpoints. Lassner’s track record sits right at that intersection, with work that connects reconstruction, rendering, and neural methods.
Traditional 3D pipelines rely on explicit geometry (meshes), textures, and lighting. They can look amazing, but they often require time and skilled labor. Newer neural approaches learn 3D from images and video, then render new views with fewer manual steps.
Lassner’s projects reflect that shift. Pulsar targeted efficient GPU rendering and learning-friendly pipelines. Later work described as neural assets and neural 3D video points toward photorealistic assets from smartphone videos that can run interactively in engines, including tricky visual effects like hair, fur, and other volumetric details.
Here’s a quick way to keep the main parts straight:
| Part of the pipeline | What it does (plain English) | Why it matters |
|---|---|---|
| Reconstruction | Builds a 3D representation from photos or video | Without it, you can’t move the camera reliably |
| Rendering | Draws the scene from a chosen viewpoint | Speed decides whether it feels interactive |
| Neural rendering (radiance fields) | Learns how light and appearance change across views | Helps realism when meshes and textures fall short |
The takeaway is practical: if reconstruction is shaky, the world warps. If rendering is slow, creators won’t use it in everyday workflows.
Two steps that matter most: rebuilding the scene, then rendering it fast
Reconstruction means turning pixels into structure. Imagine you shoot a short phone video of your kitchen. Reconstruction tries to infer where the counters are, how far the wall is, and where the edges should be. The output might be a mesh, a point-based structure, or a learned field that stores appearance information.
Rendering comes next. Rendering is the act of drawing the scene from a new camera position, like when you move your phone and the view updates. In games, rendering must run many times per second, or you feel lag.
Speed matters because creators iterate. A designer might change lighting, move props, or test a new camera path. If each change takes minutes, the tool becomes a “special occasion” workflow. If it responds quickly, it becomes part of daily creative work.
This is also where Lassner’s mix of ML and graphics shows up. His work has repeatedly aimed at outputs that don’t just exist, but can run in interactive settings.
Neural rendering and radiance fields: a simple way to understand the hype
Neural Radiance Fields (NeRF-style methods) gave people a new mental model for 3D: instead of building a detailed mesh first, the system learns how a scene looks from many angles, then renders new views by predicting color and density along camera rays.
A simple analogy helps. Think of a room as a book you can’t open fully. You only see a few pages. A radiance field tries to infer the missing pages so the story still makes sense when you flip to a new spot.
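The prediction-along-rays step can be sketched with the standard volume-rendering quadrature that NeRF-style methods use: each sample along a camera ray contributes its color, weighted by its own opacity and by how much light survives the samples in front of it. The sample values below are invented for illustration.

```python
import math

def composite_ray(samples, step=0.1):
    """Alpha-composite (color, density) samples along one camera ray.
    alpha_i = 1 - exp(-density_i * step); each contribution is scaled
    by the transmittance (light not yet absorbed by earlier samples)."""
    color = 0.0
    transmittance = 1.0
    for c, density in samples:
        alpha = 1.0 - math.exp(-density * step)
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color

# Five samples of empty space, then a dense bright surface: the first
# opaque samples dominate the pixel, as a real surface should.
ray = [(0.0, 0.0)] * 5 + [(1.0, 50.0)] * 3
pixel = composite_ray(ray)
```

Because every operation here is differentiable, the network’s predicted colors and densities can be trained directly from photos — which is the whole trick behind “3D from casual capture.”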
Industry demos, including work shown publicly from Meta teams, helped popularize the idea that short videos can become realistic, viewable 3D. World Labs’ framing builds on that direction, but with a product goal: editable worlds that stay stable as you navigate.
None of this removes the hard parts. The model still needs to handle reflective surfaces, thin objects, and scenes with large occlusions. Even so, the trend is clear: neural rendering has made “3D from casual capture” feel more achievable than it did a few years ago.
What this could change next for games, film, and everyday apps
When 3D becomes easier to generate and edit, the biggest shift is time. Teams can try more ideas because the starting point arrives faster. That doesn’t remove craft, but it changes where effort goes, from building every wall by hand to shaping, polishing, and directing.
World Labs has shared demos and updates around Marble that point toward interactive room generation and editing. Some online references mention a September 2025 TEDAI talk with themes like generated rooms and interactive stories, but public confirmation is unclear. Still, the idea matches what World Labs and its founders have discussed elsewhere: worlds that you can navigate, adjust, and use as a stage for narratives or agents.
Because these systems sit between art and engineering, the most realistic near-term impact is hybrid workflows. People will use the generated 3D as a base layer, then refine it with standard tools.
For creators: faster world-building, smarter editing, and more interactive stories
For game and film creators, 3D generation could act like fast set building. Instead of starting from a blank scene, you start from a workable room, then adjust it. That helps with pre-visualization, blocking shots, and testing story beats.
Interactive storytelling is another angle. If a scene exists as a navigable space, viewers can explore it. A filmmaker could publish a short story as a room you can walk through, with details placed where you choose to look. That idea depends on consistency. If the room changes when you turn around, the magic breaks.
This is also where export matters. World Labs has described workflows that allow worlds to move into other environments. For creators, a 3D output that can’t travel is less useful than one that fits into familiar pipelines.
For users: more believable AR/VR, better 3D scanning, and helpful robots
For everyday users, spatial intelligence shows up when apps stop “floating” objects badly. AR can feel convincing only when it understands surfaces and scale. A lamp should sit on a table, not clip through it.
Better 3D scanning is another likely outcome. Phone-based capture already exists, but results can look noisy or incomplete. As models improve at filling gaps and stabilizing geometry, scans may look cleaner with less effort.
Robots also benefit from better 3D understanding. A robot doesn’t need pretty graphics, but it needs reliable geometry. If an AI can infer what’s behind a chair, it can plan safer paths. Over time, that could make robots more useful in cluttered homes and warehouses, where perfect maps are rare.
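Path planning over recovered geometry can be as simple as a search over an occupancy grid — a minimal sketch, with a made-up room layout where a chair blocks the direct route:

```python
from collections import deque

def shortest_path_steps(grid, start, goal):
    """BFS over an occupancy grid (1 = obstacle). Returns the number of
    steps in the shortest collision-free path, or -1 if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return -1

# The 1s are a chair blocking the straight line; the planner detours.
room = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
```

The hard part in practice isn’t the search — it’s producing a grid like `room` reliably from cameras, which is exactly where better 3D reconstruction pays off.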
Conclusion
Christoph Lassner’s career has stayed focused on one challenge: turning images and video into practical 3D that people can use. World Labs takes that experience and pushes toward “spatial intelligence,” with Large World Models and products like Marble that aim to make explorable 3D scenes easier to create and edit. Watch for progress in three areas:
- Better 3D from fewer inputs
- Faster interactive rendering in everyday tools
- More real demos that hold up under free camera movement
What would you build first if a single photo could become a room you can edit?