Common Sense: On FlashWorld: High-quality 3D Scene Generation within Seconds

Saturday, January 24, 2026


Good news!

"Detailed Text- or Image-to-3D, Pronto

Current methods that produce 3D scenes from text or images are slow and produce inconsistent results. Researchers introduced a technique that generates detailed, coherent 3D scenes in seconds.

What’s new: Researchers at Xiamen University, Tencent, and Fudan University developed FlashWorld, a generative model that takes a text description or image and produces a high-quality 3D scene, represented as Gaussian splats; that is, millions of colored, semi-transparent ellipsoids. ..."
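A minimal sketch of what one such splat might hold follows. It assumes the parameterization popularized by 3D Gaussian Splatting: a center, per-axis scales, a rotation quaternion, an RGB color, and an opacity. The ellipsoid's shape is its covariance, Σ = R S Sᵀ Rᵀ, and a pixel's color comes from front-to-back alpha compositing of the depth-sorted splats its ray hits. FlashWorld's exact per-splat attributes are not given here, so the `Splat` class and the helper functions below are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Splat:
    """One Gaussian splat (hypothetical layout modeled on standard 3DGS)."""
    center: Vec3                                  # ellipsoid center in world space
    scale: Vec3                                   # standard deviation along each local axis
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    color: Vec3                                   # RGB in [0, 1]
    opacity: float                                # peak alpha in [0, 1]


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(s: Splat):
    """Sigma = R S S^T R^T: the ellipsoid's size and orientation."""
    r = quat_to_matrix(s.rotation)
    # M = R S; S is diagonal, so column j of R is scaled by scale[j]
    m = [[r[i][j] * s.scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def composite(hits: List[Tuple[float, Vec3, float]]) -> Vec3:
    """Front-to-back alpha compositing along one ray.

    Each hit is (depth, rgb, alpha), where alpha is the splat's contribution
    at this pixel: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _, rgb, alpha in sorted(hits, key=lambda h: h[0]):  # nearest first
        weight = alpha * transmittance
        for c in range(3):
            out[c] += rgb[c] * weight
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])
```

For example, a splat with identity rotation `(1, 0, 0, 0)` and scales `(2, 1, 0.5)` has covariance diag(4, 1, 0.25), and compositing a half-transparent red splat in front of an opaque blue one gives `(0.5, 0.0, 0.5)`. Because every view is rendered from the same set of splats, views produced this way agree with one another by construction.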

From the abstract:

"We propose FlashWorld, a generative model that produces 3D scenes from a single image or text prompt in seconds, 10~100× faster than previous works while possessing superior rendering quality. Our approach shifts from the conventional multi-view-oriented (MV-oriented) paradigm, which generates multi-view images for subsequent 3D reconstruction, to a 3D-oriented approach where the model directly produces 3D Gaussian representations during multi-view generation. While ensuring 3D consistency, 3D-oriented method typically suffers poor visual quality.

FlashWorld includes a dual-mode pre-training phase followed by a cross-mode post-training phase, effectively integrating the strengths of both paradigms. Specifically, leveraging the prior from a video diffusion model, we first pre-train a dual-mode multi-view diffusion model, which jointly supports MV-oriented and 3D-oriented generation modes. To bridge the quality gap in 3D-oriented generation, we further propose a cross-mode post-training distillation by matching distribution from consistent 3D-oriented mode to high-quality MV-oriented mode. This not only enhances visual quality while maintaining 3D consistency, but also reduces the required denoising steps for inference.

Also, we propose a strategy to leverage massive single-view images and text prompts during this process to enhance the model's generalization to out-of-distribution inputs. Extensive experiments demonstrate the superiority and efficiency of our method."
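The cross-mode distillation idea can be illustrated with a toy, one-dimensional version of distribution matching, under heavy assumptions. A "teacher" distribution N(mu, s²) stands in for the high-quality MV-oriented mode, and a one-step "student" generator x = a + b·z stands in for the 3D-oriented mode. Both scores are known in closed form here; in a real system they would be estimated by diffusion networks. The student follows the reverse-KL gradient E[(score_student(x) − score_teacher(x)) · ∂x/∂θ], the score-difference update used by distribution-matching distillation methods. Whether FlashWorld uses exactly this form is not stated in the abstract, and `distill_toy` and its parameters are hypothetical.

```python
import random


def distill_toy(mu=2.0, s=0.5, steps=500, batch=256, lr=0.05, seed=0):
    """Fit a one-step student x = a + b*z (z ~ N(0, 1)) to a teacher N(mu, s^2)
    by descending the reverse-KL gradient, i.e. the difference of scores.

    Toy assumptions: 1-D data, Gaussian teacher, closed-form scores.
    """
    rng = random.Random(seed)
    a, b = 0.0, 1.0  # student starts as N(0, 1)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            x = a + b * z                        # student maps noise to a sample in one step
            score_student = -(x - a) / b ** 2    # d/dx log N(x; a, b^2)
            score_teacher = -(x - mu) / s ** 2   # d/dx log N(x; mu, s^2)
            diff = score_student - score_teacher
            grad_a += diff                       # dx/da = 1
            grad_b += diff * z                   # dx/db = z
        a -= lr * grad_a / batch
        b = max(b - lr * grad_b / batch, 1e-3)   # keep the scale positive
    return a, b
```

Calling `distill_toy()` should return values very close to the teacher's `(2.0, 0.5)`. The student's single noise-to-sample mapping loosely mirrors the abstract's point that distillation also reduces the number of denoising steps needed at inference.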

Self-Driving Reasoning Models, ChatGPT Adds Ads, Apple's Deal with Google, 3D Generation Pronto

FlashWorld: High-quality 3D Scene Generation within Seconds (open access)