Friday, December 19, 2025

Microsoft O-Voxel generates high-quality 3D shapes with realistic geometry

This could be a very interesting research paper by Microsoft!

"O-Voxel generates high-quality 3D shapes with realistic geometry

Microsoft developed O-Voxel, a new representation that captures both shape and appearance for 3D generation.
Unlike existing methods that struggle with open surfaces and complex structures, O-Voxel handles arbitrary topology including non-watertight meshes and enclosed interior geometry.
It encodes physically-based rendering properties—base color, metallic ratio, roughness, and opacity—directly aligned with the geometry.
The system compresses 1024³ resolution textured assets into 9,600 tokens using a variational autoencoder with 16× spatial downsampling.

Researchers trained 4 billion parameter models on 800,000 public 3D assets.

Generation runs in 3 seconds at 512³ resolution and 17 seconds at 1024³ on an H100 GPU. In user studies, participants preferred the method’s outputs 66.5 percent of the time over existing approaches, citing better detail and realism."
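To put the quoted compression figures in perspective, here is a quick back-of-the-envelope check. It assumes the 16× downsampling applies along each of the three axes, and that the 9,600 tokens are a typical count for one asset; neither detail is spelled out in the newsletter text. Under those assumptions the latent grid is 64³, and the tokens cover only about 3.7 percent of it, which fits a sparse representation that keeps only the cells near a surface.

```python
# Back-of-the-envelope check of the compression figures quoted above.
# Assumption (not stated in the source): 16x downsampling is per axis.

def dense_latent_cells(resolution: int, downsample: int) -> int:
    """Number of cells in a dense latent grid after per-axis downsampling."""
    side = resolution // downsample
    return side ** 3

resolution = 1024   # input voxel resolution per axis
downsample = 16     # spatial downsampling factor of the VAE
tokens = 9_600      # token count quoted in the newsletter

cells = dense_latent_cells(resolution, downsample)
occupancy = tokens / cells                        # share of latent cells kept
voxels_per_token = resolution ** 3 / tokens       # input voxels per token

print(f"latent grid: {resolution // downsample}^3 = {cells:,} cells")
print(f"tokens as share of dense latent grid: {occupancy:.2%}")
print(f"input voxels represented per token: {voxels_per_token:,.0f}")
```

A dense 64³ grid would need 262,144 tokens, so the quoted count is roughly 27 times smaller than the dense equivalent.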

From the abstract:
"Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with complex topologies and detailed appearance.
This paper presents an approach for learning a structured latent representation from native 3D data to address this challenge. At its core is a new sparse voxel structure called O-Voxel, an omni-voxel representation that encodes both geometry and appearance.
O-Voxel can robustly model arbitrary topology, including open, non-manifold, and fully-enclosed surfaces, while capturing comprehensive surface attributes beyond texture color, such as physically-based rendering parameters. Based on O-Voxel, we design a Sparse Compression VAE [variational autoencoder] which provides a high spatial compression rate and a compact latent space.
We train large-scale flow-matching models comprising 4B parameters for 3D generation using diverse public 3D asset datasets.
Despite their scale, inference remains highly efficient. Meanwhile, the geometry and material quality of our generated assets far exceed those of existing models. We believe our approach offers a significant advancement in 3D generative modeling."
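For readers who have not worked with sparse voxels, the general idea can be sketched in a few lines. This is only an illustration of sparse storage with per-voxel material attributes, not O-Voxel itself: the class names, fields, and layout below are hypothetical, and the paper's actual encoding is richer and is not reproduced here. The example stores a flat, open sheet, a shape with no inside or outside, to show that storing only surface cells does not require a watertight mesh.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is NOT the paper's data layout.

@dataclass(frozen=True)
class SurfaceAttributes:
    """Material parameters of the kind the abstract lists (PBR-style)."""
    base_color: tuple[float, float, float]
    metallic: float
    roughness: float
    opacity: float

class SparseVoxelGrid:
    """Stores attributes only for occupied voxels, keyed by integer index."""

    def __init__(self, resolution: int) -> None:
        self.resolution = resolution
        self.voxels: dict[tuple[int, int, int], SurfaceAttributes] = {}

    def put(self, index: tuple[int, int, int], attrs: SurfaceAttributes) -> None:
        if not all(0 <= c < self.resolution for c in index):
            raise IndexError(f"voxel index {index} is outside the grid")
        self.voxels[index] = attrs

    def occupancy(self) -> float:
        """Fraction of the dense grid that is actually stored."""
        return len(self.voxels) / self.resolution ** 3

# An open sheet: one layer of voxels at z = 32, with no enclosed volume.
grid = SparseVoxelGrid(resolution=64)
red_plastic = SurfaceAttributes((0.8, 0.1, 0.1), metallic=0.0, roughness=0.5, opacity=1.0)
for x in range(64):
    for y in range(64):
        grid.put((x, y, 32), red_plastic)

print(len(grid.voxels), f"{grid.occupancy():.4f}")
```

Only the cells the surface passes through are stored, so memory grows with surface area rather than with the full volume of the grid.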

Credits: Data Points newsletter

Native and Compact Structured Latents for 3D Generation (open access)







