
AI-Generated 3D Objects: Challenges and Breakthroughs Compared to 2D Models
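The Computerphile video explores how AI can generate 3D objects and why this is harder than generating 2D images. 2D models such as Stable Diffusion rely on diffusion (gradually removing noise from an image, guided by a text prompt) and are trained on datasets of billions of images. This lets them combine abstract concepts (e.g., "a frog on stilts"). In 3D, datasets are far smaller (only a few million samples), and a generated object must look consistent from every angle, which complicates the task.

The pioneering approach, DreamFusion (2022), works around the shortage of 3D data by reusing a pretrained 2D diffusion model through score distillation sampling (SDS). A 3D representation (a NeRF in the original paper; later methods also use Gaussian Splatting) is rendered from a randomly chosen camera angle, noise is added to the rendering, and the diffusion model predicts that noise given the prompt. The difference between the predicted and the added noise serves as a gradient for updating the 3D parameters. Repeating this over many angles gradually shapes an object that the 2D model finds plausible from every view.

A common failure is the Janus effect (an object with several faces), which arises when the 2D model treats every rendered angle as a frontal view. To counter it, the prompt is made view-dependent: CLIP text embeddings for different viewpoints (e.g., "frog from the front" vs. "frog from the back") are interpolated according to the camera angle. In addition, the noise level is decreased over the course of the optimization, so that early steps settle the coarse structure and later steps refine it.

The demonstrations show imperfect results (e.g., frogs with extra antennas or limbs). The video also mentions more recent improvements, such as multi-view latent diffusion models.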
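To make the score distillation loop more concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical stand-in: the "3D object" is just eight numbers on a ring, `render` is a trivial differentiable renderer in place of a NeRF, and `make_oracle_denoiser` replaces the pretrained 2D diffusion model with an oracle that knows what each view should look like. What the sketch does follow is the SDS update itself: render a random view, add noise at a random level, ask the denoiser to predict that noise, and push the weighted difference w(t) * (eps_hat - eps) back into the 3D parameters through the renderer only.

```python
import math
import random

# Hypothetical toy "3D object": N values arranged on a ring.
# A camera at azimuth index k "renders" a W-pixel image made of the W
# consecutive values starting at k (wrapping around). This stands in for
# a differentiable renderer such as a NeRF.
N, W = 8, 3


def render(theta, k):
    return [theta[(k + j) % N] for j in range(W)]


def make_oracle_denoiser(true_theta):
    """Hypothetical stand-in for a pretrained, prompt-conditioned diffusion
    model. If every image of view k equals render(true_theta, k), the optimal
    noise prediction is eps_hat = (x_t - alpha * x_target) / sigma."""
    def eps_hat(x_t, alpha, sigma, k):
        target = render(true_theta, k)
        return [(xt - alpha * x0) / sigma for xt, x0 in zip(x_t, target)]
    return eps_hat


def sds_step(theta, eps_hat, k, t, lr, rng):
    # Variance-preserving noise levels: alpha^2 + sigma^2 = 1.
    alpha = math.cos(t * math.pi / 2)
    sigma = math.sin(t * math.pi / 2)
    x = render(theta, k)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    x_t = [alpha * xi + sigma * e for xi, e in zip(x, eps)]
    pred = eps_hat(x_t, alpha, sigma, k)
    # SDS gradient w.r.t. the rendered pixels: w(t) * (eps_hat - eps).
    # The diffusion model's own Jacobian is skipped, as in DreamFusion.
    w = sigma ** 2  # weighting w(t); this particular choice is an assumption
    grad_x = [w * (p - e) for p, e in zip(pred, eps)]
    # Chain rule through the renderer: pixel j was read from theta[(k+j) % N].
    new_theta = list(theta)
    for j, g in enumerate(grad_x):
        new_theta[(k + j) % N] -= lr * g
    return new_theta


def optimize(true_theta, steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    eps_hat = make_oracle_denoiser(true_theta)
    theta = [0.0] * N  # start from an "empty" object
    for _ in range(steps):
        k = rng.randrange(N)         # random camera azimuth
        t = rng.uniform(0.02, 0.98)  # random noise level
        theta = sds_step(theta, eps_hat, k, t, lr, rng)
    return theta


if __name__ == "__main__":
    truth = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2, 0.8, 0.4]
    print([round(v, 3) for v in optimize(truth)])
```

Running it recovers the hidden ring values even though each update only ever sees one noisy view. A real diffusion model knows a distribution of plausible images for the prompt rather than one exact target per view, so real optimizations do not converge this cleanly.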
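The view-dependent prompting used against the Janus effect can be sketched as a weighted blend of two text embeddings, chosen by camera azimuth. The two-number vectors `FRONT` and `BACK` are hypothetical placeholders for the CLIP embeddings of "frog from the front" and "frog from the back", and both the cosine weighting and the linear interpolation are assumptions; the exact formula is not specified in the video.

```python
import math

# Hypothetical toy embeddings. Real CLIP text embeddings are much larger
# (Stable Diffusion v1 uses a 77 x 768 sequence); two numbers suffice here.
FRONT = [1.0, 0.0]  # stands in for "frog from the front"
BACK = [0.0, 1.0]   # stands in for "frog from the back"


def back_weight(azimuth_deg):
    """0 at the front (0 degrees), 1 at the back (180 degrees), changing
    smoothly in between and symmetric for the left and right sides."""
    return (1.0 - math.cos(math.radians(azimuth_deg))) / 2.0


def lerp(a, b, w):
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def view_conditioned_embedding(front_emb, back_emb, azimuth_deg):
    # Linear interpolation is an assumption; spherical interpolation
    # (slerp) is another common choice for embedding vectors.
    return lerp(front_emb, back_emb, back_weight(azimuth_deg))


if __name__ == "__main__":
    for az in (0, 90, 180, 270):
        emb = view_conditioned_embedding(FRONT, BACK, az)
        print(az, [round(v, 3) for v in emb])
```

With this weighting, 0° returns the front embedding, 180° the back embedding, and 90° or 270° an even mix, so side views get a prompt halfway between the two. The blended embedding would then condition the diffusion model for that particular rendering instead of a single fixed prompt.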
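The decreasing noise level can likewise be sketched as a schedule over optimization steps. The linear decay, its endpoints `t_start=0.98` and `t_end=0.02`, and the cosine-style `noise_scales` mapping are all assumptions chosen for illustration; other implementations instead sample the noise level at random from a range whose upper bound shrinks over time.

```python
import math


def annealed_t(step, total_steps, t_start=0.98, t_end=0.02):
    """Noise level (0 = clean, 1 = pure noise) that decreases linearly
    from t_start to t_end over the optimization. Endpoints are assumptions."""
    if total_steps < 2:
        return t_end
    frac = step / (total_steps - 1)
    return t_start + (t_end - t_start) * frac


def noise_scales(t):
    # Variance-preserving mix x_t = alpha * x + sigma * eps with
    # alpha^2 + sigma^2 = 1 (a cosine-style schedule, assumed here).
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


if __name__ == "__main__":
    total = 5
    for step in range(total):
        t = annealed_t(step, total)
        alpha, sigma = noise_scales(t)
        print(f"step {step}: t={t:.2f} alpha={alpha:.3f} sigma={sigma:.3f}")
```

Early steps therefore see heavily noised renderings, where only the coarse shape matters to the diffusion model, while late steps see nearly clean ones, where fine detail dominates.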