Thursday, March 21st, 2024 Posted by Jim Thacker

Check out LATTE3D, NVIDIA’s peppy new text-to-3D AI model


NVIDIA has posted a demo video for LATTE3D, its new AI model for generating textured 3D models of real-world objects from simple text prompts.

According to NVIDIA, LATTE3D can produce 3D shapes “near instantly”, even when running on a single previous-generation RTX A6000 GPU.

How is LATTE3D better than previous text-to-3D AI models?
LATTE3D (Large-scale Amortized Text-To-Enhanced 3D Synthesis) is NVIDIA’s latest text-to-3D AI model: its third in a year, following on from Magic3D and ATT3D.

Each has improved on its predecessor, first in training speed, then in output quality.

With ATT3D, NVIDIA began training on multiple text prompts as well as multiple 3D assets, to account for the different ways in which a user might describe the object to be recreated.

Amortizing training across a whole set of prompts in this way is faster than optimizing for each prompt individually, as was the case with Magic3D.

LATTE3D also uses multiple prompts – for the work, NVIDIA generated a set of 100,000 possible prompts using ChatGPT – but improves the visual quality of the assets generated.
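NVIDIA hasn’t released code for LATTE3D, but the amortized-training idea itself is simple enough to sketch in PyTorch. The sketch below is purely illustrative, not NVIDIA’s architecture: the hashed text embedder, the small MLP generator that emits a point cloud, and the stand-in loss are all hypothetical placeholders for the real components (a proper text encoder, a 3D representation, and a render-and-score objective against a pretrained 2D diffusion model).

```python
# Illustrative sketch of amortized text-to-3D training (hypothetical; not NVIDIA's code).
# One generator is optimized across a whole set of prompts, so an unseen prompt at
# inference time needs only a single forward pass -- no per-prompt optimization.
import torch
import torch.nn as nn

EMBED_DIM, POINT_DIM, NUM_POINTS = 64, 3, 1024

class TextEmbedder(nn.Module):
    """Toy stand-in for a frozen text encoder: a deterministic hash into an embedding table."""
    def __init__(self, vocab: int = 10_000):
        super().__init__()
        self.vocab = vocab
        self.table = nn.Embedding(vocab, EMBED_DIM)

    def forward(self, prompts: list[str]) -> torch.Tensor:
        ids = torch.tensor([sum(p.encode()) % self.vocab for p in prompts])
        return self.table(ids)

class AmortizedGenerator(nn.Module):
    """Toy generator: maps a prompt embedding to a point cloud in one forward pass."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_POINTS * POINT_DIM),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.net(text_emb).view(-1, NUM_POINTS, POINT_DIM)

def fake_distillation_loss(points: torch.Tensor) -> torch.Tensor:
    # Placeholder for the real objective: render the shape from random cameras and
    # score the renders against a pretrained 2D diffusion model (score distillation).
    return points.pow(2).mean()

# Tiny stand-in for the ~100,000 ChatGPT-generated prompts NVIDIA describes.
prompts = ["an amigurumi giraffe", "a wooden rocking chair", "a ceramic teapot"]

embedder, generator = TextEmbedder(), AmortizedGenerator()
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):                      # training amortizes cost over the prompt set
    emb = embedder(prompts).detach()         # batch of prompt embeddings (encoder frozen)
    loss = fake_distillation_loss(generator(emb))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference on an unseen prompt is a single forward pass -- the "near instant" part.
new_shape = generator(embedder(["a plush axolotl"]).detach())
```

The point of the pattern is the final line: once training has amortized the optimization cost over the whole prompt set, generating a shape for a new prompt is just one network evaluation, which is what makes “near instant” output on a single GPU plausible.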

How good are the 3D assets that LATTE3D generates?
If you compare the demo assets generated by ATT3D and LATTE3D, the output from LATTE3D is noticeably crisper and more detailed.

They’re still relatively low-resolution, but they’re getting to the point at which they could be used to block out a scene, or even serve as background assets.

How significant is LATTE3D for 3D artists?
LATTE3D is primarily a proof of concept: NVIDIA hasn’t released the source code, and the model was trained on just two specific types of asset: animals and everyday objects.

What is more significant is what it shows about the speed at which text-to-3D is evolving – and by extension, how soon usable publicly available text-to-3D services might arrive.

At NVIDIA’s GTC 2024 conference, Sanja Fidler, the firm’s Vice President of AI Research, admitted that quality is “not yet near what an artist would create”, but pointed out how far things have come since Google announced its pioneering DreamFusion model in late 2022.

“A year ago, it took an hour for AI models to generate 3D visuals of this quality — and the current state of the art is now around 10 to 12 seconds,” she said. LATTE3D itself brings that down to around a second per shape, according to NVIDIA.

“We can now produce results an order of magnitude faster, putting near-real-time text-to-3D generation within reach for creators across industries.”

Read more about new text-to-3D AI model LATTE3D on NVIDIA’s blog

