Researchers at Carnegie Mellon University have proposed Diff2Scene, a new method for open-vocabulary 3D semantic segmentation and visual grounding. The method leverages frozen representations from text-to-image generative models, which eliminates the need for labeled 3D training data.