GLS: Geometry-aware 3D Language Gaussian Splatting

Jiaxiong Qiu, Liu Liu, Xinjie Wang, Tianwei Lin, Wei Sui, Zhizhong Su

Horizon Robotics, Beijing, China and D-Robotics, Beijing, China

Abstract

Recently, 3D Gaussian Splatting (3DGS) has achieved impressive performance on indoor surface reconstruction and 3D open-vocabulary segmentation. This paper presents GLS, a unified framework for 3D surface reconstruction and open-vocabulary segmentation based on 3DGS. GLS advances both tasks by improving the sharpness and smoothness of their results. For indoor surface reconstruction, we introduce a surface normal prior as a geometric cue to guide the rendered normal, and use the normal error to optimize the rendered depth. For 3D open-vocabulary segmentation, we employ 2D CLIP features to guide instance features and enhance the surface smoothness, then utilize DEVA masks to maintain their view consistency. Extensive experiments demonstrate the effectiveness of jointly optimizing surface reconstruction and 3D open-vocabulary segmentation: GLS surpasses state-of-the-art approaches for each task on the MuSHRoom, ScanNet++, and LERF-OVS datasets.

Method Overview

Given multi-view RGB images captured by a camera in an indoor scene, our goal is to jointly reconstruct the scene surface and segment its objects by open-vocabulary text. To achieve this goal, we introduce GLS, a novel framework based on 3DGS. As shown in the pipeline figure, our framework consists of three procedures. In the input procedure, we use the generalizable models SAM, DEVA, and CLIP to produce view-consistent 2D semantic masks and object-level features. We then adopt a generalizable surface normal estimation model to acquire the geometric cue. In the optimization procedure, we use the semantic and normal priors for regularization. We first follow previous approaches to regularize the rendered color, depth, and semantic feature. We then propose a novel smoothness term to handle texture-less regions, and a novel constraint, derived from analyzing the normal error of the Gaussians, to refine object structures. In the inference procedure, our model simultaneously reconstructs the indoor surface and selects the target object from an open-vocabulary text query.
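The two priors described above can be illustrated with a minimal sketch. It is not the GLS implementation: the page does not give the exact loss forms, so the cosine-based normal term, the similarity threshold, and the function names (`normal_prior_loss`, `select_by_text`) are all assumptions made for illustration. Plain Python lists stand in for the image-sized tensors a real pipeline would use.

```python
import math


def normalize(v):
    """Return v scaled to unit length (v is a sequence of floats)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def normal_prior_loss(rendered_normals, prior_normals):
    """Mean (1 - cosine) between rendered and prior per-pixel normals.

    Assumed form: 0 when every rendered normal matches the prior,
    growing as the directions diverge.
    """
    if len(rendered_normals) != len(prior_normals):
        raise ValueError("normal maps must have the same number of pixels")
    errors = [1.0 - cosine(r, p) for r, p in zip(rendered_normals, prior_normals)]
    return sum(errors) / len(errors)


def select_by_text(pixel_features, text_embedding, threshold=0.5):
    """Mask of pixels whose rendered feature matches the text embedding.

    Hypothetical selection rule: keep a pixel when the cosine similarity
    between its feature and the text embedding exceeds `threshold`.
    """
    return [cosine(f, text_embedding) > threshold for f in pixel_features]


# Two pixels: the first normal agrees with the prior, the second is orthogonal.
rendered = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
prior = [[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]]
loss = normal_prior_loss(rendered, prior)  # 0.5

# Three pixels with toy 2D features, queried with a toy text embedding.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask = select_by_text(features, [1.0, 0.0], threshold=0.6)  # [True, False, True]
```

In a real pipeline the normals and features would come from the rasterizer, and the text embedding from the CLIP text encoder. Only the comparison logic is sketched here.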

Application videos

Prompt: Left: "I want to make a piece of toast" Right: "I want to cut a tomato"

Prompt: Left: "I want to wash my hands" Right: "I want to store my ice cream"

BibTeX

@article{qiu2024gls,
  title={GLS: Geometry-aware 3D Language Gaussian Splatting},
  author={Qiu, Jiaxiong and Liu, Liu and Wang, Xinjie and Lin, Tianwei and Sui, Wei and Su, Zhizhong},
  journal={arXiv preprint arXiv:2411.18066},
  year={2024}
}
