https://arxiv.org/pdf/2211.15654.pdf

  1. Introduction

Pre-trained text and image embedding models → 3D scene understanding

  2. Related work
  3. Methodology

Overview

3.1. Image Feature Fusion

Segmentation model: LSeg or OpenSeg

Input: RGB image with resolution H × W