LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding

Fudan University

By dividing scene information into layout, object, and position and modeling each component explicitly, the layout-object-position association enables robots to address related spatial tasks and achieve a more comprehensive spatial cognition.

Abstract

Spatial cognition empowers animals with remarkably efficient navigation abilities, which largely depend on a scene-level understanding of spatial environments. Recently, it has been found that a neural population in the postrhinal cortex of the rat brain is more strongly tuned to the spatial layout of a scene than to the objects within it.

Inspired by these representations of spatial layout, which encode different regions of local scenes separately, we propose LOP-Field, which realizes a Layout-Object-Position (LOP) association to model hierarchical representations for robotic scene understanding. Powered by foundation models and implicit scene representation, a neural field is implemented as a scene memory for robots, storing a queryable representation of scenes with position-wise, object-wise, and layout-wise information.
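As a concrete illustration, the sketch below shows one way such a field could be structured: an MLP over positionally encoded 3D coordinates with separate heads for object-level and layout-level embeddings. This is a minimal sketch under assumed design choices (NeRF-style positional encoding, network sizes, 512-dimensional feature heads), not the authors' implementation.

import torch
import torch.nn as nn


def positional_encoding(xyz: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    # Sinusoidal encoding of 3D coordinates, as used in NeRF-style fields.
    freqs = 2.0 ** torch.arange(num_freqs, device=xyz.device) * torch.pi
    scaled = xyz.unsqueeze(-1) * freqs                      # (N, 3, F)
    enc = torch.cat([scaled.sin(), scaled.cos()], dim=-1)   # (N, 3, 2F)
    return enc.flatten(start_dim=1)                         # (N, 6F)


class LOPFieldSketch(nn.Module):
    # Maps a 3D position to an object-wise and a layout-wise embedding.
    def __init__(self, num_freqs: int = 10, hidden: int = 256,
                 obj_dim: int = 512, layout_dim: int = 512):
        super().__init__()
        self.num_freqs = num_freqs
        self.trunk = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate heads: one aligned with object-level vision-language
        # features, one aligned with layout/region-level features.
        self.object_head = nn.Linear(hidden, obj_dim)
        self.layout_head = nn.Linear(hidden, layout_dim)

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(positional_encoding(xyz, self.num_freqs))
        return self.object_head(h), self.layout_head(h)


# Query the field at a batch of 3D points.
field = LOPFieldSketch()
obj_feat, layout_feat = field(torch.rand(1024, 3))  # each (1024, 512)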

To validate the built LOP association, the model is tested on inferring region information from 3D positions with quantitative metrics, achieving an average accuracy of more than 88%. We also show that, by using region information, the proposed method achieves improved object and view localization with text and RGB input compared to state-of-the-art localization methods.
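The region-inference evaluation can be pictured as a nearest-embedding classification, sketched below: each 3D point's layout embedding is matched against candidate region embeddings (for example, text embeddings of room names) and compared with the ground-truth label. The cosine-similarity matching and the input shapes are illustrative assumptions, not the exact evaluation code.

import torch
import torch.nn.functional as F


@torch.no_grad()
def region_accuracy(layout_pred: torch.Tensor,      # (N, D) layout features from the field
                    region_emb: torch.Tensor,       # (R, D) one embedding per candidate region
                    gt_region_idx: torch.Tensor) -> float:  # (N,) ground-truth region ids
    # Classify each point as the region whose embedding is most similar.
    sim = F.normalize(layout_pred, dim=-1) @ F.normalize(region_emb, dim=-1).t()
    pred_idx = sim.argmax(dim=-1)
    return (pred_idx == gt_region_idx).float().mean().item()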

Method Overview

Pipeline of target embedding processing and neural implicit rendering during training. Top: generation of ground-truth layout-object-position vision-language and semantic embeddings for weak supervision. Bottom: the neural implicit network mapping 3D positions to the target feature space. A contrastive loss aligns the two branches.
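The sketch below illustrates one plausible form of such a training step: an InfoNCE-style contrastive loss pulls the field's predicted object and layout features toward their precomputed target embeddings within a batch. The loss form, temperature, and the equal weighting of the two branches are assumptions for illustration, not the paper's exact choices.

import torch
import torch.nn.functional as F


def contrastive_loss(pred: torch.Tensor, target: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # InfoNCE-style loss: each predicted feature is pulled toward its own
    # target embedding and pushed away from the other targets in the batch.
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature            # (B, B) similarity matrix
    labels = torch.arange(pred.shape[0], device=pred.device)
    return F.cross_entropy(logits, labels)


def training_step(field, optimizer, xyz, obj_target, layout_target):
    # One optimization step against object- and layout-level target embeddings.
    obj_pred, layout_pred = field(xyz)
    loss = (contrastive_loss(obj_pred, obj_target)
            + contrastive_loss(layout_pred, layout_target))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A plain optimizer over the field parameters, e.g. torch.optim.Adam(field.parameters(), lr=1e-3), is enough to drive this sketch.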

Interactive Demonstration (Coming Soon)

In this interactive demo, we show a heatmap of the association between 3D points and text or image queries, powered by the LOP association learned by the trained LOP-Field.
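Such a heatmap can be computed as sketched below: encode the query with a vision-language model (CLIP is assumed here purely for illustration), query the field at the scene points, and color each point by its cosine similarity to the query embedding. The function and model names are placeholders, not the demo's actual code.

import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP


@torch.no_grad()
def text_query_heatmap(field, points_xyz: torch.Tensor, text: str) -> torch.Tensor:
    # Returns a per-point score in [0, 1] measuring association with the text query.
    model, _ = clip.load("ViT-B/32", device="cpu")
    tokens = clip.tokenize([text])
    text_emb = F.normalize(model.encode_text(tokens).float(), dim=-1)  # (1, 512)
    obj_feat, _ = field(points_xyz)                                    # object-level features
    sim = (F.normalize(obj_feat, dim=-1) @ text_emb.t()).squeeze(-1)   # (N,)
    # Rescale similarities to [0, 1] for rendering as a heatmap.
    return (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)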

Text Query Localization Results on Datasets

Image Query Localization Results on Datasets

More Results on Datasets

BibTeX


        @inproceedings{hou2024lop-field,
            title     = {LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding},
            author    = {Hou, Jiawei and Guan, Wenhao and Xue, Xiangyang and Zeng, Taiping},
            booktitle = {arXiv},
            year      = {2024},
        }