LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding
Spatial cognition empowers animals with remarkably efficient navigation abilities, which largely depend on a scene-level understanding of the spatial environment. Recently, it has been found that a neural population in the postrhinal cortex of the rat brain is more strongly tuned to the spatial layout of a scene than to its objects.
Inspired by the way spatial layouts of local scenes are represented to encode different regions separately, we propose LOP-Field, which realizes a Layout-Object-Position (LOP) association to model hierarchical representations for robotic scene understanding. Powered by foundation models and implicit scene representation, a neural field is implemented as a scene memory for robots, storing a queryable representation of the scene with position-wise, object-wise, and layout-wise information.
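As a minimal illustration of this idea, the sketch below (PyTorch) shows a neural field that maps 3D positions to object-wise and layout-wise feature vectors. The Fourier positional encoding, layer sizes, and feature dimension are illustrative assumptions, not the exact architecture of LOP-Field.

import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Fourier-feature encoding of 3D coordinates."""
    def __init__(self, num_freqs: int = 6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs) * math.pi)

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) -> (N, 3 + 2 * 3 * num_freqs)
        scaled = xyz[..., None] * self.freqs                  # (N, 3, F)
        enc = torch.cat([scaled.sin(), scaled.cos()], dim=-1)
        return torch.cat([xyz, enc.flatten(-2)], dim=-1)


class LOPFieldSketch(nn.Module):
    """Maps 3D positions to object-wise and layout-wise embeddings (sketch)."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256, num_freqs: int = 6):
        super().__init__()
        self.encode = PositionalEncoding(num_freqs)
        in_dim = 3 + 2 * 3 * num_freqs
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate heads expose both the object-level and the
        # layout(region)-level feature spaces for position-wise queries.
        self.object_head = nn.Linear(hidden, feat_dim)
        self.layout_head = nn.Linear(hidden, feat_dim)

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(self.encode(xyz))
        return self.object_head(h), self.layout_head(h)


field = LOPFieldSketch()
obj_feat, layout_feat = field(torch.rand(1024, 3))  # query 1024 positions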
To validate the learned LOP association, the model is quantitatively evaluated on inferring region information from 3D positions, achieving an average accuracy of more than 88%. We also show that the proposed method, by using region information, achieves improved object and view localization from text and RGB input compared with state-of-the-art localization methods.
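As a hedged sketch of how such a region-inference evaluation could be set up (the region text embeddings, cosine-similarity matching, and the field interface are assumptions carried over from the sketch above), one could query the field at labeled 3D positions and compare the most similar region against ground truth:

import torch
import torch.nn.functional as F

def region_accuracy(field, xyz, gt_region_ids, region_text_embeds):
    """xyz: (N, 3); gt_region_ids: (N,); region_text_embeds: (R, D)."""
    with torch.no_grad():
        _, layout_feat = field(xyz)                        # (N, D)
        sims = F.normalize(layout_feat, dim=-1) @ \
               F.normalize(region_text_embeds, dim=-1).T   # (N, R)
        pred = sims.argmax(dim=-1)
    return (pred == gt_region_ids).float().mean().item()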
Pipeline of target embedding processing and neural implicit rendering during training. Top: generation of the ground-truth layout-object-position vision-language and semantic embeddings used for weak supervision. Bottom: the neural implicit network mapping 3D positions to the target feature space. A contrastive loss is optimized between the two branches.
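A contrastive objective between the rendered features and the target embeddings could take the form of the sketch below; the symmetric InfoNCE formulation and temperature value are illustrative assumptions, not necessarily the paper's exact loss.

import torch
import torch.nn.functional as F

def contrastive_loss(pred_feat, target_feat, temperature: float = 0.07):
    """pred_feat, target_feat: (B, D). Matching rows are positive pairs."""
    pred = F.normalize(pred_feat, dim=-1)
    target = F.normalize(target_feat, dim=-1)
    logits = pred @ target.T / temperature            # (B, B) similarity matrix
    labels = torch.arange(pred.size(0), device=pred.device)
    # Symmetric cross-entropy: each rendered feature should match its own target.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))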
In this interactive demo, we show a heatmap of the association between 3D points and text or image queries, powered by the LOP association learned by the trained LOP-Field.
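A heatmap of this kind can be computed as the cosine similarity between per-point features rendered from the field and a query embedding. The sketch below assumes a CLIP text encoder via open_clip and the hypothetical field interface from the earlier sketch; both are assumptions about the underlying foundation model, and the field's feature dimension must match the encoder's output dimension.

import torch
import torch.nn.functional as F
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")   # 512-dim text features
tokenizer = open_clip.get_tokenizer("ViT-B-32")

@torch.no_grad()
def query_heatmap(field, xyz, text: str):
    """Return a per-point association score in [0, 1] for a text query."""
    text_feat = model.encode_text(tokenizer([text]))          # (1, D)
    obj_feat, _ = field(xyz)                                   # (N, D)
    sims = F.normalize(obj_feat, dim=-1) @ F.normalize(text_feat, dim=-1).T
    return (sims.squeeze(-1) + 1.0) / 2.0                      # map [-1, 1] -> [0, 1]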
@inproceedings{hou2024lop-field,
  title={LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding},
  author={Hou, Jiawei and Guan, Wenhao and Xue, Xiangyang and Zeng, Taiping},
  booktitle={arXiv},
  year={2024}
}