ShelfAware - Real-Time Semantic Localization in Quasi-Static Environments

2025 - Published in IEEE Robotics and Automation Letters (RA-L)

Team members:
Shivendra Agrawal, Jake Brawer, Ashutosh Naik, Alessandro Roncone, Bradley Hayes

2026

  1. RA-L
    ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments With Low-Cost Sensors
    Shivendra Agrawal, Jake Brawer, Ashutosh Naik, Alessandro Roncone, and Bradley Hayes
    IEEE Robotics and Automation Letters, 2026
  2. ICRA
    Distributional Semantics for Robust Global Localization in Cluttered, Geometrically Aliased Environments
    Shivendra Agrawal, Alessandro Roncone, and Bradley Hayes
    In IEEE ICRA 2026 Workshop SRRA, 2026

Abstract

Localization in dynamic, visually aliased environments like grocery stores is a difficult challenge for autonomous systems. Aisles often look identical geometrically, and stock changes frequently. ShelfAware is a novel semantic localization framework that achieves robust, real-time global localization using only low-cost sensors—specifically, a monocular RGB-D camera on a smartphone. Unlike traditional methods that rely on expensive LiDAR or detailed geometric maps, ShelfAware uses a Semantic Particle Filter. It leverages Visual-Inertial Odometry (VIO) for motion estimation and corrects drift by matching detected product categories (e.g., “cereal”, “soda”) against a lightweight semantic map. This allows the system to localize accurately even in featureless or repetitive aisles.

The ShelfAware Approach

The core of our solution is a particle filter that fuses visual-inertial odometry with semantic observations.

  1. Semantic Mapping: We utilize a pre-built semantic map that stores the locations of product categories rather than individual items, making the map robust to daily stock changes.
  2. Observation Model: As the agent moves, a custom YOLO-based detector identifies product categories in the camera frame.
  3. Particle Update: We project these detections into 3D space. Particles that “expect” to see the detected products at their hypothesized location receive higher weights, while those that don’t are penalized, driving the particle cloud to converge on the robot’s true location (a small weighting sketch follows this list).

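To make the particle update concrete, below is a minimal sketch of one way the semantic weighting could work, assuming a grid-based semantic map and detections reduced to (category, range, bearing) tuples. The cell size, hit/miss probabilities, and helper names (world_to_cell, weight_particle) are illustrative placeholders, not the paper's implementation.

    import numpy as np

    CELL = 0.10  # grid resolution in meters per cell (illustrative placeholder)

    def world_to_cell(xy, map_origin, cell=CELL):
        """Convert a world-frame (x, y) point to integer grid indices."""
        return tuple(((np.asarray(xy) - np.asarray(map_origin)) / cell).astype(int))

    def weight_particle(particle, detections, semantic_map, map_origin,
                        p_hit=0.8, p_miss=0.2):
        """Re-weight one pose hypothesis (x, y, theta) against semantic detections.

        detections: list of (category_id, range_m, bearing_rad) tuples obtained by
        projecting detected boxes with their depth onto the horizontal plane
        (an assumed intermediate format, not necessarily the paper's).
        """
        x, y, theta = particle
        w = 1.0
        for cat, rng, bearing in detections:
            # Where would this detection land if the particle's pose were correct?
            gx = x + rng * np.cos(theta + bearing)
            gy = y + rng * np.sin(theta + bearing)
            i, j = world_to_cell((gx, gy), map_origin)
            inside = 0 <= i < semantic_map.shape[0] and 0 <= j < semantic_map.shape[1]
            expected = semantic_map[i, j] if inside else -1
            # Hypotheses that "expect" the detected category at that spot gain weight.
            w *= p_hit if expected == cat else p_miss
        return w

In the full filter these per-particle likelihoods would be normalized and used for resampling, as sketched at the end of the Algorithm section.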

Modularity

  • Mountable on Carts/Strollers: Can add autonomous capabilities to existing equipment.
  • Wearable: Can support assistive technology for navigation.


Algorithm

  • Semantic Mapping

    We trained a custom classifier that assigns detected products to a fixed set of categories; these category labels are what the semantic map stores (a map-building sketch follows this list).

  • Pose Correction

    Real-world pose estimates obtained through inverse camera projection are refined using ray casting on the semantic map (a simplified back-projection and ray-casting sketch follows this list).


  • Semantic Localization

    Semantic information is fused with the depth observation in a Monte Carlo Localization framework (a schematic filter-cycle sketch follows this list).
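
As a rough illustration of the Semantic Mapping step, the sketch below rasterizes category-labeled detections into a coarse grid that stores product categories rather than individual items; the grid representation, cell size, and (category_id, x, y) input format are assumptions made for the example, not the paper's exact map format.

    import numpy as np

    def build_semantic_map(labeled_points, extent, cell=0.10):
        """Rasterize category-labeled detections (collapsed onto the floor plane)
        into a coarse grid that stores product categories rather than items.

        labeled_points: iterable of (category_id, x, y) in the map frame
        (assumed input format).
        extent: ((x_min, x_max), (y_min, y_max)) of the mapped area, in meters.
        """
        (x0, x1), (y0, y1) = extent
        h = int(np.ceil((x1 - x0) / cell))
        w = int(np.ceil((y1 - y0) / cell))
        grid = np.zeros((h, w), dtype=np.int16)  # 0 = no product category
        for cat, x, y in labeled_points:
            i, j = int((x - x0) / cell), int((y - y0) / cell)
            if 0 <= i < h and 0 <= j < w:
                grid[i, j] = cat  # the category label is what survives restocking
        return grid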

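For the Pose Correction step, the idea can be pictured as pinhole back-projection followed by a ray cast against the map. The functions below are a simplified sketch under that reading; the intrinsics, the 2D ray-marching scheme, and the helper names (pixel_to_camera, camera_to_map, raycast_refine) are illustrative assumptions rather than the paper's exact procedure.

    import numpy as np

    def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
        """Back-project a pixel (u, v) with metric depth into the camera frame
        using the standard pinhole model (intrinsics assumed known)."""
        return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

    def camera_to_map(p_cam, T_map_cam):
        """Transform a camera-frame point into the map frame with a 4x4 pose
        (e.g., the current VIO pose composed with the camera extrinsics)."""
        return (T_map_cam @ np.append(p_cam, 1.0))[:3]

    def raycast_refine(origin_xy, direction_xy, semantic_map, map_origin,
                       cell=0.10, max_range=5.0):
        """March along the ray until the first non-empty semantic cell and snap
        the noisy, depth-derived product position onto it."""
        d = np.asarray(direction_xy, dtype=float)
        d /= np.linalg.norm(d)
        for r in np.arange(0.0, max_range, cell / 2):
            p = np.asarray(origin_xy) + r * d
            i, j = ((p - np.asarray(map_origin)) / cell).astype(int)
            if (0 <= i < semantic_map.shape[0] and 0 <= j < semantic_map.shape[1]
                    and semantic_map[i, j] != 0):
                return p  # refined hit point on the mapped shelf face
        return None  # no shelf within range; keep the raw estimate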

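Finally, for the Semantic Localization step, one Monte Carlo Localization cycle fusing VIO motion with semantic weights could look roughly as follows; the noise parameters, resampling scheme, and the weight_fn hook are assumptions chosen for the sketch (a weighting function like the one in the approach section could be plugged in after binding its map arguments, e.g. with functools.partial).

    import numpy as np

    def mcl_step(particles, vio_delta, detections, weight_fn,
                 motion_noise=(0.02, 0.02, 0.01)):
        """One predict-update-resample cycle fusing VIO motion with semantic cues.

        particles: (N, 3) array of (x, y, theta) pose hypotheses.
        vio_delta: (dx, dy, dtheta) body-frame motion estimate from VIO.
        weight_fn: callable(particle, detections) -> likelihood.
        """
        n = len(particles)
        dx, dy, dth = vio_delta
        # Predict: apply the VIO increment in each particle's own frame, plus noise.
        c, s = np.cos(particles[:, 2]), np.sin(particles[:, 2])
        particles[:, 0] += c * dx - s * dy + np.random.normal(0, motion_noise[0], n)
        particles[:, 1] += s * dx + c * dy + np.random.normal(0, motion_noise[1], n)
        particles[:, 2] += dth + np.random.normal(0, motion_noise[2], n)
        # Update: re-weight every hypothesis against the semantic detections.
        weights = np.array([weight_fn(p, detections) for p in particles])
        weights = weights / max(weights.sum(), 1e-12)
        # Resample: systematic (low-variance) resampling concentrates particles
        # on poses that explain both the motion and the observed categories.
        positions = (np.random.rand() + np.arange(n)) / n
        idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
        return particles[idx].copy(), np.full(n, 1.0 / n)
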
Demo


Experimental Results

We evaluated ShelfAware in a semantically dense retail environment (a mock store) across diverse conditions, including cart-mounted and wearable setups. The system achieved a 96% global localization success rate with a mean time-to-convergence of 1.91s, significantly outperforming geometric baselines like MCL (22% success) and AMCL (10% success). The system operates in real-time (9.6Hz) on consumer laptop-class hardware, demonstrating robust tracking even in dynamic, visually aliased aisles.

Ground-truth trajectories are shown in blue; trajectories predicted by ShelfAware are shown in red.


@ARTICLE{11478317,
  author={Agrawal, Shivendra and Brawer, Jake and Naik, Ashutosh and Roncone, Alessandro and Hayes, Bradley},
  journal={IEEE Robotics and Automation Letters}, 
  title={ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments With Low-Cost Sensors}, 
  year={2026},
  volume={11},
  number={6},
  pages={6943-6950},
  keywords={Filtering;Particle filters;Filters;Circuits and systems;Central Processing Unit;Circuits;Feedback;MIMICs;Millimeter wave integrated circuits;Monolithic integrated circuits;Localization;semantic scene understanding;SLAM},
  doi={10.1109/LRA.2026.3682613}
}