Publications | Shivendra Agrawal

2026

Ph.D. Thesis
Context-Aware Embodied AI Systems for Human-Centered Environments: From Assistive Guidance to Autonomous Robots

Shivendra Agrawal

University of Colorado Boulder, 2026

Real-world, human-centered environments are highly unstructured, governed not just by geometric constraints, but by semantic cues, social norms, dynamic conditions, and cluttered layouts. While existing robotic systems have achieved geometric competence, they lack the contextual awareness necessary to operate in public spaces such as stores, offices, libraries, and transit stations. This contextual gap limits the generalizability of autonomous agents and also poses a significant barrier for people with visual impairments, for whom much of this environmental information is only present visually and therefore hard to access directly. To bridge this gap, this dissertation develops context-aware embodied AI methods that interpret implicit environmental context for robust guidance and spatial grounding. Using assistive technology as a testbed, this work first establishes foundational methods for context-aware guidance by modeling social dynamics and learning from human hand movement for grasp guidance. The research then extracts spatial knowledge from multimodal data to equip broader autonomous systems with semantic understanding. It introduces novel semantic localization methods for quasi-static environments, initially through distributional semantic particle filtering, and then through vision-language model enhanced localization to resolve geometric aliasing. Finally, it presents a semantic-topology representation to support intent-aware search, zone classification, one-shot semantic localization, and the generation of natural language route instructions. Across these contributions, this thesis demonstrates that social, semantic, and spatial cues can improve robotic systems in environments where geometry alone is insufficient. Together, these systems lay the groundwork for deploying robust, socially intelligent systems capable of long-term, practical autonomy in human-centered spaces.
@phdthesis{agrawal2026dissertation,
  title = {Context-Aware Embodied AI Systems for Human-Centered Environments: From Assistive Guidance to Autonomous Robots},
  author = {Agrawal, Shivendra},
  school = {University of Colorado Boulder},
  year = {2026},
}
ArXiv
VLM-GLoc: Vision-Language Model Enhanced Monte Carlo Localization for Robust Semantic Global Localization in Cluttered Quasi-Static Environments

Shivendra Agrawal and Bradley Hayes

arXiv preprint arXiv:2605.30506, 2026

Global localization in geometrically aliased, quasi-static environments such as grocery stores, offices, schools, and hospitals poses a significant challenge for mobile robots. Grocery stores with parallel aisles and a long-tailed distribution of products, as well as offices and labs with repetitive furniture such as chairs, desks, monitors, and doors, exemplify common indoor environments that present geometric and even semantic ambiguity. Traditional approaches rely either on distinct geometric features or on domain-specific vision pipelines that struggle with long-tail semantic distributions and transient visual clutter. We present VLM-GLoc, a method for hierarchical semantic Monte Carlo Localization (MCL) that leverages open-vocabulary Vision-Language Models (VLMs) as a unified semantic observation front-end. We hypothesize a three-fold benefit from VLMs: (1) extracting highly discriminative rich text features, (2) implicit quality filtering of blurry or dynamic objects, and (3) permanence reasoning for targeted data augmentation. We introduce an inverse semantic proposal mechanism that seeds particles via text-to-map retrieval. Evaluated across two real-world environments with different characteristics and two different platforms (a 3,500 sq. ft. grocery store with a cellphone and a 3,700 sq. ft. lab space with a quadruped), VLM-GLoc achieves 70% and 74% global localization success, respectively, substantially outperforming traditional geometry-only and domain-specific baselines.
@article{agrawal2026vlmgloc,
  title = {VLM-GLoc: Vision-Language Model Enhanced Monte Carlo Localization for Robust Semantic Global Localization in Cluttered Quasi-Static Environments},
  author = {Agrawal, Shivendra and Hayes, Bradley},
  journal = {arXiv preprint arXiv:2605.30506},
  year = {2026},
  topic = {vlmgloc},
  url = {https://arxiv.org/abs/2605.30506},
}
ArXiv
GIST - Multimodal Knowledge Extraction and Spatial Grounding

Shivendra Agrawal and Bradley Hayes

arXiv preprint arXiv:2604.15495, 2026

Navigating complex, densely packed environments like retail stores, warehouses, and hospitals poses a significant spatial grounding challenge for humans and embodied AI. In these spaces, dense visual features quickly become stale given the quasi-static nature of items, and long-tail semantic distributions challenge traditional computer vision. While Vision-Language Models (VLMs) help assistive systems navigate semantically rich spaces, they still struggle with spatial grounding in cluttered environments. We present GIST (Grounded Intelligent Semantic Topology), a multimodal knowledge extraction pipeline that transforms a consumer-grade mobile point cloud into a semantically annotated navigation topology. Our architecture distills the scene into a 2D occupancy map, extracts its topological layout, and overlays a lightweight semantic layer via intelligent keyframe and semantic selection. We demonstrate the versatility of this structured spatial knowledge through critical downstream Human-AI interaction tasks: (1) an intent-driven Semantic Search engine that actively infers categorical alternatives and zones when exact matches fail; (2) a one-shot Semantic Localizer achieving a 1.04 m top-5 mean translation error; (3) a Zone Classification module that segments the walkable floor plan into high-level semantic regions; and (4) a Visually-Grounded Instruction Generator that synthesizes egocentric, landmark-rich natural language routing. In multi-criteria LLM evaluations, GIST outperforms sequence-based instruction generation baselines. Finally, an in-situ formative evaluation (N=5) yields an 80% navigation success rate relying solely on verbal cues, validating the system’s capacity for universal design.
@article{agrawal2026gist,
  title = {GIST - Multimodal Knowledge Extraction and Spatial Grounding},
  author = {Agrawal, Shivendra and Hayes, Bradley},
  journal = {arXiv preprint arXiv:2604.15495},
  year = {2026},
  topic = {gist},
  url = {https://arxiv.org/abs/2604.15495},
}

RA-L

ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments With Low-Cost Sensors

Shivendra Agrawal, Jake Brawer, Ashutosh Naik, Alessandro Roncone, and Bradley Hayes

IEEE Robotics and Automation Letters, 2026

@article{11478317,
  author = {Agrawal, Shivendra and Brawer, Jake and Naik, Ashutosh and Roncone, Alessandro and Hayes, Bradley},
  journal = {IEEE Robotics and Automation Letters},
  title = {ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments With Low-Cost Sensors},
  year = {2026},
  volume = {11},
  number = {6},
  pages = {6943--6950},
  keywords = {Filtering;Particle filters;Filters;Circuits and systems;Central Processing Unit;Circuits;Feedback;MIMICs;Millimeter wave integrated circuits;Monolithic integrated circuits;Localization;semantic scene understanding;SLAM},
  doi = {10.1109/LRA.2026.3682613},
  topic = {shelfmcl},
  url = {https://arxiv.org/abs/2512.09065},
}

ICRA
Distributional Semantics for Robust Global Localization in Cluttered, Geometrically Aliased Environments

Shivendra Agrawal, Alessandro Roncone, and Bradley Hayes

In ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy: From Environment Understanding and Reasoning to Safe Interaction, 2026

Many indoor workspaces such as warehouses, laboratories, and retail spaces are quasi-static: their global geometric layout remains permanent, but local semantics change continually. These spaces often have repetitive geometry, dynamic clutter, and perceptual noise that make standard vision-based localization brittle. We present Distributional Semantic Monte Carlo Localization (DS-MCL), a particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed-quantity landmarks. DS-MCL fuses a geometric depth likelihood with a category-centric semantic similarity, utilizing a precomputed bank of semantic viewpoints to perform inverse semantic proposals for fast, targeted hypothesis generation on low-cost hardware. We evaluate DS-MCL across two environments. In a rigorously controlled quasi-static environment, DS-MCL achieves a 97% global localization success rate, heavily outperforming geometric and fixed-semantic baselines. Furthermore, in a 3,500 sq. ft. operational retail store, leveraging an open-vocabulary vision pipeline, DS-MCL significantly outperforms fixed-quantity baselines (62% vs 42% success). By modeling semantics distributionally, DS-MCL resolves geometric aliasing and provides an infrastructure-free building block for reliable autonomy in dynamic real-world environments.
@inproceedings{agrawal2026distributional,
  title = {Distributional Semantics for Robust Global Localization in Cluttered, Geometrically Aliased Environments},
  author = {Agrawal, Shivendra and Roncone, Alessandro and Hayes, Bradley},
  booktitle = {ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy: From Environment Understanding and Reasoning to Safe Interaction},
  year = {2026},
  topic = {shelfmcl},
  url = {https://openreview.net/forum?id=FjjDqommMK},
}

2023

AAMAS
ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane

Shivendra Agrawal, Suresh Nayak, Ashutosh Naik, and Bradley Hayes

In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

The ability to shop independently, especially in grocery stores, is important for maintaining a high quality of life. This can be particularly challenging for people with visual impairments (PVI). Stores carry thousands of products, with approximately 30,000 new products introduced each year in the US market alone, presenting a challenge even for modern computer vision solutions. Through this work, we present a proof-of-concept socially assistive robotic system we call ShelfHelp, and propose novel technical solutions for enhancing instrumented canes traditionally meant for navigation tasks with additional capability within the domain of shopping. ShelfHelp includes a novel visual product locator algorithm designed for use in grocery stores and a novel planner that autonomously issues verbal manipulation guidance commands to guide the user during product retrieval. Through a human subjects study, we show the system’s success in locating and providing effective manipulation guidance to retrieve desired products with novice users. We compare two autonomous verbal guidance modes achieving comparable performance to a human assistance baseline and present encouraging findings that validate our system’s efficiency and effectiveness through positive subjective metrics including competence, intelligence, and ease of use.
@inproceedings{agrawal2022shelfhelp,
  title = {ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane},
  author = {Agrawal, Shivendra and Nayak, Suresh and Naik, Ashutosh and Hayes, Bradley},
  booktitle = {Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems},
  pages = {1514--1523},
  topic = {shelfhelp},
  url = {https://shivendraagrawal.github.io/projects/shelfhelp/},
  year = {2023},
}

2022

IROS
ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane

Shivendra Agrawal and Bradley Hayes

In IROS 2022 SCIAR Workshop, 2022

The ability to shop independently, especially in grocery stores, is important for maintaining a high quality of life. This can be particularly challenging for people with visual impairments (PVI). Stores carry thousands of products, with approximately 30,000 new products introduced each year in the US market alone, presenting a challenge even for modern computer vision solutions. In this work, we present our work-in-progress investigating technical solutions for enhancing instrumented canes traditionally meant for navigation tasks with capability within the domain of shopping. Our system includes a novel visual product search algorithm designed for use in the wild and a novel planner that autonomously issues verbal commands to guide the user in a reaching task to acquire the desired product.
@inproceedings{agrawal2022shelf,
  title = {ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane},
  author = {Agrawal, Shivendra and Hayes, Bradley},
  booktitle = {IROS 2022 SCIAR Workshop},
  topic = {shelfhelp},
  url = {https://shivendraagrawal.github.io/assets/pdf/shelfhelp_iros22_workshop.pdf},
  year = {2022},
}
IROS
A Novel Perceptive Robotic Cane with Haptic Navigation for Enabling Vision-Independent Participation in the Social Dynamics of Seat Choice

Shivendra Agrawal, Mary Etta West, and Bradley Hayes

In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Goal-based navigation in public places is critical for independent mobility and for breaking barriers that exist for blind or visually impaired (BVI) people in a sight-centric society. Through this work we present a proof-of-concept system that autonomously leverages goal-based navigation assistance and perception to identify socially preferred seats and safely guide its user towards them in unknown indoor environments. The robotic system includes a camera, an IMU, vibrational motors, and a white cane, powered via a backpack-mounted laptop. The system combines techniques from computer vision, robotics, and motion planning with insights from psychology to perform 1) SLAM and object localization, 2) goal disambiguation and scoring, and 3) path planning and guidance. We introduce a novel 2-motor haptic feedback system on the cane’s grip for navigation assistance. Through a pilot user study, we show that the system is successful in classifying and providing haptic navigation guidance to socially preferred seats, while optimizing for users’ convenience, privacy, and intimacy in addition to increasing their confidence in independent navigation. The implications are encouraging as this technology, with careful design guided by the BVI community, can be adopted and further developed to be used with medical devices enabling the BVI population to better independently engage in socially dynamic situations like seat choice.
@inproceedings{agrawal2022novel,
  title = {A Novel Perceptive Robotic Cane with Haptic Navigation for Enabling Vision-Independent Participation in the Social Dynamics of Seat Choice},
  author = {Agrawal, Shivendra and West, Mary Etta and Hayes, Bradley},
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems},
  topic = {social_guidance},
  year = {2022},
  url = {http://www.cairo-lab.com/papers/iros22.pdf},
}

2019

HRI
Explanation-based reward coaching to improve human performance via reinforcement learning

Aaquib Tabrez, Shivendra Agrawal, and Bradley Hayes

In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2019

Best Paper Runner-up

For robots to effectively collaborate with humans, it is critical to establish a shared mental model amongst teammates. In the case of incongruous models, catastrophic failures may occur unless mitigating steps are taken. To identify and remedy these potential issues, we propose a novel mechanism for enabling an autonomous system to detect model disparity between itself and a human collaborator, infer the source of the disagreement within the model, evaluate potential consequences of this error, and finally, provide human-interpretable feedback to encourage model correction. This process effectively enables a robot to provide a human with a policy update based on perceived model disparity, reducing the likelihood of costly or dangerous failures during joint task execution. This paper makes two contributions at the intersection of explainable AI (xAI) and human-robot collaboration: 1) The Reward Augmentation and Repair through Explanation (RARE) framework for estimating task understanding and 2) A human subjects study illustrating the effectiveness of reward augmentation-based policy repair in a complex collaborative task.
@inproceedings{tabrez2019explanation,
  title = {Explanation-based reward coaching to improve human performance via reinforcement learning},
  author = {Tabrez, Aaquib and Agrawal, Shivendra and Hayes, Bradley},
  booktitle = {2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI)},
  pages = {249--257},
  year = {2019},
  organization = {IEEE},
  topic = {explainable_ai},
  url = {http://www.cairo-lab.com/papers/hri19.pdf},
}