Team members:


We are working towards making an end-to-end system that can assist with independent grocery shopping as shopping with sighted guide is prohibitive and often causes a loss of privacy. Grocery shopping primarily consists of three main subtasks: navigation, product retrieval, and product examination. Our current work focuses on product retrieval. ShelfHelp can locate items on the shelf and verbally provide fine-grain manipulation guidance to help people retrieve the desired item from an aisle.

With our system, the blindfloded user is able to retrieve the desired product from the shelf.

Video with sound (recommended version)

ShelfHelp includes a robotic cane equipped with RealSense D455 and T265 cameras. The system is powered through a laptop in a backpack. Left: The system used as a navigational device. It uses audio and haptic feedback for navigation guidance. Right: The system used as a manipulation device. It uses audio for manipulation guidance.

Poster

System Diagram. Alignment, perception, planning, and verbal conveyance are executed on a backpack-worn laptop, while all the sensing is mounted on the cane.

ShelfHelp uses a novel 2-stage CV pipeline to locate products on the shelf. The pipeline doesn’t need any re-training. In the first stage, we use a YoloV5 network that we train on the SKU-110K dataset, giving us the most likely bounding boxes to contain any product. In the second stage, we take these most likely regions and compare them against the image of the desired product. We freeze the weights of an autoencoder that we train on the MS-COCO dataset and take just the encoder portion to extract features from the desired product image and the proposed regions (the feature vectors are compared using cosine similarity). For each new image added to the database, we pass it through the frozen encoder and save the generated feature vector on disk. This step again doesn’t require any re-training of the autoencoder.

Our product search algorithm can reliably locate desired products on a grocery shelf. Regions with a high likelihood of containing any product are proposed in the first stage. The features of these regions are then compared against the target product image. Our data association solution is used to identify whether detections from incoming camera frames are new or re-detections of existing products. The product classification aspect of this work has been tested and validated in actual grocery stores, whereas the data association and manipulation assistance components were validated within a lab-based study.
(Left to right) A sample of discrete commands. The movement (in meters) each command caused. MDP and solution definition. We train a model of human hand movement from demonstrations that inform the transition probabilities T. S defines the state space, A defines the discrete set of verbal actions, and R is the reward function. A policy is learned offline that can be used across reaching tasks.

Papers

1. AAMAS
ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane
In 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2023
2. IROS
ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane
In IROS 2022 SCIAR Workshop 2022