Research Publications

My research focuses on perception, representation, and planning for robot manipulation. I work at the intersection of computer vision, machine learning, and robotics to develop systems that can better understand and interact with their environment.

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

Yi Ru Wang, Carter Ung, Grant Tannert, Jiafei Duan, Josephine Li, Amy Le, Rishabh Oswal, Markus Grotz, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa
In Submission
A comprehensive evaluation framework for robotic manipulation tasks with structured and scalable assessment methods.

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Jiafei Duan, Wilbert Pumacay, Nishanth Kumar, Yi Ru Wang, Shulin Tian, Wentao Yuan, Ranjay Krishna, Dieter Fox, Ajay Mandlekar, Yijie Guo
International Conference on Learning Representations (ICLR)
A vision-language model for detecting and reasoning over failures in robotic manipulation tasks.

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan
International Conference on Machine Learning (ICML)
Integration of visual foundation models with memory architecture for enhanced robotic manipulation capabilities.

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

Long Cheng, Jiafei Duan, Yi Ru Wang, Haoquan Fang, Boyang Li, Yushan Huang, Elvis Wang, Ainaz Eftekhar, Jason Lee, Wentao Yuan, Rose Hendrix, Noah A. Smith, Fei Xia, Dieter Fox, Ranjay Krishna
In Submission
A framework for probing multimodal grounding through language-guided pointing interactions.

MolmoAct: Action Reasoning Models that can Reason in Space

Jason Lee, Jiafei Duan, Haoquan Fang, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox, Ranjay Krishna
arXiv preprint
Action reasoning models with spatial reasoning capabilities for robotic manipulation tasks.

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Jiafei Duan*, Wentao Yuan*, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna
Conference on Robot Learning (CoRL)
Automation of real-world robots using vision-language models for manipulation tasks.

MVTrans: Multi-View Perception for Transparent Objects

Yi Ru Wang*, Yuchi Zhao*, Haoping Xu*, Alan Aspuru-Guzik, Florian Shkurti, Animesh Garg
IEEE International Conference on Robotics and Automation (ICRA)
Multi-view perception techniques for transparent object detection and manipulation.

AR2D2: Training a Robot Without A Robot

Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna
Conference on Robot Learning (CoRL)
A novel approach for training robotic systems without requiring physical robot hardware.

NEWTON: Are Large Language Models Capable of Physical Reasoning?

Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa
Conference on Empirical Methods in Natural Language Processing (EMNLP)
An investigation of whether large language models are capable of physical reasoning.

CONetV2: Efficient Auto-Channel Size Optimization for CNNs

Yi Ru Wang*, Samir Khaki*, Weihang Zheng*, Mahdi S. Hosseini*, Konstantinos N. Plataniotis
International Conference on Machine Learning and Applications (ICMLA) - Oral Presentation
Efficient auto-channel size optimization techniques for convolutional neural networks.

Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects

Yi Ru Wang*, Haoping Xu*, Sagi Eppel, Alan Aspuru-Guzik, Florian Shkurti, Animesh Garg
5th Annual Conference on Robot Learning (CoRL) - Oral Presentation (6.5% Acceptance Rate)
Joint point-cloud and depth completion techniques for transparent object perception.

Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the Transproteus CGI dataset

Sagi Eppel, Haoping Xu, Yi Ru Wang, Alan Aspuru-Guzik
In Submission
Prediction of 3D shapes, masks, and properties of materials inside transparent containers using the Transproteus CGI dataset.