Posts by Collection

portfolio

publications

Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset

In Submission, 2021

Prediction of 3D shapes, masks, and properties of materials inside transparent containers using the TransProteus CGI dataset.

Recommended citation: Eppel, S., Xu, H., Wang, Y. R., & Aspuru-Guzik, A. (2021). Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset. In Submission. https://arxiv.org/abs/2109.07577

Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects

Published in 5th Annual Conference on Robot Learning (CoRL) - Oral Presentation (6.5% acceptance rate), 2021

Joint point-cloud and depth completion techniques for transparent object perception.

Recommended citation: Wang, Y. R.*, Xu, H.*, Eppel, S., Aspuru-Guzik, A., Shkurti, F., & Garg, A. (2021). Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects. 5th Annual Conference on Robot Learning (CoRL). https://arxiv.org/abs/2110.00087

CONetV2: Efficient Auto-Channel Size Optimization for CNNs

Published in International Conference on Machine Learning and Applications (ICMLA) - Oral Presentation, 2021

Efficient auto-channel size optimization techniques for convolutional neural networks.

Recommended citation: Wang, Y. R.*, Khaki, S.*, Zheng, W.*, Hosseini, M. S.*, & Plataniotis, K. N. (2021). CONetV2: Efficient Auto-Channel Size Optimization for CNNs. International Conference on Machine Learning and Applications (ICMLA). https://arxiv.org/abs/2110.06830

AR2-D2: Training a Robot Without a Robot

Published in Conference on Robot Learning (CoRL), 2023

A novel approach for training robotic systems without requiring physical robot hardware.

Recommended citation: Duan, J., Wang, Y. R., Shridhar, M., Fox, D., & Krishna, R. (2023). AR2-D2: Training a Robot Without a Robot. Conference on Robot Learning (CoRL). https://arxiv.org/abs/2306.13818

MVTrans: Multi-View Perception for Transparent Objects

Published in IEEE International Conference on Robotics and Automation (ICRA), 2023

Multi-view perception techniques for transparent object detection and manipulation.

Recommended citation: Wang, Y. R.*, Zhao, Y.*, Xu, H.*, Aspuru-Guzik, A., Shkurti, F., & Garg, A. (2023). MVTrans: Multi-View Perception for Transparent Objects. IEEE International Conference on Robotics and Automation (ICRA). https://arxiv.org/abs/2302.11683

MolmoAct: Action Reasoning Models that can Reason in Space

arXiv preprint, 2025

Action reasoning models with spatial reasoning capabilities for robotic manipulation tasks.

Recommended citation: Lee, J., Duan, J., Fang, H., Deng, Y., Liu, S., Li, B., Fang, B., Zhang, J., Wang, Y. R., Lee, S., Han, W., Pumacay, W., Wu, A., Hendrix, R., Farley, K., VanderBilt, E., Farhadi, A., Fox, D., & Krishna, R. (2025). MolmoAct: Action Reasoning Models that can Reason in Space. arXiv preprint. https://arxiv.org/abs/2508.07917

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

In Submission, 2025

A framework for probing multimodal grounding through language-guided pointing interactions.

Recommended citation: Cheng, L., Duan, J., Wang, Y. R., Fang, H., Li, B., Huang, Y., Wang, E., Eftekhar, A., Lee, J., Yuan, W., Hendrix, R., Smith, N. A., Xia, F., Fox, D., & Krishna, R. (2025). PointArena: Probing Multimodal Grounding Through Language-Guided Pointing. In Submission. https://arxiv.org/abs/2505.09990

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Published in International Conference on Machine Learning (ICML), 2025

Integration of visual foundation models with memory architecture for enhanced robotic manipulation capabilities.

Recommended citation: Fang, H., Grotz, M., Pumacay, W., Wang, Y. R., Fox, D., Krishna, R., & Duan, J. (2025). SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation. International Conference on Machine Learning (ICML). https://arxiv.org/abs/2501.18564

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Published in International Conference on Learning Representations (ICLR), 2025

A vision-language model for detecting and reasoning over failures in robotic manipulation tasks.

Recommended citation: Duan, J., Pumacay, W., Kumar, N., Wang, Y. R., Tian, S., Yuan, W., Krishna, R., Fox, D., Mandlekar, A., & Guo, Y. (2025). AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2410.00371

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

In Submission, 2025

A comprehensive evaluation framework for robotic manipulation tasks with structured and scalable assessment methods.

Recommended citation: Wang, Y. R., Ung, C., Tannert, G., Duan, J., Li, J., Le, A., Oswal, R., Grotz, M., Pumacay, W., Deng, Y., Krishna, R., Fox, D., & Srinivasa, S. (2025). RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation. In Submission. https://arxiv.org/abs/2507.00435

talks

teaching
