Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
In submission, 2021
Prediction of 3D shapes, masks, and properties of materials inside transparent containers using the TransProteus CGI dataset.
Recommended citation: Eppel, S., Xu, H., Wang, Y. R., & Aspuru-Guzik, A. (2021). Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset. In Submission. https://arxiv.org/abs/2109.07577
Published in 5th Annual Conference on Robot Learning (CoRL) - Oral Presentation (6.5% Acceptance Rate), 2021
Joint point-cloud and depth completion techniques for transparent object perception.
Recommended citation: Wang, Y. R.*, Xu, H.*, Eppel, S., Aspuru-Guzik, A., Shkurti, F., & Garg, A. (2021). Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects. 5th Annual Conference on Robot Learning (CoRL). https://arxiv.org/abs/2110.00087
Published in International Conference on Machine Learning and Applications (ICMLA) - Oral Presentation, 2021
Efficient auto-channel size optimization techniques for convolutional neural networks.
Recommended citation: Wang, Y. R.*, Khaki, S.*, Zheng, W.*, Hosseini, M. S.*, & Plataniotis, K. N. (2021). CONetV2: Efficient Auto-Channel Size Optimization for CNNs. International Conference on Machine Learning and Applications (ICMLA). https://arxiv.org/abs/2110.06830
Published in Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Investigation of large language models' capabilities for physical reasoning tasks.
Recommended citation: Wang, Y. R., Duan, J., Fox, D., & Srinivasa, S. (2023). NEWTON: Are Large Language Models Capable of Physical Reasoning? Conference on Empirical Methods in Natural Language Processing (EMNLP). https://arxiv.org/abs/2310.07018
Published in Conference on Robot Learning (CoRL), 2023
A novel approach for training robotic systems without requiring physical robot hardware.
Recommended citation: Duan, J., Wang, Y. R., Shridhar, M., Fox, D., & Krishna, R. (2023). AR2-D2: Training a Robot Without a Robot. Conference on Robot Learning (CoRL). https://arxiv.org/abs/2306.13818
Published in IEEE International Conference on Robotics and Automation (ICRA), 2023
Multi-view perception techniques for transparent object detection and manipulation.
Recommended citation: Wang, Y. R.*, Zhao, Y.*, Xu, H.*, Aspuru-Guzik, A., Shkurti, F., & Garg, A. (2023). MVTrans: Multi-View Perception for Transparent Objects. IEEE International Conference on Robotics and Automation (ICRA). https://arxiv.org/abs/2302.11683
Published in Conference on Robot Learning (CoRL), 2024
Automation of real-world robots using vision-language models for manipulation tasks.
Recommended citation: Duan, J.*, Yuan, W.*, Pumacay, W., Wang, Y. R., Ehsani, K., Fox, D., & Krishna, R. (2024). Manipulate-Anything: Automating Real-World Robots using Vision-Language Models. Conference on Robot Learning (CoRL). https://arxiv.org/abs/2406.18915
Published as an arXiv preprint, 2025
Action reasoning models with spatial reasoning capabilities for robotic manipulation tasks.
Recommended citation: Lee, J., Duan, J., Fang, H., Deng, Y., Liu, S., Li, B., Fang, B., Zhang, J., Wang, Y. R., Lee, S., Han, W., Pumacay, W., Wu, A., Hendrix, R., Farley, K., VanderBilt, E., Farhadi, A., Fox, D., & Krishna, R. (2025). MolmoAct: Action Reasoning Models that can Reason in Space. arXiv preprint. https://arxiv.org/abs/2508.07917
In submission, 2025
A framework for probing multimodal grounding through language-guided pointing interactions.
Recommended citation: Cheng, L., Duan, J., Wang, Y. R., Fang, H., Li, B., Huang, Y., Wang, E., Eftekhar, A., Lee, J., Yuan, W., Hendrix, R., Smith, N. A., Xia, F., Fox, D., & Krishna, R. (2025). PointArena: Probing Multimodal Grounding Through Language-Guided Pointing. In Submission. https://arxiv.org/abs/2505.09990
Published in International Conference on Machine Learning (ICML), 2025
Integration of visual foundation models with a memory architecture for enhanced robotic manipulation capabilities.
Recommended citation: Fang, H., Grotz, M., Pumacay, W., Wang, Y. R., Fox, D., Krishna, R., & Duan, J. (2025). SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation. International Conference on Machine Learning (ICML). https://arxiv.org/abs/2501.18564
Published in International Conference on Learning Representations (ICLR), 2025
A vision-language model for detecting and reasoning over failures in robotic manipulation tasks.
Recommended citation: Duan, J., Pumacay, W., Kumar, N., Wang, Y. R., Tian, S., Yuan, W., Krishna, R., Fox, D., Mandlekar, A., & Guo, Y. (2025). AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2410.00371
In submission, 2025
A comprehensive evaluation framework for robotic manipulation tasks with structured and scalable assessment methods.
Recommended citation: Wang, Y. R., Ung, C., Tannert, G., Duan, J., Li, J., Le, A., Oswal, R., Grotz, M., Pumacay, W., Deng, Y., Krishna, R., Fox, D., & Srinivasa, S. (2025). RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation. In Submission. https://arxiv.org/abs/2507.00435
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk. Note the different field in type; you can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.