Ömer Erdinç Yağmurlu
I am Ömer Erdinç Yağmurlu, a master’s student in Computer Science at KIT, passionate about machine learning and robotics. Currently, I am a student researcher at the Intuitive Robots Lab, working on intuitive, embodied AI, imitation learning, real robot hardware, and 3D vision. Previously, I worked at TECO on the edge-ml.org project.
- GitHub
- Google Scholar
Publications
Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals
Robotics: Science and Systems (RSS), 2024

The Multimodal Diffusion Transformer (MDT) is a novel framework that learns versatile behaviors from multimodal goals with minimal language annotations. Built on a transformer backbone, MDT aligns image- and language-based goal embeddings through two self-supervised objectives, enabling it to tackle long-horizon manipulation tasks. On the CALVIN and LIBERO benchmarks, MDT outperforms prior methods by 15% while using fewer parameters. Its effectiveness is demonstrated in both simulated and real-world environments, highlighting its potential in settings with sparse language data.
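
To make the goal-alignment idea concrete, here is a minimal sketch of one way image and language goal embeddings can be pulled into a shared latent space with a self-supervised contrastive loss. The encoder projections, dimensions, and InfoNCE-style objective below are illustrative assumptions, not MDT's actual architecture or training objectives.

```python
# Illustrative sketch: aligning image- and language-goal embeddings with a
# contrastive objective. Dimensions, projections, and the loss form are
# assumptions for illustration, not MDT's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalAlignment(nn.Module):
    def __init__(self, img_dim=512, lang_dim=768, latent_dim=256):
        super().__init__()
        # Project both goal modalities into a shared latent space.
        self.img_proj = nn.Linear(img_dim, latent_dim)
        self.lang_proj = nn.Linear(lang_dim, latent_dim)
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, img_goal_emb, lang_goal_emb):
        # Normalize so the dot product is cosine similarity.
        z_img = F.normalize(self.img_proj(img_goal_emb), dim=-1)
        z_lang = F.normalize(self.lang_proj(lang_goal_emb), dim=-1)
        logits = z_img @ z_lang.t() / self.temperature
        targets = torch.arange(len(z_img), device=z_img.device)
        # Symmetric InfoNCE: paired image/language goals should align.
        loss = (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
        return loss

# Usage with random stand-in embeddings:
align = GoalAlignment()
loss = align(torch.randn(8, 512), torch.randn(8, 768))
loss.backward()
```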
Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
Conference on Robot Learning (CoRL), 2024

NILS is an automatic labeling framework that uses pre-trained vision-language foundation models to detect objects, identify scene changes, segment long demonstrations into tasks, and annotate behavior datasets with language instructions. Evaluations on the BridgeV2 and kitchen play datasets demonstrate its effectiveness at annotating diverse, unstructured robot demonstrations while addressing the limitations of traditional human labeling methods.
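
The sketch below illustrates the zero-shot labeling idea at a schematic level: detect the objects in each frame, treat scene-state changes as sub-task boundaries, and turn each change into a language annotation. The `detect_objects` and `describe_change` callables are hypothetical stand-ins for calls to off-the-shelf foundation models; this is not NILS's actual interface or pipeline.

```python
# Schematic sketch of zero-shot annotation of an unlabeled robot demonstration.
# `detect_objects` and `describe_change` are hypothetical placeholders for
# pre-trained detection / vision-language model calls, not NILS's real API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    start: int          # first frame index of the sub-task
    end: int            # last frame index of the sub-task
    instruction: str    # generated language annotation

def annotate_demo(frames: List["Image"],
                  detect_objects: Callable[["Image"], set],
                  describe_change: Callable[[set, set], str]) -> List[Segment]:
    """Segment a long demonstration wherever the detected scene state changes,
    and label each segment with a language description of that change."""
    segments, start = [], 0
    prev_state = detect_objects(frames[0])
    for t in range(1, len(frames)):
        state = detect_objects(frames[t])
        if state != prev_state:
            # A scene-state change marks the boundary of a sub-task.
            segments.append(Segment(start, t, describe_change(prev_state, state)))
            start, prev_state = t, state
    # (Handling of the final, unchanged tail segment is omitted for brevity.)
    return segments
```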
Projects
Popcorn

AlgorivA
