Demo Videos

Real-Time Semantic Mapping via YOLOv8
YOLOv8 SLAM Toolbox Nav2

This phase focused on achieving low-latency perception. By integrating a fine-tuned YOLOv8 model with the SLAM Toolbox, I developed a pipeline that projects 2D object detections into a 3D spatial occupancy grid. This allows the robot to "see" and remember the physical location of objects in real time without compromising navigation speed.
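The projection step can be illustrated with a minimal sketch. It assumes a depth reading (in metres) is available at the centre of each bounding box, that the camera intrinsics `K` are known, and that a camera-to-map transform `T_map_cam` is supplied by the localization stack. The function names and parameters are hypothetical and are not taken from the project code.

```python
import numpy as np

def pixel_to_map(u, v, depth_m, K, T_map_cam):
    """Back-project pixel (u, v) with a depth reading into the map frame.

    K is the 3x3 camera intrinsics matrix; T_map_cam is a 4x4 homogeneous
    transform from the camera optical frame to the map frame (assumed inputs).
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pinhole model: x right, y down, z forward in the camera optical frame.
    p_cam = np.array([
        (u - cx) * depth_m / fx,
        (v - cy) * depth_m / fy,
        depth_m,
        1.0,
    ])
    return (T_map_cam @ p_cam)[:3]

def map_to_cell(x, y, origin_xy, resolution):
    """Convert a map-frame position to (row, col) indices of a grid."""
    col = int(np.floor((x - origin_xy[0]) / resolution))
    row = int(np.floor((y - origin_xy[1]) / resolution))
    return row, col

# Example with made-up intrinsics and an identity camera pose.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
point = pixel_to_map(420, 240, 2.0, K, np.eye(4))
cell = map_to_cell(point[0], point[1], origin_xy=(-1.0, -1.0), resolution=0.5)
```

In a real ROS 2 setup the transform would typically come from TF and the grid origin and resolution from the published map metadata; both are passed in as plain arguments here to keep the sketch self-contained.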

Advanced VLM Integration & Semantic Context
VLM Spatial Mapping JSON Database

Moving beyond simple bounding boxes, I used a small VLM to extract rich descriptions of the environment along with a bounding box for each object. This lets the robot understand visual context and identify distinguishing object attributes (such as color, texture, and state), creating a "meaning-aware" semantic map rather than just a geometric occupancy grid.

To keep the system performant during active movement, I finalized the pipeline by integrating YOLOv8, which gives the robot high-speed tracking and real-time navigation updates.
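As an illustration of what one entry in such a JSON spatial database might look like, here is a minimal sketch. The field names (`label`, `description`, `attributes`, `position`, `bbox`) and the example values are assumptions made for this example, not the project's actual schema.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SemanticObject:
    label: str          # short name used to look the object up
    description: str    # free-text description produced by the VLM
    attributes: dict    # e.g. color, texture, state
    position: list      # [x, y] in the map frame, metres
    bbox: list          # [x_min, y_min, x_max, y_max] in image pixels

def upsert(db, obj):
    """Insert or overwrite an object in the in-memory database dict."""
    db[obj.label] = asdict(obj)
    return db

db = {}
upsert(db, SemanticObject(
    label="toy lion",
    description="A small plush lion sitting on the floor.",
    attributes={"color": "brown", "texture": "plush", "state": "upright"},
    position=[1.2, -0.4],
    bbox=[210, 135, 298, 240],
))

# Serialize to JSON text, as it might be written to disk.
serialized = json.dumps(db, indent=2)
restored = json.loads(serialized)
```

Keying the entries by label keeps lookups simple, at the cost of allowing only one instance per label; a real map with duplicate objects would need unique IDs instead.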

Sequential Task Planning & Goal Sequencing
GPT-OSS:20b Qwen2-VL Ollama

Demonstrating the "Robot Brain" in action: The robot receives complex, natural language commands like "Go to the Android toy, go to Mario and the Asus box." Using GPT-OSS:20b, the system breaks these sentences down into an ordered sequence of navigation goals.
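The step after the language model can be sketched as follows. The sketch assumes the model has been prompted to reply with a JSON array of object names in visiting order, and that each name is matched against the labels in the spatial database. The reply format, the `resolve_goals` helper, and the sample coordinates are all hypothetical.

```python
import json

def resolve_goals(llm_reply, db):
    """Turn a JSON array of object names into ordered (label, (x, y)) goals.

    Names are matched case-insensitively by substring against database
    labels; an unknown name raises KeyError rather than being skipped.
    """
    goals = []
    for name in json.loads(llm_reply):
        key = name.strip().lower()
        match = next(
            (lbl for lbl in db if key in lbl.lower() or lbl.lower() in key),
            None,
        )
        if match is None:
            raise KeyError(f"No mapped object matches {name!r}")
        goals.append((match, tuple(db[match]["position"])))
    return goals

# Made-up database contents and model reply for illustration.
object_db = {
    "android toy": {"position": [1.0, 2.0]},
    "mario": {"position": [3.5, 0.5]},
    "asus box": {"position": [-1.0, 4.0]},
}
reply = '["Android toy", "Mario", "Asus box"]'
goal_sequence = resolve_goals(reply, object_db)
```

Each resulting coordinate could then be sent to Nav2 as a separate goal, one after another, so that the visiting order from the original sentence is preserved.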

Contextual Reasoning
Jan-2025 Zero-Shot Classification Contextual Memory Autonomous Navigation

This final demo showcases the integration of Long-Term Contextual Memory with spatial reasoning. When the user asks to identify a "wild animal," the robot performs zero-shot classification to identify a toy lion.

Crucially, when the user follows up with "Which object is close to it?", the robot maintains the conversation context—understanding that "it" refers to the lion identified previously. It then queries its JSON spatial database to calculate the nearest neighbor (the yellow traffic cone) and autonomously plans a path to that coordinate via Nav2, even without the user repeating the object's name.
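The nearest-neighbor lookup described above can be sketched in a few lines. It assumes each database entry stores a 2D map-frame position under a `position` key; the key name and the sample coordinates are illustrative assumptions.

```python
import math

def nearest_neighbor(db, target):
    """Return (label, distance) of the object closest to `target`."""
    tx, ty = db[target]["position"]
    best_label, best_dist = None, math.inf
    for label, entry in db.items():
        if label == target:
            continue  # skip the reference object itself
        x, y = entry["position"]
        dist = math.hypot(x - tx, y - ty)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Made-up positions; "toy lion" stands in for the referent of "it".
spatial_db = {
    "toy lion": {"position": [0.0, 0.0]},
    "yellow traffic cone": {"position": [0.3, 0.4]},
    "asus box": {"position": [2.0, 2.0]},
}
closest, distance = nearest_neighbor(spatial_db, "toy lion")
```

A linear scan like this is adequate for a handful of mapped objects; a spatial index would only become worthwhile for much larger maps.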