To apply, please send your CV and your MSc and BSc transcripts by email to all the contacts listed below the project description. Do not apply on SiROP. Since Prof. Davide Scaramuzza is affiliated with ETH, there is no organizational overhead for ETH students. Custom projects are occasionally available. If you would like to do a project with us but could not find an advertised project that suits you, please contact Prof. Davide Scaramuzza directly to ask for a tailored project (sdavide at ifi.uzh.ch).
Upon successful completion of a project in our lab, students may also have the opportunity to get an internship at one of our numerous industrial and academic partners worldwide (e.g., NASA/JPL, University of Pennsylvania, UCLA, MIT, Stanford, ...).
Recent research has demonstrated significant success in integrating foundation models with robotic systems. In this project, we aim to investigate how these foundation models can enhance the vision-based navigation of UAVs. The drone will use semantic relationships learned from extensive world-scale data to actively explore and navigate unfamiliar environments. While previous research has focused primarily on ground robots, our project explores the potential of integrating foundation models with aerial robots to enhance agility and flexibility.
In this project, we will develop a vision-based reinforcement learning policy for drone navigation in dynamic environments. The policy should trade off two potentially conflicting navigation objectives: maximizing the visibility of a visual object as a perceptual constraint, and avoiding obstacles to ensure safe flight.
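As a rough sketch of how these two objectives might be combined in a single reward, the snippet below weights a visibility term against an obstacle-clearance penalty; the function name, inputs, and weights are illustrative assumptions, not the actual reward used in the project.

    import numpy as np

    # Hypothetical reward combining perception and safety objectives.
    # `target_px` is the target's pixel position, `image_size` the camera resolution,
    # `dist_to_obstacle` the distance (m) to the closest obstacle. All names and
    # weights are assumptions for illustration only.
    def navigation_reward(target_px, image_size, dist_to_obstacle,
                          w_vis=1.0, w_safe=1.0, safe_margin=1.0):
        # Perception term: reward keeping the target close to the image center.
        center = np.asarray(image_size, dtype=float) / 2.0
        offset = np.linalg.norm(np.asarray(target_px, dtype=float) - center)
        r_visibility = np.exp(-offset / center.max())  # in (0, 1]

        # Safety term: penalize flying closer than `safe_margin` to an obstacle.
        r_safety = -max(0.0, safe_margin - dist_to_obstacle) / safe_margin

        return w_vis * r_visibility + w_safe * r_safety

    # Example: target near the image center, obstacle 0.4 m away.
    print(navigation_reward(target_px=(310, 250), image_size=(640, 480),
                            dist_to_obstacle=0.4))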
Model-free reinforcement learning (RL) has shown remarkable advantages over classical planning and control strategies, owing to its exploration capabilities, which enable it to efficiently discover new optimal trajectories. Leveraging RL, our aim is to create an autonomous racing system capable of swiftly learning optimal racing strategies and navigating tracks more effectively (faster) than traditional methods and human pilots.
Unwanted camera occlusions, such as debris, dust, raindrops, and snow, can severely degrade the performance of computer-vision systems. Dynamic occlusions are particularly challenging because their patterns change continuously. This project aims to leverage the unique capabilities of event-based vision sensors to address the challenge of dynamic occlusions. By improving the reliability and accuracy of vision systems, this work could benefit a wide range of applications, from autonomous driving and drone navigation to environmental monitoring and augmented reality.
Low-Latency Occlusion-Aware Object Tracking
Implicit scene representations, particularly Neural Radiance Fields (NeRF), have significantly advanced scene reconstruction and synthesis, surpassing traditional methods in creating photorealistic renderings from sparse images. However, the potential of integrating these methods with advanced sensor technologies that measure light at the granularity of a photon remains largely unexplored. These sensors, known for their exceptional low-light sensitivity and high dynamic range, could address the limitations of current NeRF implementations in challenging lighting conditions, offering a novel approach to neural-based scene reconstruction.
The first-ever Mars helicopter, Ingenuity, flew over texture-poor terrain where RANSAC was unable to find inliers: https://spectrum.ieee.org/mars-helicopter-ingenuity-end-mission. Navigating the Martian terrain poses significant challenges due to its unique and often featureless landscape, compounded by factors such as dust storms, a lack of distinct textures, and extreme environmental conditions. The absence of prominent landmarks and the homogeneity of the surface can severely disrupt optical navigation systems, leading to reduced accuracy in localization and path planning.
IMU-centric Odometry for Drone Racing and Beyond
Gaussian Splatting Visual Odometry
This project focuses on developing robust reinforcement learning controllers for agile drone navigation using adaptive curricula. Commonly, these controllers are trained with a static, pre-defined curriculum. The goal is to develop a dynamic, adaptive curriculum that evolves online based on the agents' performance to increase the robustness of the controllers.
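A minimal sketch of such an adaptive curriculum, assuming a single scalar difficulty parameter (e.g., gate spacing or obstacle density) that is raised or lowered online based on the recent success rate; the thresholds, step size, and window length are illustrative assumptions.

    from collections import deque

    # Minimal adaptive-curriculum sketch: a scalar difficulty level is raised when
    # the agent succeeds often and lowered when it struggles. Thresholds, step
    # size, and window length are illustrative assumptions.
    class AdaptiveCurriculum:
        def __init__(self, difficulty=0.1, step=0.05, window=100,
                     raise_at=0.8, lower_at=0.4):
            self.difficulty = difficulty
            self.step = step
            self.raise_at = raise_at
            self.lower_at = lower_at
            self.successes = deque(maxlen=window)

        def update(self, episode_succeeded: bool) -> float:
            self.successes.append(float(episode_succeeded))
            if len(self.successes) == self.successes.maxlen:
                rate = sum(self.successes) / len(self.successes)
                if rate > self.raise_at:
                    self.difficulty = min(1.0, self.difficulty + self.step)
                elif rate < self.lower_at:
                    self.difficulty = max(0.0, self.difficulty - self.step)
            return self.difficulty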
Model-based reinforcement learning (MBRL) methods have greatly improved sample efficiency compared to model-free approaches. Nonetheless, the number of samples and the amount of compute required to train these methods remain too large for real-world training of robot control policies. Ideally, we should be able to leverage expert data (collected by human or artificial agents) to bootstrap MBRL. The exact way to leverage such data is still unclear, and many options are available. For instance, such data could be used only for training high-accuracy dynamics models (world models) that are useful for multiple tasks. Alternatively, expert data can (also) be used for training the policy. Moreover, pretraining MBRL components can itself be very challenging, since offline expert data is typically sampled from a very narrow distribution of behaviors, which makes fine-tuning non-trivial in out-of-distribution regions of the robot's state-action space. In this thesis, you will examine different ways of incorporating expert data into MBRL and, ideally, propose new approaches for doing so. You will test these methods both in simulation (drone, wheeled, and legged robots) and in the real world on our quadrotor platform. You will gain insights into MBRL, sim-to-real transfer, and robot control.
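As one illustration of the first option mentioned above, the sketch below pretrains a simple one-step dynamics model (world model) on offline expert transitions with supervised learning; the architecture, shapes, and data loader are assumptions for illustration, not the prescribed design of the thesis.

    import torch
    import torch.nn as nn

    # Sketch: pretrain a one-step world model on offline expert (s, a, s') tuples
    # before online MBRL. Dimensions and architecture are illustrative assumptions.
    class WorldModel(nn.Module):
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),  # predicts the next state
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    def pretrain_world_model(model, expert_loader, epochs=10, lr=1e-3):
        # `expert_loader` is assumed to yield (state, action, next_state) tensors.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for state, action, next_state in expert_loader:
                loss = nn.functional.mse_loss(model(state, action), next_state)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return model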
This project aims to simplify the learning process for new drone control tasks by leveraging a pre-existing library of skills through reinforcement learning (RL). The primary objective is to define a skill library that includes both established drone controllers and new ones learned from offline data (skill discovery). Instead of teaching a drone to fly from scratch for each new task, the project focuses on bootstrapping the learning process with these pre-existing skills. For instance, if a drone needs to search for objects in a room, it can utilize its already-acquired flying skills. A high-level policy will be trained to determine which low-level skill to deploy and how to parameterize it, thus streamlining the adaptation to new tasks. This approach promises to enhance efficiency and effectiveness in training drones for a variety of complex control tasks by building on foundational skills. In addition, it facilitates training multi-task policies for drones.
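A minimal sketch of the hierarchical setup, assuming a small library of parameterized low-level skills and a high-level policy that outputs a skill index and its parameters; all skill names and interfaces are hypothetical.

    import numpy as np

    # Hypothetical skill library: each low-level skill maps (observation, parameters)
    # to a low-level command. Skill names and parameterizations are assumptions.
    def hover(obs, params):
        return {"thrust": params[0], "body_rates": np.zeros(3)}

    def fly_to_waypoint(obs, params):
        direction = params[:3] - obs["position"]
        return {"velocity_cmd": direction / (np.linalg.norm(direction) + 1e-6)}

    SKILLS = [hover, fly_to_waypoint]

    def act(high_level_policy, obs):
        # The high-level policy decides which skill to run and how to parameterize it.
        skill_idx, params = high_level_policy(obs)
        return SKILLS[skill_idx](obs, params)

    # Example with a dummy high-level policy that always flies toward the origin.
    dummy_policy = lambda obs: (1, np.zeros(3))
    print(act(dummy_policy, {"position": np.array([1.0, 2.0, 1.5])}))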
Vision-based reinforcement learning (RL) is often sample-inefficient and computationally very expensive. One way to bootstrap the learning process is to leverage offline interaction data. However, this approach faces significant challenges, including out-of-distribution (OOD) generalization and neural network plasticity. The goal of this project is to explore methods for transferring offline policies to the online regime in a way that alleviates the OOD problem. By initially training the robot's policies offline, the project seeks to leverage existing robot interaction data to bootstrap the learning of new policies. The focus is on overcoming domain-shift problems and exploring innovative ways to fine-tune the model and policy using online interactions, effectively bridging the gap between offline and online learning. This advancement would enable us to efficiently leverage offline data (e.g., from human or expert-agent demonstrations or previous experiments) for training vision-based robotic policies.
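One possible way to soften the offline-to-online shift, sketched below under the assumption of an actor-critic setup, is to add a behavior-cloning regularizer (in the spirit of TD3+BC) that keeps the fine-tuned policy close to the offline actions; the interfaces and weighting are illustrative assumptions, not the project's chosen method.

    import torch
    import torch.nn.functional as F

    # Sketch: during online fine-tuning, mix the RL objective with a
    # behavior-cloning term toward actions from the offline dataset.
    def finetune_loss(policy, critic, obs, offline_actions, bc_weight=0.1):
        actions = policy(obs)
        rl_loss = -critic(obs, actions).mean()            # maximize estimated value
        bc_loss = F.mse_loss(actions, offline_actions)    # stay near offline behavior
        return rl_loss + bc_weight * bc_loss

    # Tiny dummy example with stand-in policy and critic.
    obs = torch.randn(16, 10)
    offline_actions = torch.randn(16, 4)
    policy = torch.nn.Linear(10, 4)
    critic = lambda o, a: -((torch.cat([o, a], dim=-1)) ** 2).sum(dim=-1, keepdim=True)
    print(finetune_loss(policy, critic, obs, offline_actions))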
This project aims to develop and evaluate drone navigation policies using event-camera inputs, focusing on the challenges of transferring these policies from simulated environments to the real world. Event cameras, known for their high temporal resolution and dynamic range, offer unique advantages over traditional frame-based cameras, particularly in high-speed and low-light conditions. However, the sim-to-real gap, i.e., the differences between simulated environments and the real world, poses significant challenges for the direct application of learned policies. In this project, we will try to understand the sim-to-real gap for event cameras and how it influences downstream control tasks, such as flying in the dark, dynamic obstacle avoidance, and object catching.
Vision-based reinforcement learning (RL) is less sample-efficient and more complex to train than state-based RL because the policy is learned directly from raw image pixels rather than from the robot state. Compared to state-based RL, vision-based policies must learn some form of visual perception or image understanding from scratch, which makes them considerably harder to train and to generalise. Foundation models trained on vast datasets have shown promising potential in providing feature representations that are useful for a large variety of downstream tasks. In this project, we investigate the capabilities of such models to provide robust feature representations for learning control policies. We plan to study how different feature representations affect the exploration behavior of RL policies, the resulting sample complexity, and the generalisation and robustness to out-of-distribution samples.
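A minimal sketch of the frozen-encoder setup: a pretrained vision backbone provides features and only a small policy head is trained with RL. Here ResNet-18 merely stands in for a foundation model such as DINO or CLIP; the head size and action dimensionality are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    # Frozen feature extractor (in practice, load pretrained foundation-model weights).
    encoder = resnet18()
    encoder.fc = nn.Identity()      # keep the 512-d feature vector
    for p in encoder.parameters():  # freeze the encoder; only the head is trained
        p.requires_grad = False

    # Small trainable policy head on top of the frozen features.
    policy_head = nn.Sequential(
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 4),          # e.g., collective thrust + body rates (assumed)
    )

    def act(image_batch):
        with torch.no_grad():
            features = encoder(image_batch)
        return policy_head(features)

    print(act(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 4])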
When drones are operated in industrial environments, they are often flown in close proximity to large structures, such as bridges, buildings or ballast tanks. In those applications, the interactions of the induced flow produced by the drone’s propellers with the surrounding structures are significant and pose challenges to the stability and control of the vehicle. A common methodology to measure the airflow is particle image velocimetry (PIV). Here, smoke and small particles suspended in the surrounding air are tracked to estimate the flow field. In this project, we aim to leverage the high temporal resolution of event cameras to perform smoke-PIV, overcoming the main limitation of frame-based cameras in PIV setups. Applicants should have a strong background in machine learning and programming with Python/C++. Experience in fluid mechanics is beneficial but not a hard requirement.
Recent progress enables end-to-end vision-based drone racing, directly from images to control commands without explicit state estimation. In this project, we address the challenge of unforeseen obstacles and changes to the racing environment. The goal is to develop a control policy that can race through a predefined track but is robust to minor changes in track layout and gate placement. Additionally, the policy should avoid obstacles placed on the racetrack, mimicking real-world applications where unforeseen obstacles can appear at any time.
Drone racing is considered a proxy task for many real-world applications, including search-and-rescue missions. In such an application, door frames, corridors, and other features of the environment could be used as “gates” the drone needs to pass through. Relevant information on the layout could be extracted from a floor plan of the environment in which the drone is tasked to operate autonomously. To train such navigation policies, the first step is to simulate the environment. This project aims to develop a simulation environment that procedurally generates corridors and doors based on an input floor plan. We will compare model-based approaches (placing objects according to heuristics/rules) with learning-based approaches that directly generate the model from the floor plan.
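A toy sketch of the rule-based variant, assuming the floor plan is given as a small ASCII grid in which every door cell becomes a gate at the cell center; the input format and scale are assumptions.

    # Rule-based sketch: '#' wall, 'D' door, '.' free space. Every door cell
    # becomes a gate position the drone must pass through.
    FLOOR_PLAN = [
        "##########",
        "#....D...#",
        "#....#...#",
        "#D...#...#",
        "##########",
    ]

    CELL_SIZE = 1.0  # meters per grid cell (assumed)

    def gates_from_floor_plan(plan, cell_size=CELL_SIZE):
        gates = []
        for row, line in enumerate(plan):
            for col, cell in enumerate(line):
                if cell == "D":  # a door becomes a gate at the cell center
                    gates.append((col * cell_size + cell_size / 2,
                                  row * cell_size + cell_size / 2))
        return gates

    print(gates_from_floor_plan(FLOOR_PLAN))  # [(5.5, 1.5), (1.5, 3.5)]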
Explore the use of large vision language models to control a drone.
Use Inverse Reinforcement Learning (IRL) to learn reward functions from previous expert drone demonstrations.
Explore online fine-tuning in the real world of sub-optimal policies.
In this project, you will investigate the use of event-based cameras for vision-based landing on celestial bodies such as Mars or the Moon.
The project aims to explore how prior 3D information can assist in reconstructing fine details in NeRFs and how the help of high-temporal resolution data can enhance modeling in the case of scene and camera motion.
This project seeks to leverage the sparse nature of events to accelerate the training of radiance fields.
Inspired by how humans learn, this project aims to explore the possibility of learning flight patterns, obstacle avoidance, and navigation strategies by simply watching drone flight videos available on YouTube.
This project focuses on enhancing camera pose estimation by exploring a data-driven approach to keypoint extraction, leveraging recent advancements in frame-based keypoint extraction techniques.
The goal of this project is to develop a shared embedding space for events and frames, enabling the training of a motor policy on simulated frames and deployment on real-world event data.
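One common way to learn such a shared embedding space is a symmetric contrastive (CLIP-style) alignment between paired frame and event embeddings, sketched below; the encoders are left abstract and all names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    # Symmetric InfoNCE-style loss aligning paired frame and event embeddings.
    # Matching pairs sit on the diagonal of the similarity matrix.
    def contrastive_alignment_loss(frame_emb, event_emb, temperature=0.07):
        frame_emb = F.normalize(frame_emb, dim=-1)
        event_emb = F.normalize(event_emb, dim=-1)
        logits = frame_emb @ event_emb.t() / temperature  # (B, B) similarities
        targets = torch.arange(frame_emb.size(0))
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    print(contrastive_alignment_loss(torch.randn(8, 128), torch.randn(8, 128)))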
In this project, the student applies concepts from current advances in image generation to create artificial events from standard frames. Multiple state-of-the-art deep learning methods will be explored in the scope of this project.
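For reference, a simple model-based baseline that learned generators are often compared against: an event is fired at a pixel whenever its log intensity changes by more than a contrast threshold between consecutive frames. The snippet below is a minimal sketch of this idea; the threshold value and interface are assumptions.

    import numpy as np

    # Contrast-threshold event generation: fire an event where the log intensity
    # changes by more than C between two frames. Timestamps would be interpolated
    # between frames in a full simulator.
    def events_from_frames(frame_prev, frame_next, C=0.2, eps=1e-3):
        log_prev = np.log(frame_prev.astype(np.float64) + eps)
        log_next = np.log(frame_next.astype(np.float64) + eps)
        diff = log_next - log_prev
        ys, xs = np.nonzero(np.abs(diff) >= C)
        polarities = np.sign(diff[ys, xs]).astype(int)
        return list(zip(xs.tolist(), ys.tolist(), polarities.tolist()))  # (x, y, polarity)

    f0 = np.full((4, 4), 50, dtype=np.uint8)
    f1 = f0.copy(); f1[1, 2] = 120  # brighten one pixel
    print(events_from_frames(f0, f1))  # [(2, 1, 1)]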
Master's thesis in collaboration with SynSense: Neuromorphic Intelligence & Application Solutions
This project focuses on combining Large Language Models with Event-based Computer Vision.
This project explores a novel approach to graph embeddings using electrical flow computations.