ESR3 – Classifying and Predicting Interactions Between AV and VRUs Using AI

Popular scientific abstract

Given that human error contributes to about 95% of all road traffic crashes, automated vehicles (AVs) could greatly reduce these figures and improve road safety. More than half of all road traffic deaths and injuries involve vulnerable road users (VRUs), such as pedestrians, cyclists, and motorcyclists. It’s therefore essential for automated driving (AD) systems to predict the behavior and intentions of VRUs, helping AVs make better decisions and prevent hazardous situations. In ESR3, “Classifying and Predicting Interactions Between AV and VRUs Using AI”, we aim to better understand how VRUs behave when interacting with AVs in urban traffic, in order to improve human-AV interaction design. In this project, we’ll use powerful AI tools, such as machine-learning and deep-learning methods, to predict VRU behavior. With the help of various sensors, AVs can better perceive VRUs. We’ll collect real-world driving data in Europe to train and test our algorithms. The classification and prediction of VRU behavior will be used to better understand the interactions of external road users with AVs, and to improve road safety.

My affiliation

My host university is the University of Gothenburg.

Supervisor contacts:

Dr Christian Berger (UGOT; christian.berger@gu.se)

Prof Marco Dozza (Chalmers; marco.dozza@chalmers.se)

Background

Predicting the behavior of VRUs such as pedestrians is very challenging. Pedestrians are highly agile: they can change both their direction and their velocity abruptly, without slowing down first [1, 2], and they are easily influenced by the behavior of other road users and by their surroundings [2]. It is difficult to reliably predict pedestrians’ intentions from hand-crafted features; hence, we aim to design a deep-learning method that learns features directly from the raw sensor data and can cope with complicated traffic environments. The camera is often the preferred sensor for predicting pedestrians’ intentions [2]. However, cameras usually have a limited field of view and lack depth information. In contrast, 3D data such as the point clouds collected by Light Detection and Ranging (LiDAR) sensors provide depth as well as rich geometric and shape information [3], which helps in complicated traffic conditions. Unfortunately, the sparsity and irregularity of LiDAR point clouds make it hard to apply deep-learning methods such as the Convolutional Neural Networks (CNNs) commonly used for 2D detection. Incorporating LiDAR information into the prediction of VRUs’ intentions is therefore a major challenge.
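
One common workaround, sketched below in Python, is to rasterize the unordered point cloud into a regular occupancy grid so that standard convolutions can be applied. The grid extent and voxel size here are illustrative assumptions, not parameters of this project.

    # A minimal sketch (assuming points arrive as an (N, 3) array in meters;
    # extent and voxel size are illustrative) of voxelizing an unordered LiDAR
    # point cloud into a dense occupancy grid that a standard 3D CNN can consume.
    import numpy as np

    def voxelize(points, extent=((-40, 40), (-40, 40), (-2, 2)), voxel=0.4):
        """Map an (N, 3) point cloud to a dense binary occupancy grid."""
        lo = np.array([e[0] for e in extent], dtype=np.float64)
        hi = np.array([e[1] for e in extent], dtype=np.float64)
        keep = np.all((points >= lo) & (points < hi), axis=1)   # drop out-of-range points
        idx = ((points[keep] - lo) / voxel).astype(int)         # point -> voxel index
        grid = np.zeros(np.ceil((hi - lo) / voxel).astype(int), dtype=np.float32)
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0             # mark occupied voxels
        return grid

    # Example: 10,000 random points in a 100 m cube around the sensor.
    points = (np.random.rand(10000, 3) - 0.5) * 100
    print(voxelize(points).shape)  # -> (200, 200, 10)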

Aims and objectives

This research aims at better understanding how VRUs behave when interacting with AVs in urban traffic, in order to improve human-AV interaction design. AI methods will be applied to AV/VRU interaction data to classify the interactions and predict VRU behavior. The classifications and predictions of VRU behavior will be used to improve the understanding of how external road users behave in interactions with AVs.

The objectives of this research are:

  • To develop AI tools to classify and predict the behaviors of VRUs in interactions with cars;
  • To use these tools to predict VRU behavior and intention in more complex interactions;
  • To compare the behavior classifications and predictions using AI with results from the literature and from other traditional computational models.

Research description

As part of the “SHAPE-IT” project, my research topic is “Classifying and Predicting Interactions Between AVs and VRUs Using AI”, which aims at using AI to better understand human behavior.

My research plan for this project is as follows:

The VRU behavior-prediction process usually includes: (a) detection (finding obstacles in the raw sensor data), (b) tracking (associating the same obstacles across timestamps), and (c) prediction (classifying VRUs’ intentions and predicting their future trajectories). Rather than solving these as several separate problems, we strive to develop an end-to-end method that directly predicts the trajectories and intentions of pedestrians, because cascaded approaches restrict the information that the behavior-prediction module has access to, which may lead to sub-optimal solutions. A self-driving vehicle carries several kinds of sensors, e.g. cameras, LiDAR, and radar. We plan to use multiple sensors, as they provide more information than a single sensor and can therefore support more accurate predictions. To achieve this goal, we plan to take the following steps over the coming years:

1) Using a single sensor, e.g. a 3D LiDAR, design and develop machine-learning (ML) methods to predict the behavior of pedestrians. The inputs are manually labeled past pedestrian trajectories, together with other information from the raw sensor data; the outputs are the pedestrians’ future trajectories and intentions. A minimal sketch of such a predictor is given below.
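
The sketch below (assuming PyTorch, 2D (x, y) positions, and illustrative horizon lengths; it is not the project’s actual model) encodes an observed pedestrian trajectory with an LSTM and regresses all future positions with a linear head; an intention-classification head could be added analogously.

    # A minimal sketch: an LSTM encodes the observed history and a linear head
    # regresses the full future trajectory in one shot. Sizes and horizons are
    # illustrative assumptions.
    import torch
    import torch.nn as nn

    class TrajectoryPredictor(nn.Module):
        def __init__(self, obs_len=8, pred_len=12, hidden=64):
            super().__init__()
            self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, pred_len * 2)
            self.pred_len = pred_len

        def forward(self, history):                 # history: (B, obs_len, 2)
            _, (h, _) = self.encoder(history)       # h: (1, B, hidden)
            out = self.head(h.squeeze(0))           # (B, pred_len * 2)
            return out.view(-1, self.pred_len, 2)   # future (x, y) per step

    # Example: a batch of 4 observed tracks, 8 past steps each.
    model = TrajectoryPredictor()
    print(model(torch.randn(4, 8, 2)).shape)        # -> (4, 12, 2)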

2) Using a single sensor, e.g. a 3D LiDAR, develop the entire end-to-end prediction pipeline, including detection, tracking, and prediction. The inputs are only the raw single-sensor data; the outputs are the pedestrians’ future trajectories and intentions.
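
Conceptually, this step could build on the occupancy grid from the Background sketch. The toy backbone below (illustrative layer sizes; a fixed maximum number of pedestrians is a simplifying assumption) regresses future positions directly from the raw-LiDAR grid, with no hand-built detection or tracking stage.

    # A minimal sketch of the end-to-end idea: a small 3D CNN summarizes a
    # raw-LiDAR occupancy grid into a scene feature, from which the future
    # (x, y) positions of up to `max_peds` pedestrians are regressed directly.
    # All sizes are illustrative assumptions, not the project's architecture.
    import torch
    import torch.nn as nn

    class EndToEndPredictor(nn.Module):
        def __init__(self, pred_len=12, max_peds=8):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            )
            self.head = nn.Linear(32, max_peds * pred_len * 2)
            self.pred_len, self.max_peds = pred_len, max_peds

        def forward(self, grid):                      # grid: (B, 1, X, Y, Z)
            out = self.head(self.backbone(grid))
            return out.view(-1, self.max_peds, self.pred_len, 2)

    model = EndToEndPredictor()
    grid = torch.zeros(1, 1, 200, 200, 10)            # one voxelized LiDAR frame
    print(model(grid).shape)                          # -> (1, 8, 12, 2)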

3) Fusing multi-sensor information, e.g. from a 3D LiDAR and 2D cameras, develop the entire end-to-end prediction pipeline, including detection, tracking, and prediction. The inputs are raw data from multiple sensors; the outputs are the pedestrians’ future trajectories and intentions.
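
One simple realization of this fusion (a hypothetical late-fusion design with illustrative sizes, not the project’s architecture) is to concatenate per-object embeddings from a LiDAR branch and a camera branch before the prediction head.

    # A minimal sketch of late feature-level fusion: per-pedestrian embeddings
    # from a LiDAR branch and a camera branch (both assumed to be produced
    # upstream) are concatenated before regressing the future trajectory.
    import torch
    import torch.nn as nn

    class FusionHead(nn.Module):
        def __init__(self, lidar_dim=128, cam_dim=128, pred_len=12):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(lidar_dim + cam_dim, 128), nn.ReLU(),
                nn.Linear(128, pred_len * 2),
            )
            self.pred_len = pred_len

        def forward(self, f_lidar, f_cam):            # (B, 128) each
            fused = torch.cat([f_lidar, f_cam], dim=-1)
            return self.mlp(fused).view(-1, self.pred_len, 2)

    head = FusionHead()
    print(head(torch.randn(4, 128), torch.randn(4, 128)).shape)  # -> (4, 12, 2)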

4) Including not only information about the pedestrians and the environment, but also the interaction between the ego vehicle and the VRUs.
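
A minimal way to realize this last step (again a hypothetical design; the ego-state layout is an assumption) is to embed the ego vehicle’s kinematic state and fuse it with each pedestrian’s history encoding, so that predictions are conditioned on the vehicle-VRU interaction.

    # A minimal sketch: the ego vehicle's state (assumed here to be x, y,
    # speed, yaw) is embedded and concatenated with the pedestrian's history
    # encoding before the prediction head. Sizes are illustrative.
    import torch
    import torch.nn as nn

    class InteractionAwarePredictor(nn.Module):
        def __init__(self, hidden=64, ego_dim=4, pred_len=12):
            super().__init__()
            self.encoder = nn.LSTM(2, hidden, batch_first=True)
            self.ego_embed = nn.Linear(ego_dim, 32)
            self.head = nn.Linear(hidden + 32, pred_len * 2)
            self.pred_len = pred_len

        def forward(self, history, ego_state):        # (B, T, 2), (B, ego_dim)
            _, (h, _) = self.encoder(history)
            ego = torch.relu(self.ego_embed(ego_state))
            fused = torch.cat([h.squeeze(0), ego], dim=-1)
            return self.head(fused).view(-1, self.pred_len, 2)

    model = InteractionAwarePredictor()
    print(model(torch.randn(4, 8, 2), torch.randn(4, 4)).shape)  # -> (4, 12, 2)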

Other work that may be included:

1) Data collection, analysis, and processing. To develop and improve algorithms for VRU understanding, we need to understand the data better. Work on data collection, analysis, and processing also needs to be done, e.g., studying how the data affect model performance.

2) Deployment on a test vehicle. To verify our algorithms, we need to deploy the models online on a test vehicle. This is also part of the work to be done, e.g., making sure the software runs reliably and stably on the test vehicle.
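
For the data-analysis work in item 1, prediction quality is commonly quantified with the average and final displacement errors (ADE/FDE) between predicted and ground-truth trajectories; the sketch below computes both (the sample trajectories are made up).

    # A minimal sketch of two standard trajectory-prediction metrics: ADE
    # (average displacement error over the horizon) and FDE (displacement
    # error at the final step). The example trajectories are fabricated.
    import numpy as np

    def ade_fde(pred, gt):
        """pred, gt: (T, 2) arrays of future (x, y) positions in meters."""
        dist = np.linalg.norm(pred - gt, axis=-1)   # per-step Euclidean error
        return dist.mean(), dist[-1]

    pred = np.cumsum(np.full((12, 2), 0.10), axis=0)   # hypothetical prediction
    gt = np.cumsum(np.full((12, 2), 0.12), axis=0)     # hypothetical ground truth
    ade, fde = ade_fde(pred, gt)
    print(f"ADE = {ade:.2f} m, FDE = {fde:.2f} m")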

Results

The expected results are:

  • Tools for classifying and predicting interaction behaviors between AVs and VRUs in urban environments, to support human factors researchers and automation designers;
  • Requirements and guidelines for the use of AI in classifying and predicting VRU interactions with AVs;

  • Assessment/validation of the AI tools for classifying and predicting AV/VRU interactions (compared to traditional methods).

My publications

Accepted contributions:

  • Poster: Chi Zhang, Christian Berger and Marco Dozza. An End-to-End Network for the Prediction of Pedestrian Behavior from LiDAR Data. L3Pilot Summer School (Athens, Greece, September 2020). Accepted.
  • Workshop: Chi Zhang, Christian Berger and Marco Dozza. Towards Understanding Pedestrian Behavior Patterns from LiDAR Data. SAIS Workshop (Online, June 2020). Accepted.

References and links

[1] B. Volz, K. Behrendt, H. Mielenz, I. Gilitschenski, R. Siegwart, and J. Nieto. A data-driven approach for pedestrian intention estimation. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 2607–2612. IEEE, 2016.

[2] B. Volz, H. Mielenz, G. Agamennoni, and R. Siegwart. Feature relevance estimation for learning pedestrian behavior at crosswalks. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pages 854–860. IEEE, 2015.

[3] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun. Deep learning for 3D point clouds: A survey. arXiv preprint arXiv:1912.12033, 2019.