End-to-end Autonomous Driving Research: status quo of End-to-end (E2E) autonomous driving
1. Status quo of end-to-end solutions in China
An end-to-end autonomous driving system maps sensor inputs (camera images, LiDAR, etc.) directly to control outputs (steering, acceleration/deceleration, etc.). The approach first appeared in the ALVINN project in 1988, which used camera and laser rangefinder data as input and a simple neural network to generate steering commands as output.
In early 2024, Tesla rolled out FSD V12.3, which showed an impressive level of intelligent driving. Since then, end-to-end autonomous driving solutions have garnered widespread attention from OEMs and autonomous driving solution companies in China.
Compared with conventional multi-module solutions, the end-to-end autonomous driving solution integrates perception, prediction and planning into a single model, simplifying the solution structure. It can imitate how human drivers make driving decisions directly from visual inputs, cope more effectively with the long-tail scenarios that modular solutions struggle with, and improve the training efficiency and performance of models.
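The direct sensor-to-control mapping described above can be illustrated with a very small network. The following is a minimal sketch, not ALVINN's or any vendor's actual architecture: the class name TinyEndToEndPolicy, the layer sizes and the random (untrained) weights are all assumptions made for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights for one fully connected layer (untrained, illustrative)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, inputs):
    """Weighted sum followed by tanh, so every output lies in (-1, 1)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in layer]


class TinyEndToEndPolicy:
    """Hypothetical single model: raw sensor values in, one steering value out."""

    def __init__(self, n_camera, n_range, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_camera = n_camera
        self.n_range = n_range
        self.hidden = make_layer(n_camera + n_range, n_hidden, rng)
        self.output = make_layer(n_hidden, 1, rng)

    def steer(self, camera_pixels, range_readings):
        if len(camera_pixels) != self.n_camera or len(range_readings) != self.n_range:
            raise ValueError("unexpected sensor vector length")
        # No hand-written perception or planning stage: the sensor values are
        # concatenated and passed straight through the network.
        features = list(camera_pixels) + list(range_readings)
        return forward(self.output, forward(self.hidden, features))[0]


policy = TinyEndToEndPolicy(n_camera=16, n_range=4)
steering = policy.steer([0.5] * 16, [0.2] * 4)
print(f"steering command: {steering:+.3f}")  # negative = left, positive = right (assumed convention)
```

In a real system the weights would be learned from recorded driving data, and the output would cover acceleration and braking as well as steering.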
Li Auto's end-to-end solution
Li Auto believes that a complete end-to-end model should cover the whole process of perception, tracking, prediction, decision and planning, and that it is the optimal solution for achieving L3 autonomous driving. In 2023, Li Auto pushed AD Max 3.0, whose overall framework reflects the end-to-end concept but still falls short of a complete end-to-end solution. In 2024, Li Auto is expected to upgrade the system into a complete end-to-end solution.
Li Auto's autonomous driving framework is shown below, consisting of two systems:
Fast system: System 1, Li Auto's existing end-to-end solution, which acts directly on its perception of the surroundings.
Slow system: System 2, a multimodal large language model that reasons logically and explores unfamiliar environments to handle unknown L4 scenarios.
As it advances the end-to-end solution, Li Auto plans to unify the planning/prediction model and the perception model, and to build an end-to-end Temporal Planner on top of the existing framework to integrate parking with driving.
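The fast/slow split described above can be sketched as a simple router. This is a hedged illustration only: the source does not say how Li Auto arbitrates between the two systems, so the novelty score, the threshold and every function and field name below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    source: str          # which system produced the plan
    target_speed: float  # metres per second
    steering: float      # normalised to [-1, 1]


def fast_system(scene):
    """Stand-in for System 1: reacts directly to the perceived scene."""
    return Plan("system1", scene["speed_limit"], scene["lane_offset"] * -0.5)


def slow_system(scene):
    """Stand-in for System 2: deliberate handling of an unfamiliar scene."""
    # A real system would query a multimodal LLM here; this stub only slows down.
    return Plan("system2", min(scene["speed_limit"], 5.0), 0.0)


def plan(scene, novelty_threshold=0.8):
    """Assumed routing rule: use the slow system only when the scene looks unfamiliar."""
    if scene["novelty"] > novelty_threshold:
        return slow_system(scene)
    return fast_system(scene)


familiar = {"speed_limit": 16.0, "lane_offset": 0.2, "novelty": 0.1}
novel = {"speed_limit": 16.0, "lane_offset": 0.2, "novelty": 0.95}
print(plan(familiar).source)
print(plan(novel).source)
```

The point of the sketch is the division of labour: the fast path runs for ordinary scenes, and the slower reasoning path is reserved for cases the fast path is not expected to handle.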
2. Data becomes the key to the implementation of end-to-end solutions
Implementing an end-to-end solution involves R&D team building, hardware facilities, data collection and processing, algorithm training and strategy customization, verification and evaluation, and promotion and mass production. Some of the pain points in these scenarios are shown in the table:
Integrated training of end-to-end autonomous driving solutions requires massive data, so data collection and processing is one of the main difficulties.
First of all, collecting the data takes a long time and many channels. The data includes driving data and scenario data such as roads, weather and traffic conditions. In actual driving, data within the driver's front view is relatively easy to collect, but information about the surroundings is harder to capture.
During data processing, it is necessary to design data extraction dimensions, extract effective features from massive video clips, compile statistics on data distribution, etc. to support large-scale data training.
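The distribution statistics mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the clip records, the dimension names ("road", "weather") and the 30% threshold are hypothetical and not taken from any vendor's pipeline.

```python
from collections import Counter

# Hypothetical clip metadata; real pipelines would derive these tags from the video.
clips = [
    {"id": "clip-001", "road": "urban", "weather": "clear"},
    {"id": "clip-002", "road": "urban", "weather": "rain"},
    {"id": "clip-003", "road": "highway", "weather": "clear"},
    {"id": "clip-004", "road": "urban", "weather": "clear"},
]


def distribution(clips, dimension):
    """Share of clips per value along one extraction dimension."""
    counts = Counter(clip[dimension] for clip in clips)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def underrepresented(clips, dimension, min_share):
    """Values whose share falls below min_share, i.e. candidates for more collection."""
    shares = distribution(clips, dimension)
    return sorted(value for value, share in shares.items() if share < min_share)


print(distribution(clips, "road"))
print(underrepresented(clips, "weather", min_share=0.3))
```

Statistics like these show which scenarios are thin in the training set, so collection effort can be directed at them before large-scale training.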
DeepRoute
As of March 2024, DeepRoute.ai's end-to-end autonomous driving solution had been designated by Great Wall Motor and was part of the company's cooperation with NVIDIA. It is expected to be adapted to NVIDIA Thor in 2025. In DeepRoute.ai's roadmap, the transition from the conventional solution to the end-to-end autonomous driving solution goes through sensor pre-fusion, HD map removal, and the integration of perception, decision and control.
GigaStudio
DriveDreamer, an autonomous driving model of GigaStudio, is capable of scenario generation, data generation, driving action prediction and so forth. Its scenario/data generation takes two steps:
First, single-frame structural conditions guide DriveDreamer to generate driving scenario images, so that it learns structural traffic constraints.
Second, this understanding is extended to video generation. Using continuous traffic structure conditions, DriveDreamer outputs driving scene videos, which further strengthens its understanding of motion transitions.
3. End-to-end solutions accelerate the application of embodied robots
In addition to autonomous vehicles, embodied robots are another mainstream scenario for end-to-end solutions. Moving from end-to-end autonomous driving to robots requires a more universal world model that can adapt to more complex and diverse real-world application scenarios. The mainstream development framework for AGI (Artificial General Intelligence) is divided into two stages:
Stage 1: the understanding and generation capabilities of foundation models are unified, and further combined with embodied artificial intelligence (embodied AI) to form a unified world model;
Stage 2: the world model's capabilities, combined with complex task planning and control and with abstract concept induction, gradually evolve toward the era of interactive AGI 1.0.
In deploying the world model, building an end-to-end VLA (Vision-Language-Action) autonomous system has become a crucial link. VLA, as the foundation model of embodied AI, can seamlessly link 3D perception, reasoning and action to form a generative world model. It is built on a 3D-based large language model (LLM) and introduces a set of interaction tokens to interact with the environment.
As of April 2024, some manufacturers of humanoid robots adopting end-to-end solutions are as follows:
For example, Udeer·AI's Large Physical Language Model (LPLM) is an end-to-end embodied AI solution that uses a self-labeling mechanism to improve the learning efficiency and quality of the model from unlabeled data, thereby deepening the understanding of the world and enhancing the robot's generalization capabilities and environmental adaptability in cross-modal, cross-scene, and cross-industry scenarios.
LPLM abstracts the physical world and ensures that this information is aligned with the abstraction level of features in the LLM. It explicitly models each entity in the physical world as a token that encodes geometric, semantic, kinematic and intentional information.
In addition, LPLM adds 3D grounding to the encoding of natural language instructions, which improves to some extent the accuracy with which those instructions are interpreted. Its decoder learns by constantly predicting the future, which strengthens the model's ability to learn from massive unlabeled data.
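The entity-as-token idea described above can be sketched as a flat feature vector per entity. This is an illustration only, not Udeer·AI's implementation: the field names, the label vocabularies and the one-hot encoding are all assumptions.

```python
from dataclasses import dataclass

SEMANTIC_CLASSES = ["vehicle", "pedestrian", "obstacle"]  # hypothetical label set
INTENTS = ["idle", "crossing", "approaching"]             # hypothetical label set


@dataclass(frozen=True)
class Entity:
    position: tuple  # geometric: (x, y, z) in metres
    size: tuple      # geometric: (length, width, height) in metres
    semantic: str    # semantic: class label
    velocity: tuple  # kinematic: (vx, vy, vz) in metres per second
    intent: str      # intentional: predicted short-term intent


def one_hot(value, vocabulary):
    """1.0 at the position of value in vocabulary, 0.0 elsewhere."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode(entity):
    """Flatten one entity into a fixed-length vector, i.e. one token."""
    return (
        list(entity.position)
        + list(entity.size)
        + one_hot(entity.semantic, SEMANTIC_CLASSES)
        + list(entity.velocity)
        + one_hot(entity.intent, INTENTS)
    )


walker = Entity((2.0, 1.0, 0.0), (0.5, 0.5, 1.7), "pedestrian", (0.0, 1.2, 0.0), "crossing")
token = encode(walker)
print(len(token), token)
```

A learned model would typically project such a vector into the LLM's embedding space; that projection is omitted here.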