Research Report on AI Foundation Models and Their Applications in Automotive Field, 2024-2025
  • Feb.2025
  • Hard Copy
  • USD $4,500
  • Pages:340
  • Single User License
    (PDF Unprintable)       
  • USD $4,300
  • Code: GX016
  • Enterprise-wide License
    (PDF Printable & Editable)       
  • USD $6,400
  • Hard Copy + Single User License
  • USD $4,700
      

Research on AI foundation models and automotive applications: reasoning, cost reduction, and explainability

Reasoning capabilities drive up the performance of foundation models.

Since the second half of 2024, foundation model companies in and outside China have launched reasoning models that use reasoning frameworks such as Chain-of-Thought (CoT) to strengthen the ability of foundation models to handle complex tasks and make decisions independently.
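CoT prompting works by asking the model to show its intermediate steps. The minimal sketch below prepends a worked example whose answer is reached step by step, asks the model to reason the same way, and then extracts the final answer. Some details are illustrative assumptions and do not come from any vendor's API: the exemplar, the "The answer is" convention, and the function names `build_cot_prompt` and `extract_answer`. The model call itself is omitted.

```python
import re
from typing import Optional

# One hypothetical worked exemplar: question, explicit intermediate steps, final answer.
EXEMPLAR = (
    "Q: A car travels 60 km in the first hour and 40 km in the second hour. "
    "How far does it travel in total?\n"
    "A: In the first hour it travels 60 km. In the second hour it travels 40 km. "
    "60 + 40 = 100. The answer is 100.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates step-by-step reasoning."""
    return f"{EXEMPLAR}\nQ: {question}\nA: Let's think step by step."

def extract_answer(completion: str) -> Optional[str]:
    """Return the token after 'The answer is', or None if the model skipped the convention."""
    match = re.search(r"The answer is\s*([^\s.]+)", completion)
    return match.group(1) if match else None

print(build_cot_prompt("A charger adds 20% battery per 10 minutes. How long to add 60%?"))
sample = "Each 10 minutes adds 20%. 60 / 20 = 3 blocks. 3 * 10 = 30. The answer is 30."
print(extract_answer(sample))  # → 30
```

The answer convention matters in practice. Without it, the final result is buried in free-form reasoning and is hard to pass on to downstream logic.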

This wave of reasoning model releases aims to help foundation models handle complex scenarios and lays the groundwork for Agent applications. In the automotive industry, stronger reasoning can address pain points in AI applications. Examples include improving the intent recognition of cockpit assistants when semantics are complex, and improving the accuracy of spatiotemporal prediction in autonomous driving planning and decision-making.

In 2024, the reasoning technologies of mainstream foundation models deployed in vehicles revolved primarily around CoT and its variants, such as Tree-of-Thought (ToT), Graph-of-Thought (GoT), and Forest-of-Thought (FoT). Depending on the scenario, these were combined with generative models (e.g., diffusion models), knowledge graphs, causal reasoning models, cumulative reasoning, and multimodal reasoning chains.
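Tree-of-Thought generalizes CoT from a single chain to a search over a tree of partial reasoning steps. The sketch below shows a breadth-first variant with beam pruning on a toy arithmetic task: reach a target total in a fixed number of steps. The functions `propose` and `make_evaluator` are hypothetical stand-ins for the LLM calls that would generate and score candidate thoughts in a real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class State:
    thoughts: Tuple[str, ...]  # intermediate reasoning steps taken so far
    value: int                 # toy partial solution: the running total

def propose(state: State) -> List[State]:
    """Hypothetical thought generator; in a real ToT system an LLM proposes these steps."""
    return [State(state.thoughts + (f"+{step}",), state.value + step) for step in (1, 3, 5)]

def make_evaluator(target: int) -> Callable[[State], float]:
    """Hypothetical state evaluator; an LLM would rate each state, here we use closeness to target."""
    return lambda s: -abs(target - s.value)

def tot_bfs(root: State, depth: int, beam: int, evaluate: Callable[[State], float]) -> State:
    """Expand every frontier state, score all children, and keep the best `beam` at each level."""
    frontier = [root]
    for _ in range(depth):
        children = [child for state in frontier for child in propose(state)]
        children.sort(key=evaluate, reverse=True)  # highest-scoring thoughts first (stable for ties)
        frontier = children[:beam]                 # prune the tree to the beam width
    return max(frontier, key=evaluate)

best = tot_bfs(State((), 0), depth=3, beam=2, evaluate=make_evaluator(11))
print(best.thoughts, best.value)
```

With `beam=1` the search collapses to a single greedy chain of thoughts. This is one way to see plain CoT as a special case of ToT.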

For example, Geely's Modularized Thinking Language Model (MeTHanol) synthesizes human thoughts and uses them to supervise the hidden layers of LLMs, producing human-like thinking behaviors. By adapting to daily conversations and personalized prompts, it enhances the thinking and reasoning capabilities of LLMs and improves explainability.

In 2025, the focus of reasoning technology will shift to multimodal reasoning. Common training techniques include instruction fine-tuning, multimodal in-context learning, and multimodal CoT (M-CoT). These are often implemented by combining multimodal fusion and alignment with LLM reasoning technologies.
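Multimodal CoT is often described as a two-stage pipeline. The model first generates a rationale from the fused image and text input, then infers the answer conditioned on that rationale. The sketch below shows only that control flow. `MultimodalInput`, `toy_rationale`, and `toy_answer` are hypothetical placeholders for a vision encoder plus LLM. Symbolic labels stand in for visual features that are assumed to be already aligned to the text space, which substitutes for the fusion and alignment step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class MultimodalInput:
    image_tokens: List[str]  # hypothetical: labels standing in for aligned visual features
    question: str

RationaleFn = Callable[[MultimodalInput], str]
AnswerFn = Callable[[MultimodalInput, str], str]

def multimodal_cot(x: MultimodalInput, gen_rationale: RationaleFn,
                   infer_answer: AnswerFn) -> Tuple[str, str]:
    """Two-stage M-CoT: reason from the fused input first, then answer using that reasoning."""
    rationale = gen_rationale(x)         # stage 1: rationale generation
    answer = infer_answer(x, rationale)  # stage 2: answer inference conditioned on the rationale
    return rationale, answer

# Toy stand-ins for the two model calls (hypothetical, for illustration only).
def toy_rationale(x: MultimodalInput) -> str:
    return f"The image shows: {', '.join(x.image_tokens)}."

def toy_answer(x: MultimodalInput, rationale: str) -> str:
    return "stop" if "red_light" in rationale else "proceed"

scene = MultimodalInput(image_tokens=["crosswalk", "red_light"], question="Should the car proceed?")
print(multimodal_cot(scene, toy_rationale, toy_answer))
```

Separating the stages keeps the rationale available as an inspectable intermediate output. This matters for the explainability goals discussed below.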

Explainability bridges trust between AI and users.

Users need to trust AI before they can experience its "usefulness". In 2025, the explainability of AI systems therefore becomes a key factor in growing the user base of automotive AI. One way to address this is to display the long CoT behind a model's outputs.

The explainability of AI systems can be achieved at three levels: data explainability, model explainability, and post-hoc explainability.

In Li Auto's case, its L3 autonomous driving uses "AI reasoning visualization technology" to intuitively present the thinking process of its end-to-end + VLM models. The visualization covers the entire pipeline from perception of the physical world to the driving decisions output by the foundation model, which enhances users’ trust in intelligent driving systems.

In Li Auto's "AI reasoning visualization technology":
• The attention system displays traffic and environmental information perceived by the vehicle. It evaluates the behavior of traffic participants in real-time video streams and uses heatmaps to highlight the evaluated objects.
• The end-to-end (E2E) model displays the thinking process behind its driving trajectory output. It considers different driving trajectories, presents 10 candidate outputs, and finally adopts the most likely one as the driving path.
• The vision language model (VLM) displays its perception, reasoning, and decision-making processes through dialogue.
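The E2E step above scores 10 candidate trajectories and adopts the most likely one. It can be sketched as follows. Li Auto has not published this code. The sketch assumes the model emits one raw score (logit) per candidate and that "most likely" means the highest softmax probability. The candidate paths, scores, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitudinal m, lateral m) in the ego frame

def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_trajectory(candidates: Sequence[List[Point]],
                      logits: Sequence[float]) -> Tuple[int, List[Point], List[float]]:
    """Return the winning index, its path, and the full distribution over candidates."""
    if len(candidates) != len(logits):
        raise ValueError("need exactly one score per candidate trajectory")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, candidates[best], probs

# Ten hypothetical candidates: straight paths at different lateral offsets.
candidates = [[(x, 0.5 * (k - 4)) for x in (0.0, 10.0, 20.0, 30.0)] for k in range(10)]
logits = [0.1, 0.4, 1.2, 2.5, 3.1, 2.8, 1.0, 0.3, -0.2, -1.0]  # made-up model scores
idx, path, probs = select_trajectory(candidates, logits)
print(idx, path[-1], round(probs[idx], 3))
```

Returning the whole distribution, not just the winner, is what lets a visualization layer show all candidates alongside the adopted path.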

The dialogue interfaces of various reasoning models also use a long CoT to break down the reasoning process. For example, during conversations with users, DeepSeek R1 first presents the decision at each node through a CoT and then provides explanations in natural language.
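A dialogue interface that shows the long CoT first has to separate the reasoning from the final reply. The sketch below assumes the raw response wraps its reasoning in `<think>...</think>` tags, the convention DeepSeek R1's open-weight releases follow. Check the model card of the version actually deployed. Splitting steps by line is a hypothetical display choice, and the sample response is made up.

```python
import re
from typing import List, Tuple

# Non-greedy match that spans newlines inside the reasoning block.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(response: str) -> Tuple[str, str]:
    """Separate the long CoT inside <think>...</think> from the final natural-language answer."""
    match = THINK_BLOCK.search(response)
    if match is None:
        return "", response.strip()  # no reasoning block: treat everything as the answer
    reasoning = match.group(1).strip()
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer

def reasoning_steps(reasoning: str) -> List[str]:
    """Split the chain into display steps, one per non-empty line (a UI choice, not a model format)."""
    return [line.strip() for line in reasoning.splitlines() if line.strip()]

raw = (
    "<think>\nThe user asks to cool the cabin to 22C.\n"
    "The cabin is currently 26C, so the A/C must run.\n</think>\n"
    "Setting the air conditioning to 22C."
)
reasoning, answer = split_reasoning(raw)
for i, step in enumerate(reasoning_steps(reasoning), 1):
    print(f"Step {i}: {step}")
print("Answer:", answer)
```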

Additionally, most reasoning models, including Zhipu’s GLM-Zero-Preview, Alibaba’s QwQ-32B-Preview, and Skywork 4.0 o1, support displaying the long CoT reasoning process.

DeepSeek lowers the barrier to introducing foundation models in vehicles, enabling both performance improvement and cost reduction.

Do better reasoning capabilities and overall performance mean higher costs? Not necessarily, as DeepSeek's popularity shows. In early 2025, OEMs started connecting to DeepSeek, primarily to enhance the overall capabilities of their vehicle foundation models in specific applications.

In fact, OEMs had been developing and iterating their automotive AI foundation models before DeepSeek's models were launched. For cockpit assistants, some OEMs had completed the initial build of their solutions. They had either connected to cloud foundation model suppliers for trial operation or provisionally selected suppliers, such as Alibaba Cloud, Tencent Cloud, and Zhipu. When they connected to DeepSeek in early 2025, they valued the following:
• Strong reasoning performance: for example, the R1 reasoning model is comparable to OpenAI o1 and even excels in mathematical logic.
• Lower costs: performance is maintained while training and inference costs stay among the lowest in the industry.

By connecting to DeepSeek, OEMs can substantially reduce the costs of hardware procurement, model training, and maintenance without sacrificing performance when deploying intelligent driving and cockpit assistants:
• Low computing overhead: these technologies facilitate high-level autonomous driving and technology democratization. High-performance models can be deployed on low-compute automotive chips (e.g., edge computing units), reducing reliance on expensive GPUs. Combined with the DualPipe algorithm and FP8 mixed precision training, they optimize computing power utilization. This allows mid- and low-end vehicles to offer high-level cockpit and autonomous driving features and accelerates the popularization of intelligent cockpits.
• Enhanced real-time performance: in driving environments, autonomous driving systems must process large amounts of sensor data in real time, and cockpit assistants must respond quickly to user commands, yet vehicle computing resources are limited. With lower computing overhead, DeepSeek enables:
  • faster sensor data processing;
  • more efficient use of the computing power of intelligent driving chips (DeepSeek achieves 90% utilization of NVIDIA A100 chips during server-side training);
  • lower latency (e.g., on the 100 TOPS Qualcomm 8650 platform, DeepSeek reduces inference response time from 20 milliseconds to 9-10 milliseconds).
  In intelligent driving systems, this helps keep driving decisions timely and accurate, improving driving safety and user experience. In cockpit systems, it helps cockpit assistants respond quickly to user voice commands for smooth human-computer interaction.
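FP8 mixed precision stores and multiplies selected tensors in an 8-bit floating-point format while keeping sensitive parts, such as master weights, in higher precision. The sketch below simulates only the numeric core in ordinary Python floats. It applies a per-tensor scale, then rounds values to the E4M3 format: 4 exponent bits, 3 mantissa bits, largest finite value 448. Real training does this in hardware. Per-tensor scaling is a simplification; the DeepSeek-V3 technical report describes finer-grained, block-wise scaling. DualPipe, a pipeline-parallel scheduling method, is not modeled. The function names are hypothetical.

```python
import math
from typing import List, Tuple

E4M3_MAX = 448.0        # largest finite FP8 E4M3 value
E4M3_MANTISSA_BITS = 3  # explicit fraction bits
E4M3_MIN_EXP = -6       # exponent of the smallest normal value, 2**-6

def round_to_e4m3(x: float) -> float:
    """Round to the nearest E4M3-representable value, saturating at +/-448 (ties to even)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)                      # saturate instead of overflowing
    exp = max(math.frexp(a)[1] - 1, E4M3_MIN_EXP)  # floor(log2(a)); subnormals share the min exponent
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)       # gap between neighbouring values at this exponent
    return sign * min(round(a / step) * step, E4M3_MAX)  # Python's round() ties to even

def quantize_per_tensor(values: List[float]) -> Tuple[List[float], float]:
    """Scale so the largest magnitude maps to E4M3_MAX, then round every element to E4M3."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale

def dequantize(q: List[float], scale: float) -> List[float]:
    """Undo the scale to recover approximate high-precision values."""
    return [v / scale for v in q]

weights = [0.013, -0.7, 2.9, 5.0]
q, scale = quantize_per_tensor(weights)
print(q, [round(w, 4) for w in dequantize(q, scale)])
```

With 3 mantissa bits, the relative rounding error for normal values is at most 2^-4 (6.25%). Mixed precision relies on this: coarse formats are used where that error is tolerable, and higher precision is kept elsewhere.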

Definitions 
1 Overview of AI Foundation Models
1.1 Introduction to AI Models 
Definition and Features of AI Models
Classification of AI Models by Architecture
Classification of AI Models by Task Type/Training Method
Classification of AI Models by Supervision Mode
Classification of AI Models by Modality 
Application Process of AI Models
1.2 Introduction to Foundation Models
Classification of Foundation Models
Current Development of Foundation Models in Automotive Industry
Application Scenarios of Foundation Models in Automotive Industry
Application Case 1: Application of LLM in Autonomous Driving
Application Case 2: Application of VFM in Autonomous Driving
Application Case 3: Application of MFM in Autonomous Driving 

2 Analysis of AI Foundation Models of Different Types
2.1 Large Language Models (LLM) 
Development History of LLM
Key Capabilities of LLM
Cases of Integration with Other Models
2.2 Multimodal Large Language Models (MLLM)
Development and Overview of Large Multimodal Models
Large Multimodal Models VS. Large Single-modal Models (1)
Large Multimodal Models VS. Large Single-modal Models (2) 
Technology Panorama of Large Multimodal Models 
Multimodal Information Representation
Multimodal Large Language Models (MLLM)
Architecture and Core Components of MLLM
Status Quo of MLLM
Dataset Evaluation by Different MLLM Representatives
Reasoning Capabilities of MLLM
Synergy between MLLM and Agent
Application Case 1: Application of MLLM in VQA
Application Case 2: Application of MLLM in Autonomous Driving 
2.3 Vision-Language Models (VLM) and Vision-Language-Action (VLA) Models 
Development History of VLM
Application of VLM
Architecture of VLM
Evolution of VLM in Intelligent Driving
Application Scenarios of VLM: End-to-end Autonomous Driving
Application Scenarios of VLM: Combination with Gaussian Framework
VLM→VLA
VLA Models
Principles of VLA
Classification of VLA Models
Application Cases of VLA (1)
Application Cases of VLA (2)
Application Cases of VLA (3) 
Application Cases of VLA (4)
Case 1: Core Functions of End-to-End Multimodal Model for Autonomous Driving (EMMA)
Case 2: World Model Construction
Case 3: Improve Vision-Language Navigation Capabilities
Case 4: VLA Generalization Enhancement
Case 5: Computing Overhead of VLA
2.4 World Models
Key Definitions of World Models and Application Development 
Basic Architecture of World Models
Framework Setup and Implementation Challenges of World Models
Video Generation Methods Based on Transformer and Diffusion Models
Technical Principle and Path of WorldDreamer
World Models and End-to-end Intelligent Driving
World Models and End-to-end Intelligent Driving: Data Generation
Case 1: Tesla World Model
Case 2: NVIDIA
Case 3: InfinityDrive
Case 4: World Labs Spatial Intelligence
Case 5: NIO
Case 6: 1X's "World Model"

3 Common Technologies in AI Foundation Models
Common Foundation Model Algorithms and Architectures
Comparison of Features and Application Scenarios between Foundation Model Algorithms
3.1 Foundation Model Architectures and Related Algorithms 
Transformer: Architecture and Features
Transformer: Algorithm Mechanisms
Transformer: Multi-head Attention Mechanisms and Their Variants
KAN: Potential to Replace MLP
KAN: Cases of Integration with Transformer Architecture
MAMBA: Introduction
MAMBA: Architectural Foundations
MAMBA: Latest Developments
MAMBA: Application Scenarios
MAMBA: Cases of Integration with Transformer Architecture
Applicability of CNN in the Era of Foundation Models
Applicability of RNN Variants in the Era of Foundation Models
3.2 Visual Processing Algorithms
Common Vision Algorithms
ViT
CLIP Scenarios and Features 
CLIP Workflow
LLaVA Model
3.3 Training and Fine-Tuning Technologies 
Foundation Model Training Process
Training Case: Geely's CPT Enhancement Solution
Instruction Fine-tuning
Training Case: Geely's Fine-tuning Framework for Multi-round Dialogues
3.4 Reinforcement Learning
Introduction to Reinforcement Learning
Reinforcement Learning Process
Comparison between Some Reinforcement Learning Technology Routes 
Cases of Reinforcement Learning (1)-(3)
3.5 Knowledge Graphs
Optimization Directions for Retrieval-Augmented Generation (RAG)
Evolution Directions of RAG (1): KAG
Evolution Directions of RAG (2): CAG
Evolution Directions of RAG (3): GraphRAG 
RAG Application Case 1: 
RAG Application Case 2:
RAG Application Case 3: Li Auto
RAG Application Case 4: Geely
Comparison between RAG Routes
Function Call
3.6 Reasoning Technologies 
Reasoning Process of Transformer Models
Evaluation of Reasoning Capabilities
Three Optimization Directions for Foundation Model Reasoning
Reasoning Task Types (1)
Reasoning Task Types (2)
Reasoning Task Types (3)
Common Reasoning Algorithm 1: CoT
Common Reasoning Algorithm 2: GoT/ToT
Comparison between Common Reasoning Algorithms
Common Reasoning Algorithm 3: PagedAttention
Reasoning Case 1: Geely
Reasoning Case 2: NVIDIA
3.7 Sparsification
Characteristics of MoE Architecture
Principles of MoE Architecture
MoE Training Strategies
Advantages and Challenges of MoE
MoE Models from Different Foundation Model Companies
Evolution Direction of MoE
3.8 Generation Technologies 
Introduction to Generative Models
Comparison between Generation Technologies
Case 1: Li Auto
Case 2: XPeng
Case 3: SAIC

4 AI Foundation Model Companies 
Development History of Mainstream Foundation Models
Mainstream Foundation Models and Their Companies (Foreign)
Mainstream Foundation Models and Their Companies (Chinese)
Rankings of Evaluated Foundation Models
4.1 OpenAI
Product Layout 
Product Iteration History
GPT Series: Features 
GPT Series: Architecture
From GPT-4V to 4o
Reasoning Model OpenAI o1
SORA: Features
SORA: Performance Evaluation
SORA: Advantages and Limitations
4.2 Google
Development History of Foundation Models 
Typical Model BERT: Architecture
Typical Model BERT: Variants
Gemini Model
Cases of Foundation Models in the Automotive Industry
4.3 Meta
LLAMA3.3
LLAMA Series: Evolution
LLAMA Series: Features
LLAMA Series: Training Methods 
LLAMA Series: Alpaca
LLAMA Series: Vicuna
4.4 Anthropic
Claude Performance Evaluation  
Claude-based PC-side Agent 
4.5 Mistral AI
Expert Model: Architecture
Expert Model: Algorithm Features (1)
Expert Model: Algorithm Features (2) 
Large Language Model: Mistral Large 2
4.6 Amazon
Nova Product System
Application Cases of Amazon AI Cloud in the Automotive Industry (1)-(3)
4.7 Stability AI
Product System
Stable Diffusion Architecture Based on Diffusion Models
Comparison of Stable Diffusion Video Generation Technology with Competitors
4.8 xAI
Product System
Capabilities of xAI Models
Capabilities of Grok-2 
Capabilities of Grok-0/1 
4.9 Abu Dhabi Technology Innovation Institute
Iteration History of Falcon Model Series
Parameters of Falcon 3 Series
Evaluation of Falcon 3 Series
4.10 SenseTime
Major Foundation Model Product Systems
Major Foundation Model Product Systems
Foundation Model Training Facilities
Functional Scenarios of Foundation Models
Foundation Model Technologies
4.11 Alibaba Cloud
Foundation Model Product System
End-cloud Integration Solutions of Foundation Models 
4.12 Baidu AI Cloud
Foundation Model Product System 
4.13 Tencent Cloud
Foundation Model Product System
Reasoning Service Solutions (1)-(3)
Generation Scenario Solutions for Foundation Models
Q&A Scenario Solutions for Foundation Models 
4.14 ByteDance & Volcano Engine
Doubao Model System
Functional Highlights of Volcano Engine's Cockpit 
4.15 Huawei
Pangu Model Product System
Application Cases of Pangu Models in Data Synthesis
LLM Architecture of Pangu Models
Capabilities of Pangu Models: Multimodal Technology
Capabilities of Pangu Models: Thinking & Reasoning Technology
AI Cloud Services of Pangu Models
4.16 Zhipu AI
Product System
Foundation Model Base in the Automotive Industry
Technical Features  
4.17 iFlytek
Product System
Functional and Technical Highlights 
Cockpit AI System
4.18 DeepSeek
Product System 
Technical Inspiration from DeepSeek V3 
Technical Highlights of DeepSeek R1
Application Cases of DeepSeek (1)-(3)

5 Application Cases of AI Foundation Models in Automotive 
5.1 Cockpit Cases 
Lenovo's AI Vehicle Computing Framework Used in Cockpits
In-cabin Functions of Thundersoft's Rubik Foundation Model
LLM Empowers Smart Eye’s DMS/OMS Assistance System 
Application of DIT in Voice Processing Scenarios
Application of Unisound's Shanhai Model in Cockpits 
Phoenix Auto Intelligence’s Cockpit Smart Brain 
5.2 Intelligent Driving Cases 
Li Auto: Multimodal Technology in Autonomous Driving (1)
Li Auto: Multimodal Technology in Autonomous Driving (2)
Li Auto: Multimodal Technology in Autonomous Driving (3): Overcoming 2D Limitations
Li Auto: Data Generation Technology (1)
Li Auto: Data Generation Technology (2)
Li Auto: CoT Technology in DriveVLM
Li Auto: Application of Visual Processing
Li Auto: Data Selection
Geely: Application of Visual Processing
Geely: Multimodal Learning Framework
Wayve: Generative World Model GAIA-1
Tesla: Algorithm Architecture (Including NeRF)
Tesla: Backbone, Neck, and Head of Vision Algorithms
Tesla: Core of Visual System - HydraNet
Giga’s World Model


6 Application Trends of AI Foundation Models
6.1 Data
Trend 1:
Trend 2:
6.2 Algorithm
Trend 1:
Trend 2:
Trend 3:
Trend 4:
6.3 Computing Power
Trend 1:
Trend 2:
6.4 Engineering
Trend 1
Trend 2

2005- www.researchinchina.com All Rights Reserved 京ICP备05069564号-1 京公网安备1101054484号