Autonomous Vehicles and Self-Driving AI
Latest revision as of 01:47, 25 April 2026
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Autonomous vehicles (AVs) and self-driving AI represent one of the most ambitious applications of artificial intelligence — teaching machines to navigate the complex, dynamic, and often unpredictable physical world. A self-driving system must simultaneously perceive its surroundings using sensors, understand the scene, predict the behavior of other road users, plan a safe path, and execute that plan through precise vehicle control — all in real time, in all weather conditions, across all edge cases, with zero tolerance for critical failure. AVs are a system-level challenge that combines computer vision, robotics, planning, and AI safety.
Remembering
- Autonomy levels (SAE J3016) — A standardized scale of driving automation from Level 0 (no automation) to Level 5 (fully autonomous in all conditions).
- Level 2 — Partial automation: the system controls steering and acceleration/braking but the human must monitor and be ready to take over. Examples: Tesla Autopilot, GM Super Cruise.
- Level 4 — High automation: the system handles all driving in specific conditions (geofenced area, certain weather) without human intervention; no steering wheel may be required.
- Level 5 — Full automation: the system can handle all driving in all conditions; no human required.
- Perception — The AV's ability to detect and classify objects in its environment using sensors.
- LIDAR (Light Detection and Ranging) — A sensor that fires laser pulses and measures return time to build a 3D point cloud of the environment.
- RADAR — Radio wave-based sensor; accurate at measuring velocity and works in all weather conditions; lower resolution than LIDAR.
- Point cloud — A set of 3D points returned by LIDAR, representing the 3D structure of the environment.
- HD Map (High Definition Map) — A centimeter-accurate 3D map of roads including lane markings, traffic signs, and geometry; used by many AV systems for localization.
- Localization — Precisely determining the vehicle's position and orientation within the HD map.
- SLAM (Simultaneous Localization and Mapping) — Building a map of the environment while simultaneously tracking the vehicle's position within it.
- Prediction — Forecasting the future positions and behaviors of other road users (pedestrians, cyclists, vehicles).
- Planning — Computing a safe, comfortable trajectory for the vehicle to follow.
- Control — Converting planned trajectories into actuator commands (steering angle, throttle, brake).
- Occupancy grid — A 2D or 3D grid representing which cells in space are occupied by obstacles.
- BEV (Bird's Eye View) — A top-down representation of the driving scene; commonly used for perception and planning in modern AVs.
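The last two definitions can be made concrete with a minimal sketch: rasterizing obstacle points into a 2D occupancy grid, which is also the canvas on which BEV features live. All parameters here (cell size, extent) are illustrative, not from any production stack.

```python
import numpy as np

def occupancy_grid(points_xy, cell_size=0.5, extent=20.0):
    """Rasterize 2D obstacle points into a square occupancy grid.

    points_xy: (N, 2) array of obstacle positions in metres, ego at origin
    cell_size: edge length of one grid cell in metres
    extent:    half-width of the grid in metres
    Returns a boolean (H, W) grid; True = cell contains an obstacle.
    """
    n_cells = int(2 * extent / cell_size)
    grid = np.zeros((n_cells, n_cells), dtype=bool)
    # Shift coordinates so the ego vehicle sits at the grid centre
    idx = np.floor((points_xy + extent) / cell_size).astype(int)
    # Keep only points that fall inside the grid
    ok = np.all((idx >= 0) & (idx < n_cells), axis=1)
    grid[idx[ok, 1], idx[ok, 0]] = True  # row = y, col = x
    return grid

# Two obstacles: one ahead-right, one behind the ego vehicle
grid = occupancy_grid(np.array([[3.0, 4.0], [-10.0, 0.0]]))
```

Real stacks use finer cells, temporal accumulation, and probabilistic (log-odds) occupancy rather than booleans, but the indexing idea is the same.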
Understanding
An autonomous vehicle system is a tightly coupled pipeline with four major subsystems:
1. Perception: Raw sensor data → structured understanding of the scene. Multiple sensors fuse data: cameras provide color and texture; LIDAR provides precise 3D geometry; RADAR provides velocity and works in fog/rain. Sensor fusion combines these into a unified world model. Deep learning (CNNs, ViTs, PointNets) detects vehicles, pedestrians, cyclists, traffic lights, and lane markings.
2. Prediction: Other agents will move — where? Neural networks predict the probability distribution over future trajectories of all detected agents. This is hard because behavior is multi-modal (a pedestrian might step left or right), socially interactive (they react to each other), and intent-dependent (are they going to cross or waiting?).
3. Planning: Given the current world state and predicted agent trajectories, find a safe, legal, and comfortable path for the ego vehicle. This involves:
- Route planning (global): which roads to take from A to B
- Behavior planning: when to merge, yield, proceed at an intersection
- Motion planning (local): the specific trajectory (position, velocity, acceleration over time)
4. Control: Convert the planned trajectory into actuator commands. Model Predictive Control (MPC) is common — it continuously solves a short-horizon optimization that tracks the desired trajectory while respecting vehicle dynamics constraints.
The challenge: the long tail of edge cases. A system can handle 99.9% of driving situations correctly, but the remaining 0.1% is what kills people. AVs must handle construction zones, emergency vehicles, unusual signage, debris, and countless other rare events — all while maintaining overall system safety.
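The four subsystems above can be wired together as one tick of a sense-predict-plan-act loop. The sketch below is a deliberately toy illustration with hypothetical function names, a constant-velocity predictor, and a proportional controller; no vendor's API looks like this.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    position: tuple  # (x, y) in metres, ego at origin, x = forward
    velocity: tuple  # (vx, vy) in m/s

def perceive(sensor_frame):
    # Placeholder: a real stack runs detection networks over fused sensors
    return [Agent(position=p, velocity=v) for p, v in sensor_frame]

def predict(agents, horizon_s=3.0):
    # Constant-velocity extrapolation: the simplest possible behaviour model
    return [(a.position[0] + a.velocity[0] * horizon_s,
             a.position[1] + a.velocity[1] * horizon_s) for a in agents]

def plan(ego_speed, predicted_positions):
    # Toy behaviour planning: slow down if any agent is predicted ahead, in-lane
    danger = any(abs(y) < 2.0 and x > 0 for x, y in predicted_positions)
    return max(0.0, ego_speed - 2.0) if danger else ego_speed

def control(target_speed, current_speed, k_p=0.5):
    # Proportional controller producing an acceleration command
    return k_p * (target_speed - current_speed)

# One tick: a vehicle 20 m ahead, closing at 5 m/s
frame = [((20.0, 0.0), (-5.0, 0.0))]
agents = perceive(frame)
futures = predict(agents)
target = plan(ego_speed=10.0, predicted_positions=futures)
accel = control(target, current_speed=10.0)
```

The point of the sketch is the data flow, not the algorithms: each stage consumes the previous stage's output, which is exactly the coupling that makes end-to-end error propagation a concern in modular pipelines.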
Applying
Object detection from LIDAR point clouds with PointPillars:
<syntaxhighlight lang="python">
# Conceptual pipeline — real implementations use mmdetection3d or OpenPCDet
import numpy as np

def preprocess_lidar(points, voxel_size=(0.16, 0.16, 4.0),
                     point_cloud_range=(-51.2, -51.2, -5, 51.2, 51.2, 3)):
    """
    Convert a raw LIDAR point cloud to a pillar (vertical-column) representation.
    points: (N, 4) array of [x, y, z, intensity]
    Returns: filtered points and their pillar indices for the PointPillars encoder
    """
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    vx, vy, vz = voxel_size

    # Filter points to the configured range
    mask = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] < y_max) &
            (points[:, 2] >= z_min) & (points[:, 2] < z_max))
    points = points[mask]

    # Assign points to pillars (vertical columns in the XY grid)
    pillar_x = np.floor((points[:, 0] - x_min) / vx).astype(int)
    pillar_y = np.floor((points[:, 1] - y_min) / vy).astype(int)
    pillar_idx = pillar_x * int((y_max - y_min) / vy) + pillar_y

    return points, pillar_idx  # Feed to the PointPillars encoder → BEV feature map

# Planning: Model Predictive Control (simplified)
def mpc_step(current_state, reference_trajectory, horizon=20, dt=0.1):
    """
    current_state: [x, y, heading, velocity]
    reference_trajectory: array of (x, y, heading, velocity) waypoints
    Returns optimal steering and acceleration commands.
    """
    # In practice: scipy.optimize.minimize or CasADi for the optimization
    # Objective: minimize deviation from reference + control effort
    # Constraints: vehicle dynamics, comfort (max jerk), safety (no collision)
    pass  # Placeholder for the optimization
</syntaxhighlight>
AV system architecture approaches:
- Modular pipeline → Separate perception, prediction, planning, control modules. Interpretable, easier to debug. Waymo, Cruise.
- End-to-end learning → Single neural network maps sensors → control commands. Less interpretable but captures cross-module synergies. Tesla FSD, NVIDIA DRIVE.
- Imitation learning → Train on human driving demonstrations (behavioral cloning). Simple but doesn't generalize to novel situations.
- RL + simulation → Train planning policy in simulation via RL. Hard to transfer to real world (sim-to-real gap).
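The imitation-learning approach can be illustrated with a deliberately tiny behavioural-cloning example: fit a linear steering policy to logged human (state, steering) pairs by least squares. This is a toy sketch with synthetic data and made-up gains; real systems use deep networks over camera/BEV inputs and far richer state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "human demonstrations": state = [lateral offset, heading error]
states = rng.normal(size=(200, 2))
# Assume the demonstrator steers to cancel both errors (hypothetical gains)
true_gains = np.array([-0.8, -1.5])
steer = states @ true_gains + rng.normal(scale=0.01, size=200)

# Behavioural cloning = supervised regression on demonstrations
learned_gains, *_ = np.linalg.lstsq(states, steer, rcond=None)

# The cloned policy mimics the demonstrator on in-distribution states
policy = lambda s: s @ learned_gains
```

This also shows why behavioural cloning fails to generalize: the policy is only constrained where demonstrations exist, so states outside the training distribution (the long tail) get arbitrary outputs, and small errors compound as the vehicle drifts into ever less familiar states.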
Analyzing

AV Sensor Comparison:
| Sensor | 3D Geometry | Velocity | Color/Texture | Weather Robustness | Range | Cost |
|---|---|---|---|---|---|---|
| Camera | No (stereo: approximate) | No | Yes | Poor (fog, night) | Variable | Low |
| LIDAR | Yes (precise) | No | No | Moderate | 100–200m | High |
| RADAR | Approximate | Yes | No | Excellent | 200m+ | Medium |
| Ultrasonic | Limited | No | No | Good | <5m | Very low |
Critical failure modes:
- Long tail edge cases — The AV encounters a situation not represented in training data: a child running into the road from behind a parked car, a mattress falling off a truck. These "unknown unknowns" are the central safety challenge.
- Sensor degradation — Heavy rain, snow, or direct sunlight degrades camera and LIDAR inputs. RADAR is more robust but provides less detail. Sensor fusion must handle individual sensor failures gracefully.
- Prediction failure — The system correctly perceives an agent but incorrectly predicts its behavior. A jaywalker predicted to stay on the curb who steps into the road.
- Map dependency — HD map-dependent systems fail when roads have changed (new construction, temporary detours) since the map was created.
- Adversarial attacks — Stop signs with adversarial stickers can cause misclassification. Lane markings can be fooled with tape.
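A common mitigation for the sensor-degradation failure mode is inverse-variance fusion with per-sensor health gating: unhealthy sensors are dropped, and losing all of them triggers the fallback path. The health flags and noise figures below are illustrative assumptions.

```python
def fuse_range(measurements):
    """Inverse-variance fusion of range estimates from multiple sensors.

    measurements: list of (value_m, variance, healthy) tuples; unhealthy
    sensors (e.g. a rain-blinded camera) are dropped before fusing.
    Returns (fused_value, fused_variance).
    """
    live = [(v, var) for v, var, ok in measurements if ok]
    if not live:
        raise RuntimeError("no healthy sensors: trigger fallback behaviour")
    weights = [1.0 / var for _, var in live]
    fused = sum(w * v for (v, _), w in zip(live, weights)) / sum(weights)
    return fused, 1.0 / sum(weights)

# Camera blinded by rain (unhealthy); LIDAR precise; RADAR coarser but live
fused, var = fuse_range([
    (18.0, 0.25, False),   # camera: flagged unhealthy, ignored
    (20.2, 0.04, True),    # LIDAR
    (19.8, 0.16, True),    # RADAR
])
```

Note the fused variance (1 / sum of weights) is smaller than any single sensor's: fusing healthy sensors both rides out individual failures and sharpens the estimate.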
Evaluating
AV evaluation requires both in-simulation and on-road measurement:
Disengagement rate: The number of times a human safety driver had to take over per 1,000 miles. Regulated and reported publicly in California. Lower is better, but context matters (urban driving is harder than highway).
Miles between critical events: The rate of safety-critical events (emergency braking, near-collisions) per million miles. Waymo reports this as a key safety metric.
Simulation evaluation: Replaying real incidents with the system under evaluation to test counterfactual outcomes. Inject synthetic edge cases (cut-offs, jaywalkers) to test response without real-world risk.
Behavioral metrics: Comfort (jerk, acceleration profiles), efficiency (travel time vs. human), and correctness (legal compliance, right-of-way). A safe system that is also uncomfortable or excessively slow will not be accepted.
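The first two metrics are simple rates computable directly from drive logs; a minimal sketch (the function names and example figures are assumptions for illustration, not any regulator's schema or any company's reported numbers):

```python
def disengagement_rate(disengagements, miles_driven):
    """Disengagements per 1,000 miles, the unit used in California DMV reports."""
    return 1000.0 * disengagements / miles_driven

def miles_between_critical_events(critical_events, miles_driven):
    """Mean miles between safety-critical events (emergency braking, near-collisions)."""
    return miles_driven / critical_events if critical_events else float("inf")

# Hypothetical fleet log: 12 takeovers over 48,000 miles
rate = disengagement_rate(disengagements=12, miles_driven=48_000)

# Hypothetical: 3 critical events over 1.5 million miles
mbce = miles_between_critical_events(critical_events=3, miles_driven=1_500_000)
```

The caveat in the text applies to any such number: a rate of 0.25 disengagements per 1,000 highway miles and the same rate in dense urban driving describe very different systems, so the operational design domain must always accompany the metric.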
Expert practitioners distinguish between validation (does the system perform as designed?) and verification (is the design sufficient for the deployment domain?). Both are required for safety-critical systems.
Creating
Designing an autonomous vehicle perception-planning architecture:
1. Sensor suite design
- 360° camera ring (6–8 cameras): full surround vision at different focal lengths
- 3× LIDAR: long-range front (200 m), two shorter-range side units
- 5× RADAR: front long-range, four corners for blind-spot/cut-in detection
- GPS/IMU: high-accuracy positioning fused with HD map
- Redundant compute: two independent compute stacks (fail-safe)
2. Perception architecture (modern BEV approach)

Multi-camera images + LIDAR points
↓
[Camera backbone: ViT or ResNet → feature maps per camera]
↓
[BEV lifting: cross-attention maps image features → BEV grid (BEVFormer)]
↓
[LIDAR backbone: PointPillars or VoxelNet → BEV feature map]
↓
[Sensor fusion: concatenate camera-BEV + LIDAR-BEV feature maps]
↓
[Detection head: 3D bounding boxes + class + velocity]
↓
[Segmentation head: driveable area, lane markings]
↓
[Occupancy prediction: future occupancy grid]
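The sensor-fusion step in the diagram is, at its simplest, a channel-wise concatenation of the two BEV feature maps on a shared grid. The shapes below (200×200 cells, 64 channels per modality) are made-up placeholders standing in for real backbone outputs.

```python
import numpy as np

# Hypothetical BEV grids: 64 camera channels and 64 LIDAR channels on 200×200 cells
camera_bev = np.zeros((64, 200, 200), dtype=np.float32)
lidar_bev = np.ones((64, 200, 200), dtype=np.float32)

# The fusion step from the diagram: concatenate along the channel axis
fused_bev = np.concatenate([camera_bev, lidar_bev], axis=0)

# Every downstream head (detection, segmentation, occupancy) reads this tensor,
# which is what makes the shared BEV grid the contract between the subsystems
```

In practice the concatenation is usually followed by a few convolution layers that let the network learn cross-modal interactions, but the shared-grid idea is the key design choice.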
3. Safety architecture (redundancy and fallback)
- Functional safety per ISO 26262 (Automotive Safety Integrity Level D for steering/braking)
- Minimal Risk Condition (MRC): ability to safely pull over and stop if system failure detected
- Independent safety monitor: separate system that can override if primary system is compromised
- Watchdog timers: if any module stops responding within deadline, initiate MRC
- Formal verification of critical planning constraints (speed limits, collision avoidance)
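The watchdog pattern in the last bullets can be sketched as a deadline monitor: every module heartbeats, and a missed deadline demands the Minimal Risk Condition. The module names and deadlines below are illustrative placeholders.

```python
import time

class Watchdog:
    """Track per-module heartbeats; a missed deadline demands the MRC."""

    def __init__(self, deadlines_s):
        self.deadlines = deadlines_s  # module name -> max silence in seconds
        self.last_beat = {m: time.monotonic() for m in deadlines_s}

    def heartbeat(self, module):
        # Called by each module on every completed cycle
        self.last_beat[module] = time.monotonic()

    def stale_modules(self, now=None):
        # Any module silent past its deadline is considered failed
        now = time.monotonic() if now is None else now
        return [m for m, t in self.last_beat.items()
                if now - t > self.deadlines[m]]

wd = Watchdog({"perception": 0.1, "planning": 0.2})
wd.heartbeat("perception")
# Simulate the planner going silent past its 200 ms deadline
wd.last_beat["planning"] -= 0.5
stale = wd.stale_modules()
if stale:
    pass  # initiate Minimal Risk Condition: decelerate and pull over safely
```

A production watchdog runs on independent hardware from the modules it monitors (per the redundancy bullets above), since a watchdog sharing a failed compute stack cannot report that failure.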