Autonomous Vehicles and Self-Driving AI

From BloomWiki

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain. Learn more about how BloomWiki works.

Autonomous vehicles (AVs) and self-driving AI represent one of the most ambitious applications of artificial intelligence — teaching machines to navigate the complex, dynamic, and often unpredictable physical world. A self-driving system must simultaneously perceive its surroundings using sensors, understand the scene, predict the behavior of other road users, plan a safe path, and execute that plan through precise vehicle control — all in real time, in all weather conditions, across all edge cases, with zero tolerance for critical failure. AVs are a system-level challenge that combines computer vision, robotics, planning, and AI safety.

Remembering[edit]

  • Autonomy levels (SAE J3016) — A standard that classifies driving automation from Level 0 (no automation) to Level 5 (fully autonomous in all conditions).
  • Level 2 — Partial automation: the system controls steering and acceleration/braking but the human must monitor and be ready to take over. Examples: Tesla Autopilot, GM Super Cruise.
  • Level 4 — High automation: the system handles all driving in specific conditions (geofenced area, certain weather) without human intervention; no steering wheel may be required.
  • Level 5 — Full automation: the system can handle all driving in all conditions; no human required.
  • Perception — The AV's ability to detect and classify objects in its environment using sensors.
  • LIDAR (Light Detection and Ranging) — A sensor that fires laser pulses and measures return time to build a 3D point cloud of the environment.
  • RADAR — Radio wave-based sensor; accurate at measuring velocity and works in all weather conditions; lower resolution than LIDAR.
  • Point cloud — A set of 3D points returned by LIDAR, representing the 3D structure of the environment.
  • HD Map (High Definition Map) — A centimeter-accurate 3D map of roads including lane markings, traffic signs, and geometry; used by many AV systems for localization.
  • Localization — Precisely determining the vehicle's position and orientation within the HD map.
  • SLAM (Simultaneous Localization and Mapping) — Building a map of the environment while simultaneously tracking the vehicle's position within it.
  • Prediction — Forecasting the future positions and behaviors of other road users (pedestrians, cyclists, vehicles).
  • Planning — Computing a safe, comfortable trajectory for the vehicle to follow.
  • Control — Converting planned trajectories into actuator commands (steering angle, throttle, brake).
  • Occupancy grid — A 2D or 3D grid representing which cells in space are occupied by obstacles.
  • BEV (Bird's Eye View) — A top-down representation of the driving scene; commonly used for perception and planning in modern AVs.
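
Several of these terms name concrete data structures. As a minimal illustration of point cloud, occupancy grid, and BEV, a LIDAR point cloud can be rasterized into a top-down grid (grid size and resolution here are assumptions for the example, not values from any real system):

<syntaxhighlight lang="python">
# Rasterize a LIDAR point cloud into a top-down (BEV) occupancy grid.
# All numbers are illustrative.
import numpy as np

def points_to_bev_occupancy(points, grid_size=200, cell_m=0.5):
    """points: (N, 3) array of [x, y, z] in metres, ego vehicle at the origin."""
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)
    half = grid_size * cell_m / 2.0                      # grid spans +/- 50 m
    # Map metric x/y coordinates to grid cell indices
    ix = ((points[:, 0] + half) / cell_m).astype(int)
    iy = ((points[:, 1] + half) / cell_m).astype(int)
    inside = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)
    grid[ix[inside], iy[inside]] = 1   # any return in a cell marks it "occupied"
    return grid                        # (200, 200) BEV occupancy grid, 100 m x 100 m

occupancy = points_to_bev_occupancy(np.random.uniform(-50, 50, size=(1000, 3)))
</syntaxhighlight>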

Understanding[edit]

An autonomous vehicle system is a tightly coupled pipeline with four major subsystems:

1. Perception: Raw sensor data → structured understanding of the scene. Multiple sensors fuse data: cameras provide color and texture; LIDAR provides precise 3D geometry; RADAR provides velocity and works in fog/rain. Sensor fusion combines these into a unified world model. Deep learning (CNNs, ViTs, PointNets) detects vehicles, pedestrians, cyclists, traffic lights, and lane markings.

2. Prediction: Other agents will move — where? Neural networks predict the probability distribution over future trajectories of all detected agents. This is hard because behavior is multi-modal (a pedestrian might step left or right), socially interactive (they react to each other), and intent-dependent (are they going to cross or waiting?).

3. Planning: Given the current world state and predicted agent trajectories, find a safe, legal, and comfortable path for the ego vehicle. This involves:

  • Route planning (global): which roads to take from A to B
  • Behavior planning: when to merge, yield, proceed at an intersection
  • Motion planning (local): the specific trajectory (position, velocity, acceleration over time)

4. Control: Convert the planned trajectory into actuator commands. Model Predictive Control (MPC) is common — it continuously solves a short-horizon optimization that tracks the desired trajectory while respecting vehicle dynamics constraints.
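
One way to see how these four subsystems fit together is as a typed data flow. A minimal sketch of such interfaces (every name here is hypothetical, not any vendor's API):

<syntaxhighlight lang="python">
# Hypothetical stage interfaces showing how data flows through the pipeline.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectedObject:                  # output of perception
    kind: str                          # "vehicle", "pedestrian", "cyclist", ...
    position: Tuple[float, float]      # (x, y) in the ego frame, metres
    velocity: Tuple[float, float]      # (vx, vy) in m/s

@dataclass
class PredictedTrajectory:             # output of prediction, one per mode
    obj: DetectedObject
    waypoints: List[Tuple[float, float]]   # positions over the next few seconds
    probability: float                 # behaviour is multi-modal, so each mode is weighted

@dataclass
class PlannedTrajectory:               # output of planning for the ego vehicle
    waypoints: List[Tuple[float, float, float, float]]   # (x, y, heading, velocity)

@dataclass
class ControlCommand:                  # output of control, sent to the actuators
    steering_angle: float              # radians
    acceleration: float                # m/s^2 (negative = braking)

def drive_tick(sensor_data,
               perceive, predict, plan_motion, track_trajectory) -> ControlCommand:
    """One pipeline cycle; the four callables are the subsystems above."""
    objects = perceive(sensor_data)            # sensors -> list of DetectedObject
    futures = predict(objects)                 # objects -> list of PredictedTrajectory
    plan = plan_motion(objects, futures)       # world state -> PlannedTrajectory
    return track_trajectory(plan)              # trajectory -> ControlCommand
</syntaxhighlight>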

The challenge: the long tail of edge cases. A system can handle 99.9% of driving situations correctly, but the remaining 0.1% is what kills people. AVs must handle construction zones, emergency vehicles, unusual signage, debris, and countless other rare events — while maintaining overall system safety.

Applying[edit]

Object detection from LIDAR point clouds with PointPillars:

<syntaxhighlight lang="python">
# Conceptual pipeline — real implementations use mmdetection3d or OpenPCDet
import numpy as np
import torch

def preprocess_lidar(points, voxel_size=(0.16, 0.16, 4.0),
                     point_cloud_range=(-51.2, -51.2, -5, 51.2, 51.2, 3)):
    """
    Convert a raw LIDAR point cloud to a pillar (vertical voxel) representation.
    points: (N, 4) array of [x, y, z, intensity]
    Returns: filtered points and their pillar indices for the PointPillars encoder
    """
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    vx, vy, vz = voxel_size
    # Filter points to range
    mask = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] < y_max) &
            (points[:, 2] >= z_min) & (points[:, 2] < z_max))
    points = points[mask]
    # Assign points to pillars (vertical columns in the XY grid)
    pillar_x = np.floor((points[:, 0] - x_min) / vx).astype(int)
    pillar_y = np.floor((points[:, 1] - y_min) / vy).astype(int)
    pillar_idx = pillar_x * int((y_max - y_min) / vy) + pillar_y
    return points, pillar_idx  # Feed to PointPillars encoder → BEV feature map


# Planning: Model Predictive Control (simplified)
def mpc_step(current_state, reference_trajectory, horizon=20, dt=0.1):
    """
    current_state: [x, y, heading, velocity]
    reference_trajectory: array of (x, y, heading, velocity) waypoints
    Returns optimal steering and acceleration commands
    """
    # In practice: scipy.optimize.minimize or CasADi for the optimization
    # Objective: minimize deviation from reference + control effort
    # Constraints: vehicle dynamics, comfort (max jerk), safety (no collision)
    pass  # Placeholder for the optimization (see the sketch after this block)
</syntaxhighlight>
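
The mpc_step placeholder leaves the optimization out. One minimal way to fill it in, assuming a kinematic bicycle model, a quadratic tracking cost, and scipy.optimize.minimize (real stacks use dedicated solvers and far richer constraints):

<syntaxhighlight lang="python">
# Illustrative MPC: optimize steering/acceleration over a short horizon so that
# a kinematic bicycle model tracks the reference trajectory. Numbers are assumed.
import numpy as np
from scipy.optimize import minimize

WHEELBASE = 2.7  # metres (assumed)

def rollout(state, controls, dt):
    """Simulate the kinematic bicycle model for a sequence of (steer, accel)."""
    x, y, heading, v = state
    predicted = []
    for steer, accel in controls:
        x += v * np.cos(heading) * dt
        y += v * np.sin(heading) * dt
        heading += v / WHEELBASE * np.tan(steer) * dt
        v += accel * dt
        predicted.append((x, y, heading, v))
    return np.array(predicted)

def mpc_step(current_state, reference_trajectory, horizon=20, dt=0.1):
    """Return (steering, acceleration) for the first step of the horizon."""
    ref = np.asarray(reference_trajectory)[:horizon]
    state = np.array(current_state, dtype=float)

    def cost(u_flat):
        controls = u_flat.reshape(horizon, 2)                  # [steer, accel] per step
        pred = rollout(state, controls, dt)
        n = min(len(ref), len(pred))
        tracking = np.sum((pred[:n, :2] - ref[:n, :2]) ** 2)   # position error
        effort = 0.1 * np.sum(controls ** 2)                   # penalize harsh inputs
        return tracking + effort

    u0 = np.zeros(horizon * 2)
    bounds = [(-0.5, 0.5), (-3.0, 2.0)] * horizon      # steering (rad), accel (m/s^2)
    result = minimize(cost, u0, bounds=bounds, method="L-BFGS-B")
    steer, accel = result.x.reshape(horizon, 2)[0]     # apply only the first command
    return steer, accel
</syntaxhighlight>

In a real controller this optimization is re-solved every cycle (receding horizon), and comfort and collision constraints enter as hard constraints rather than soft penalties.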

AV system architecture approaches:

  • Modular pipeline — Separate perception, prediction, planning, and control modules. Interpretable and easier to debug. Used by Waymo and Cruise.
  • End-to-end learning — A single neural network maps sensors → control commands. Less interpretable, but it can capture cross-module synergies. Tesla FSD, NVIDIA DRIVE.
  • Imitation learning — Train on human driving demonstrations (behavioral cloning, sketched below). Simple, but it does not generalize well to novel situations.
  • RL + simulation — Train the planning policy in simulation via reinforcement learning. Hard to transfer to the real world (sim-to-real gap).
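
As an illustration of the imitation-learning entry above, a behavioral-cloning update can be as simple as regressing the expert's controls from a scene representation. A minimal sketch (network, shapes, and hyperparameters are assumptions, not any production system):

<syntaxhighlight lang="python">
# Behavioral cloning sketch: regress expert steering/acceleration from a BEV
# feature map. Architecture and shapes are illustrative.
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 50 * 50, 256), nn.ReLU(),
    nn.Linear(256, 2),                     # outputs: [steering, acceleration]
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(bev_features, expert_actions):
    """bev_features: (B, 64, 50, 50); expert_actions: (B, 2) from human driving logs."""
    pred = policy(bev_features)
    loss = nn.functional.mse_loss(pred, expert_actions)   # match the human's controls
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One illustrative training step on random data
bc_step(torch.randn(8, 64, 50, 50), torch.randn(8, 2))
</syntaxhighlight>

The weakness named above follows directly from this setup: the policy only ever sees states an expert visited, so small errors compound once the vehicle drifts into situations absent from the demonstrations.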

Analyzing[edit]

AV Sensor Comparison

  • Camera — 3D geometry: no (approximate with stereo); velocity: no; color/texture: yes; weather robustness: poor (fog, night); range: variable; cost: low.
  • LIDAR — 3D geometry: yes (precise); velocity: no; color/texture: no; weather robustness: moderate; range: 100–200 m; cost: high.
  • RADAR — 3D geometry: approximate; velocity: yes; color/texture: no; weather robustness: excellent; range: 200 m+; cost: medium.
  • Ultrasonic — 3D geometry: limited; velocity: no; color/texture: no; weather robustness: good; range: <5 m; cost: very low.

Critical failure modes:

  • Long tail edge cases — The AV encounters a situation not represented in training data: a child running into the road from behind a parked car, a mattress falling off a truck. These "unknown unknowns" are the central safety challenge.
  • Sensor degradation — Heavy rain, snow, or direct sunlight degrades camera and LIDAR inputs. RADAR is more robust but provides less detail. Sensor fusion must handle individual sensor failures gracefully.
  • Prediction failure — The system correctly perceives an agent but incorrectly predicts its behavior, e.g. a jaywalker predicted to stay on the curb who then steps into the road.
  • Map dependency — HD map-dependent systems fail when roads have changed (new construction, temporary detours) since the map was created.
  • Adversarial attacks — Stop signs with adversarial stickers can cause misclassification. Lane markings can be fooled with tape.

Evaluating[edit]

AV evaluation requires both in-simulation and on-road measurement:

Disengagement rate: The number of times a human safety driver had to take over per 1,000 miles. Regulated and reported publicly in California. Lower is better, but context matters (urban driving is harder than highway).

Miles between critical events: The average number of miles driven between safety-critical events (emergency braking, near-collisions), often reported per million miles. Waymo reports this as a key safety metric.
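
Both on-road metrics are simple ratios of logged counts to mileage, but the normalization matters when comparing operators. A small illustrative computation (function names are not from any reporting standard):

<syntaxhighlight lang="python">
# Illustrative computation of the two on-road metrics described above.
def disengagement_rate(disengagements: int, miles_driven: float) -> float:
    """Disengagements per 1,000 miles, as in California's public reports."""
    return disengagements / miles_driven * 1_000

def miles_between_critical_events(critical_events: int, miles_driven: float) -> float:
    """Average miles driven between safety-critical events."""
    return miles_driven / max(critical_events, 1)

# Example: 12 disengagements and 3 critical events over 150,000 miles
disengagement_rate(12, 150_000)               # 0.08 per 1,000 miles
miles_between_critical_events(3, 150_000)     # 50,000 miles per event
</syntaxhighlight>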

Simulation evaluation: Replaying real incidents with the system under evaluation to test counterfactual outcomes. Inject synthetic edge cases (cut-offs, jaywalkers) to test response without real-world risk.

Behavioral metrics: Comfort (jerk, acceleration profiles), efficiency (travel time vs. human), and correctness (legal compliance, right-of-way). A safe system that is also uncomfortable or excessively slow will not be accepted.

Expert practitioners distinguish between verification (does the system perform as designed?) and validation (is the design sufficient for the deployment domain?). Both are required for safety-critical systems.

Creating[edit]

Designing an autonomous vehicle perception-planning architecture:

1. Sensor suite design

<syntaxhighlight lang="text">
360° camera ring (6–8 cameras): full surround vision at different focal lengths

   + 3× LIDAR: long-range front (200m), two shorter-range side units
   + 5× RADAR: front long-range, four corners for blind-spot/cut-in detection
   + GPS/IMU: high-accuracy positioning fused with HD map
   + Redundant compute: two independent compute stacks (fail-safe)

</syntaxhighlight>

2. Perception architecture (modern BEV approach)

<syntaxhighlight lang="text">
Multi-camera images + LIDAR points

[Camera backbone: ViT or ResNet → feature maps per camera]

[BEV lifting: cross-attention maps image features → BEV grid (BEVFormer)]

[LIDAR backbone: PointPillars or VoxelNet → BEV feature map]

[Sensor fusion: concatenate camera-BEV + LIDAR-BEV feature maps]

[Detection head: 3D bounding boxes + class + velocity]

[Segmentation head: driveable area, lane markings]

[Occupancy prediction: future occupancy grid]
</syntaxhighlight>
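
A shape-level sketch of the fusion and the two heads at the end of this diagram, assuming camera and LIDAR features have already been lifted into a shared BEV grid (channel counts, grid size, and class count are illustrative):

<syntaxhighlight lang="python">
# Shape-level sketch of BEV sensor fusion plus detection/segmentation heads.
# Module choices and dimensions are illustrative, not a published architecture.
import torch
import torch.nn as nn

class BEVFusionHeads(nn.Module):
    def __init__(self, cam_channels=80, lidar_channels=64, num_classes=10):
        super().__init__()
        fused = cam_channels + lidar_channels
        # Detection head: per-cell class scores + 7 box parameters (x, y, z, l, w, h, yaw)
        self.det_head = nn.Conv2d(fused, num_classes + 7, kernel_size=1)
        # Segmentation head: drivable area + lane markings
        self.seg_head = nn.Conv2d(fused, 2, kernel_size=1)

    def forward(self, cam_bev, lidar_bev):
        # cam_bev:   (B, 80, H, W) image features lifted into the BEV grid
        # lidar_bev: (B, 64, H, W) PointPillars/VoxelNet BEV feature map
        fused = torch.cat([cam_bev, lidar_bev], dim=1)   # simple concatenation fusion
        return self.det_head(fused), self.seg_head(fused)

# Example shapes: a 200 x 200 BEV grid at 0.5 m/cell covers 100 m x 100 m
det, seg = BEVFusionHeads()(torch.randn(1, 80, 200, 200), torch.randn(1, 64, 200, 200))
</syntaxhighlight>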

3. Safety architecture (redundancy and fallback)

  • Functional safety per ISO 26262 (Automotive Safety Integrity Level D for steering/braking)
  • Minimal Risk Condition (MRC): ability to safely pull over and stop if system failure detected
  • Independent safety monitor: separate system that can override if primary system is compromised
  • Watchdog timers: if any module stops responding within deadline, initiate MRC
  • Formal verification of critical planning constraints (speed limits, collision avoidance)
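
The watchdog and MRC items above can be made concrete with a heartbeat check: every module stamps a heartbeat each cycle, and an independent monitor initiates the Minimal Risk Condition if any deadline is missed. A minimal sketch (module names and deadlines are assumed):

<syntaxhighlight lang="python">
# Heartbeat watchdog sketch: trigger the Minimal Risk Condition (MRC) if any
# module misses its deadline. All names and numbers are illustrative.
import time

DEADLINES_S = {"perception": 0.1, "prediction": 0.1, "planning": 0.1, "control": 0.02}

class Watchdog:
    def __init__(self, deadlines):
        self.deadlines = deadlines
        now = time.monotonic()
        self.last_heartbeat = {name: now for name in deadlines}

    def heartbeat(self, module):
        """Each module calls this at the end of every cycle."""
        self.last_heartbeat[module] = time.monotonic()

    def stale_modules(self):
        """Return modules that missed their deadline; the caller initiates MRC."""
        now = time.monotonic()
        return [m for m, deadline in self.deadlines.items()
                if now - self.last_heartbeat[m] > deadline]

watchdog = Watchdog(DEADLINES_S)
if watchdog.stale_modules():
    pass  # initiate MRC: e.g. a controlled pull-over and stop
</syntaxhighlight>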