Neural Architecture Search
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Neural Architecture Search (NAS) is the process of automating the design of neural network architectures using machine learning. Instead of relying on human intuition and trial-and-error to design CNN, transformer, or MLP architectures, NAS algorithms systematically search through vast architecture spaces to discover high-performing designs for specific tasks and hardware constraints. NAS produced architectures like EfficientNet and NASNet that outperformed hand-designed networks, and principles from NAS are now embedded in tools like AutoML that democratize deep learning for non-experts.
Remembering
- Neural Architecture Search (NAS) — Automated discovery of neural network architectures using optimization algorithms.
- Search space — The set of all possible architectures considered; defined by a human as cells, operations, connections, and their ranges.
- Search strategy — The algorithm used to explore the search space: random, evolutionary, reinforcement learning, gradient-based.
- Performance estimation — How a candidate architecture's performance is estimated without full training; enables faster search.
- Cell — A basic building block repeated in the final architecture; NAS often searches for optimal cells rather than full networks.
- DARTS (Differentiable Architecture Search) — A gradient-based NAS that makes the architecture selection differentiable using a continuous relaxation.
- Evolutionary NAS — Uses evolutionary algorithms (mutation, crossover, selection) to search architecture space.
- RL-based NAS — Uses a controller trained with reinforcement learning to sample and evaluate architectures (original NAS paper, Zoph & Le 2017).
- One-shot NAS — Trains a single large "supernet" containing all candidate architectures as sub-networks; evaluates sub-architectures by inheriting weights from the supernet.
- Supernet — A network containing all candidate architectures as subgraphs; weights are shared across sub-architectures.
- Hardware-aware NAS — Optimizes for both accuracy and hardware efficiency (FLOPs, latency, memory) simultaneously.
- EfficientNet — A CNN family whose base network was discovered by NAS and then scaled up via compound scaling; achieved state-of-the-art accuracy/efficiency tradeoffs.
- NASNet — One of the first high-profile NAS results; discovered by RL-based search on CIFAR-10 and transferred to ImageNet.
- ProxylessNAS — A memory-efficient NAS method that directly searches on the target task and hardware.
- AutoML — Broader automation of the ML pipeline including NAS, HPO, and feature engineering.
Understanding
NAS frames architecture design as an optimization problem: find the architecture A* = argmax_{A ∈ search space} Performance(A, task). The three key components are what to search (search space), how to search (search strategy), and how to evaluate candidates (performance estimation).
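These three components show up even in the simplest strategy, random search. The sketch below is a minimal illustration, not any library's API; `sample_architecture` and the toy `estimate_performance` proxy are invented stand-ins for a real training-and-evaluation loop.

<syntaxhighlight lang="python">
import random

# Search space: each of N_EDGES edges picks one candidate operation.
OPS = ['skip_connect', 'sep_conv_3x3', 'dil_conv_5x5', 'max_pool_3x3']
N_EDGES = 8

def sample_architecture():
    """Search strategy (random): draw one operation per edge."""
    return [random.choice(OPS) for _ in range(N_EDGES)]

def estimate_performance(arch):
    """Performance estimation: stands in for briefly training the candidate
    and measuring validation accuracy. The heuristic below is a fake proxy
    so the sketch runs end to end."""
    return sum('conv' in op for op in arch) / len(arch)

best_arch, best_acc = None, -1.0
for _ in range(100):               # search budget: 100 candidate evaluations
    arch = sample_architecture()
    acc = estimate_performance(arch)
    if acc > best_acc:
        best_arch, best_acc = arch, acc
print(best_acc, best_arch)
</syntaxhighlight>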
Early NAS (Zoph & Le, 2017): Used an RNN controller (meta-learner) to generate architecture descriptions. Each generated architecture was trained to convergence on CIFAR-10, and its accuracy was used as a reward signal to update the controller via REINFORCE. This worked but required 800 GPUs for 28 days — prohibitively expensive.
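The controller update itself is plain REINFORCE. A minimal sketch under stated assumptions: the controller is reduced to a learnable logits table rather than an RNN, and `train_and_evaluate` is a hypothetical stand-in that would really build, train, and validate the sampled network.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

N_CHOICES, N_OPS = 10, 5    # decisions per architecture, options per decision

# Stand-in controller: one row of learnable logits per decision.
logits = nn.Parameter(torch.zeros(N_CHOICES, N_OPS))
optimizer = torch.optim.Adam([logits], lr=3e-4)
baseline = 0.0              # moving-average baseline reduces gradient variance

def train_and_evaluate(actions):
    """Hypothetical: build the described net, train it, return val accuracy."""
    return torch.rand(()).item()  # toy reward so the sketch runs

for step in range(50):
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                       # one op index per decision
    reward = train_and_evaluate(actions)
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE: raise log-probability of actions in proportion to advantage.
    loss = -dist.log_prob(actions).sum() * (reward - baseline)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
</syntaxhighlight>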
DARTS (Liu et al., 2019): Made NAS practical by making architecture selection differentiable. Instead of discrete choices, DARTS maintains a continuous mixture on each edge: output = Σ_o w_o · op_o(x), where w = softmax(α) and the α_o are learnable architecture parameters. Gradient descent simultaneously optimizes network weights (on training data) and architecture parameters (on validation data). At the end, the highest-weight operation at each edge is selected. This reduced search to a few GPU-days.
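A minimal, runnable sketch of this bi-level loop on a toy one-edge search space (first-order DARTS; the real method alternates the same two steps over a full cell-based supernet, and all sizes and data here are invented for illustration):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy search space: one edge, three candidate ops on a 16-d feature vector.
candidate_ops = nn.ModuleList([nn.Linear(16, 16), nn.ReLU(), nn.Identity()])
classifier = nn.Linear(16, 10)
arch_params = nn.Parameter(torch.zeros(len(candidate_ops)))

def supernet(x):
    alphas = F.softmax(arch_params, dim=-1)
    mixed = sum(a * op(x) for a, op in zip(alphas, candidate_ops))
    return classifier(mixed)

weights = list(candidate_ops.parameters()) + list(classifier.parameters())
w_optim = torch.optim.SGD(weights, lr=0.025, momentum=0.9)
a_optim = torch.optim.Adam([arch_params], lr=3e-4, weight_decay=1e-3)

for step in range(100):
    # Toy data; real DARTS draws from disjoint train/validation splits.
    x_tr, y_tr = torch.randn(32, 16), torch.randint(0, 10, (32,))
    x_val, y_val = torch.randn(32, 16), torch.randint(0, 10, (32,))

    # 1) Update architecture parameters on validation data.
    a_optim.zero_grad()
    F.cross_entropy(supernet(x_val), y_val).backward()
    a_optim.step()

    # 2) Update network weights on training data.
    w_optim.zero_grad()
    F.cross_entropy(supernet(x_tr), y_tr).backward()
    w_optim.step()

# Discretize: keep the highest-weight operation.
best_op = arch_params.argmax().item()
</syntaxhighlight>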
One-shot NAS / Weight sharing: Train a single supernet where all operations share weights. Individual sub-architectures inherit weights from the supernet for fast evaluation. Hardware-aware NAS adds latency or FLOPs as a penalty, enabling automatic design for mobile, edge, or server targets.
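For the hardware-aware penalty, one common differentiable form (in the spirit of ProxylessNAS) is the α-weighted expected latency from a profiled lookup table, added to the task loss. A hedged sketch; the table values and λ below are invented, and a real table would be profiled per edge and operation:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

# Per-op latencies in ms, measured once on the target device (values invented;
# this toy version shares one table across all edges).
LATENCY_TABLE = torch.tensor([0.05, 1.20, 1.80, 0.30, 0.30])

def expected_latency(arch_params):
    """Differentiable latency estimate: softmax-weighted table lookup per edge."""
    alphas = F.softmax(arch_params, dim=-1)        # (n_edges, n_ops)
    return (alphas * LATENCY_TABLE).sum()          # sum over edges and ops

def hardware_aware_loss(task_loss, arch_params, lam=0.01):
    # One scalar objective: accuracy term plus weighted latency penalty.
    return task_loss + lam * expected_latency(arch_params)
</syntaxhighlight>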
The proxy problem: Architectures found by searching on CIFAR-10 may not transfer to ImageNet or other tasks. Hardware benchmarks in simulation may differ from actual device latency. Expert NAS practitioners search directly on target tasks and hardware when feasible.
Applying
Architecture search with DARTS (simplified):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
import torch.nn.functional as F

def SepConv(C_in, C_out, kernel_size, padding):
    """Minimal stand-in for the DARTS separable conv: depthwise then pointwise."""
    return nn.Sequential(
        nn.Conv2d(C_in, C_in, kernel_size, padding=padding, groups=C_in, bias=False),
        nn.Conv2d(C_in, C_out, 1, bias=False),
        nn.BatchNorm2d(C_out),
    )

def DilConv(C_in, C_out, kernel_size, padding, dilation):
    """Minimal stand-in for the DARTS dilated separable conv."""
    return nn.Sequential(
        nn.Conv2d(C_in, C_in, kernel_size, padding=padding,
                  dilation=dilation, groups=C_in, bias=False),
        nn.Conv2d(C_in, C_out, 1, bias=False),
        nn.BatchNorm2d(C_out),
    )

OPERATIONS = {
    'skip_connect': lambda C: nn.Identity(),
    'sep_conv_3x3': lambda C: SepConv(C, C, 3, 1),
    'dil_conv_5x5': lambda C: DilConv(C, C, 5, 4, 2),  # padding 4 preserves size at dilation 2
    'max_pool_3x3': lambda C: nn.MaxPool2d(3, 1, 1),
    'avg_pool_3x3': lambda C: nn.AvgPool2d(3, 1, 1, count_include_pad=False),
}

class MixedOp(nn.Module):
    """A mixed operation where each candidate is weighted by an architecture parameter."""
    def __init__(self, C):
        super().__init__()
        self.ops = nn.ModuleList([op(C) for op in OPERATIONS.values()])

    def forward(self, x, weights):
        # weights: softmax of the architecture parameters alpha for this edge
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Architecture parameters (separate from the network weights)
n_edges = 14  # number of edges in a DARTS cell
arch_params = nn.Parameter(1e-3 * torch.randn(n_edges, len(OPERATIONS)))

# Bi-level optimization:
#   1. Update network weights on the train set (standard gradient step)
#   2. Update arch_params on the validation set (gradient through the mixed ops)
# Final architecture: take argmax(softmax(arch_params)) per edge
</syntaxhighlight>
NAS approach selection:

- Compute budget <1 GPU-day → DARTS, GDAS, or random search (surprisingly competitive)
- Hardware-specific target → ProxylessNAS, SNAS, Once-for-All
- Zero search cost → zero-cost proxies (NASWOT, synflow) that correlate with performance without training; a synflow-style sketch follows this list
- Production ML teams → AutoML platforms: Google AutoML, Azure AutoML, Auto-Sklearn
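As an illustration of zero-cost proxies, here is a hedged sketch of a synflow-style score, one variant among several in the literature: take absolute parameter values, forward an all-ones input, and sum |θ · ∂out/∂θ| over all parameters. The toy model at the end is an invented example; production implementations also handle batch norm and numerical precision carefully.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

@torch.no_grad()
def _abs_params(model):
    # Linearize the network: replace every weight by its absolute value.
    signs = {}
    for name, p in model.named_parameters():
        signs[name] = torch.sign(p)
        p.abs_()
    return signs

def synflow_score(model, input_shape=(1, 3, 32, 32)):
    """Zero-cost proxy: sum of |theta * d(output)/d(theta)| at an all-ones input."""
    signs = _abs_params(model)
    model.zero_grad()
    out = model(torch.ones(input_shape))
    out.sum().backward()
    score = sum((p * p.grad).abs().sum().item()
                for p in model.parameters() if p.grad is not None)
    # Restore the original weights so the model is left unchanged.
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.mul_(signs[name])
    return score

# Rank candidate architectures by score with no training at all (toy model).
toy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
print(synflow_score(toy))
</syntaxhighlight>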
Analyzing
NAS Strategy Comparison:

| Strategy | Search Cost (GPU-days) | Transfer Quality | Implementation Complexity |
|---|---|---|---|
| RL-based (original NAS) | ~22,400 (800 GPUs × 28 days) | High | Very high |
| Evolutionary | 150–3,000 | High | Medium |
| DARTS | ~4 | Good | Medium |
| One-shot (SPOS) | ~12 | Good | Medium |
| Random search | 1–10 | Surprisingly good | Very low |
Failure modes:

- Performance collapse — DARTS over-selects skip connections and parameter-free operations.
- Search space bias — the search space implicitly encodes human knowledge; poor search space design leads to poor results.
- Weight-sharing collapse — shared weights in supernets lead to inaccurate architecture rankings.
- Hardware simulation error — latency on device differs from lookup-table estimates.
Evaluating
Evaluate NAS results on:
- Final accuracy on target task/dataset.
- FLOPs, latency, and memory on the target hardware — report all three, not just FLOPs (see the measurement sketch after this list).
- Search cost (GPU-hours).
- Robustness: does the discovered architecture maintain quality when trained from scratch, without the shared supernet weights?

Expert practitioners compare against strong hand-designed baselines at equal FLOPs and run multiple NAS trials to quantify search variance.
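Latency measurement is easy to get wrong because CUDA kernels run asynchronously and cold caches inflate early timings. A minimal sketch using only standard PyTorch and the stdlib; the input shape is a placeholder:

<syntaxhighlight lang="python">
import time
import torch

@torch.no_grad()
def measure_latency_ms(model, input_shape=(1, 3, 224, 224), warmup=20, iters=100):
    """Median wall-clock latency per forward pass, in milliseconds."""
    device = next(model.parameters()).device
    x = torch.randn(input_shape, device=device)
    model.eval()
    for _ in range(warmup):                  # warm caches and cuDNN autotuning
        model(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()             # drain queued kernels before timing
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        model(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()         # CUDA is async; wait for completion
        times.append((time.perf_counter() - t0) * 1000)
    times.sort()
    return times[len(times) // 2]            # median is robust to outliers
</syntaxhighlight>

Measure on the actual target device whenever possible; lookup tables drift from reality, which is exactly the hardware simulation error noted above.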
Creating
Designing a hardware-aware NAS pipeline (a sketch of steps 4 and 5 follows the list):

1. Profile all candidate operations on the target device and build a lookup table (latency per op).
2. Define the search space as a cell structure with K = 8–14 candidate operations per edge.
3. Run one-shot supernet training with uniform operation sampling.
4. Evaluate sub-architectures by sampling 1,000 random architectures, estimating performance with inherited supernet weights.
5. Pareto-optimal selection: identify architectures on the accuracy-latency Pareto frontier.
6. Retrain the top-3 Pareto-optimal architectures from scratch to verify that the supernet rankings transfer.
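A hedged sketch of steps 4 and 5, with `supernet_accuracy` and the latency table as invented stand-ins for the real profiled table and supernet evaluation:

<syntaxhighlight lang="python">
import random

N_EDGES, N_OPS = 14, 8
LAT_TABLE = [[random.uniform(0.1, 2.0) for _ in range(N_OPS)]
             for _ in range(N_EDGES)]            # step 1 stand-in: profiled ms per op

def sample_arch():
    return [random.randrange(N_OPS) for _ in range(N_EDGES)]

def latency_ms(arch):
    return sum(LAT_TABLE[e][op] for e, op in enumerate(arch))

def supernet_accuracy(arch):
    """Stand-in for evaluating the sub-architecture with inherited weights."""
    return random.uniform(0.5, 0.9)              # replace with real validation accuracy

# Step 4: sample candidates and score them with shared weights.
candidates = [(a, supernet_accuracy(a), latency_ms(a))
              for a in (sample_arch() for _ in range(1000))]

# Step 5: keep the accuracy-latency Pareto frontier
# (no other candidate is both at least as accurate and strictly faster, or vice versa).
pareto = [c for c in candidates
          if not any((o[1] >= c[1] and o[2] < c[2]) or (o[1] > c[1] and o[2] <= c[2])
                     for o in candidates)]
pareto.sort(key=lambda c: c[2])                  # fastest first; retrain the top few
</syntaxhighlight>

The quadratic dominance check is fine at 1,000 candidates; retraining a few frontier points from scratch (step 6) is what validates the whole pipeline.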