== Understanding ==

The "tabular gap": neural networks excel at images, text, and audio because these modalities have spatial or sequential structure that convolutions and attention exploit efficiently. Tabular data lacks this structure: features are semantically heterogeneous, with no natural ordering. A column called "age" is fundamentally different from a column called "revenue" in ways that have no analog in pixels.

**Why GBDTs win**: Gradient boosted decision trees handle heterogeneous features natively, discover complex feature interactions via splits, are robust to irrelevant features (automatic feature selection), require minimal preprocessing, and train quickly. They are hard to beat on tabular benchmarks because they solve exactly the problems posed by tabular data, without the overhead of deep learning (see the LightGBM sketch below).

**Where DL can win on tabular data**: (1) **Large datasets** (>100K samples): neural networks keep improving with scale where GBDTs plateau. (2) **High-cardinality categoricals**: entity embeddings for user IDs or product IDs with millions of distinct values (see the embedding sketch below). (3) **Multi-modal inputs**: when tabular data is combined with text, images, or other modalities. (4) **End-to-end learning**: when the tabular model is part of a larger differentiable system. (5) **Online learning**: neural networks update incrementally more easily than tree ensembles.

**FT-Transformer (current SOTA)**: The Feature Tokenizer + Transformer embeds each feature as a token (a per-feature linear layer for numerical features, an embedding table for categorical ones), prepends a CLS token, and applies standard transformer layers. It consistently outperforms TabNet and approaches GBDT performance on many benchmarks, while being a clean, generalizable architecture (sketched in code below).

**TabPFN (few-shot tabular ML)**: Pre-trained on millions of synthetic tabular datasets, TabPFN uses in-context learning to make predictions on new small datasets (up to ~1,000 samples) in a single forward pass, with no training required. On small datasets it frequently matches or beats XGBoost given minutes of hyperparameter tuning (usage sketch below). This is a fundamentally different paradigm from standard ML.
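To make the low-friction GBDT workflow concrete, here is a minimal sketch assuming the LightGBM package: raw pandas columns, a categorical declared by dtype, no scaling or imputation. The column names and data are invented for illustration.

<syntaxhighlight lang="python">
# Minimal sketch, assuming LightGBM (pip install lightgbm).
# Data and column names are illustrative only.
import lightgbm as lgb
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, 5000),
    "revenue": rng.lognormal(3, 1, 5000),
    # Declaring the dtype is the only "preprocessing" the categorical needs.
    "region": pd.Categorical(rng.choice(["north", "south", "east"], 5000)),
})
y = rng.integers(0, 2, 5000)  # toy binary target

model = lgb.LGBMClassifier(n_estimators=200)
model.fit(df, y)              # categorical column handled natively via splits
print(model.predict(df.head()))
</syntaxhighlight>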
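The entity-embedding idea from point (2) can be sketched in a few lines of PyTorch: a high-cardinality ID is mapped through an embedding table and concatenated with the numeric features before an MLP. The class name, dimensions, and data below are illustrative assumptions, not a reference implementation.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class EntityEmbeddingMLP(nn.Module):
    """Hypothetical model: embed one high-cardinality categorical
    (e.g. a user ID) and concatenate it with numeric features."""
    def __init__(self, n_categories, embed_dim, n_numeric, hidden=64):
        super().__init__()
        # Learned dense vector per category ID.
        self.embed = nn.Embedding(n_categories, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + n_numeric, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, cat_ids, numeric):
        # cat_ids: (batch,) int64; numeric: (batch, n_numeric) float32
        x = torch.cat([self.embed(cat_ids), numeric], dim=-1)
        return self.mlp(x)

# A million-ID categorical becomes a 32-dimensional learned feature.
model = EntityEmbeddingMLP(n_categories=1_000_000, embed_dim=32, n_numeric=10)
out = model(torch.randint(0, 1_000_000, (4,)), torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 1])
</syntaxhighlight>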
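The FT-Transformer recipe can be sketched directly from the description above: a per-feature linear tokenizer for numerical columns, per-feature embedding tables for categorical ones, a prepended CLS token, and a stock transformer encoder. This is a minimal PyTorch approximation under those assumptions, with illustrative dimensions, not the authors' reference code.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class FTTransformer(nn.Module):
    """Minimal sketch of Feature Tokenizer + Transformer: one token
    per feature, a [CLS] token, and a standard transformer encoder.
    Sizes and initialization are illustrative assumptions."""
    def __init__(self, n_numeric, cat_cardinalities, d_token=64,
                 n_layers=3, n_heads=4, n_classes=2):
        super().__init__()
        # Numerical tokenizer: per-feature weight and bias (a linear map per feature).
        self.num_weight = nn.Parameter(torch.randn(n_numeric, d_token) * 0.02)
        self.num_bias = nn.Parameter(torch.zeros(n_numeric, d_token))
        # Categorical tokenizer: one embedding table per categorical feature.
        self.cat_embeds = nn.ModuleList(
            nn.Embedding(card, d_token) for card in cat_cardinalities
        )
        self.cls = nn.Parameter(torch.zeros(1, 1, d_token))
        layer = nn.TransformerEncoderLayer(
            d_model=d_token, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_token, n_classes)

    def forward(self, x_num, x_cat):
        # x_num: (batch, n_numeric) float; x_cat: (batch, n_cat) int64
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeds)], dim=1
        )
        cls = self.cls.expand(x_num.size(0), -1, -1)
        tokens = torch.cat([cls, num_tokens, cat_tokens], dim=1)
        h = self.encoder(tokens)
        return self.head(h[:, 0])  # predict from the [CLS] token

model = FTTransformer(n_numeric=5, cat_cardinalities=[10, 1000])
logits = model(torch.randn(8, 5), torch.randint(0, 10, (8, 2)))
</syntaxhighlight>

Reading the prediction off the CLS token mirrors how BERT-style encoders pool a sequence, which is what makes the architecture such a direct transplant of standard transformer machinery onto tabular inputs.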
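TabPFN's in-context-learning workflow can be shown with a small usage sketch, assuming the <code>tabpfn</code> package and its scikit-learn-style interface; exact constructor arguments may vary by version.

<syntaxhighlight lang="python">
# Minimal sketch, assuming `pip install tabpfn`; API details may
# differ between package versions.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

# 569 samples: comfortably inside TabPFN's small-dataset regime.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()      # no hyperparameters to tune
clf.fit(X_train, y_train)     # "fit" stores the data as in-context examples
preds = clf.predict(X_test)   # predictions come from forward passes, no training
print(accuracy_score(y_test, preds))
</syntaxhighlight>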