== Understanding ==

The "tabular gap": neural networks excel at images, text, and audio because these modalities have spatial or sequential structure that convolutions and attention exploit efficiently. Tabular data lacks this structure: features are semantically heterogeneous, with no natural ordering. A column called "age" is fundamentally different from a column called "revenue" in ways that have no analog in pixels.

**Why GBDTs win**: Gradient boosted decision trees handle heterogeneous features natively, discover complex feature interactions via splits, are robust to irrelevant features (automatic feature selection), require minimal preprocessing, and train quickly. They are hard to beat on tabular benchmarks because they solve exactly the problems posed by tabular data, without the overhead of deep learning (see the LightGBM sketch below).

**Where DL can win on tabular data**: (1) **Large datasets** (>100K samples): neural networks keep improving with scale where GBDTs plateau. (2) **High-cardinality categoricals**: entity embeddings for user IDs or product IDs with millions of distinct values (see the embedding sketch below). (3) **Multi-modal inputs**: when tabular data is combined with text, images, or other modalities. (4) **End-to-end learning**: when the tabular model is part of a larger differentiable system. (5) **Online learning**: neural networks update incrementally more easily than tree ensembles.

**FT-Transformer (current SOTA)**: The Feature Tokenizer + Transformer embeds each feature as a token (a per-feature linear layer for numerical features, an embedding table for categorical ones), prepends a CLS token, and applies standard transformer layers. It consistently outperforms TabNet and approaches GBDT performance on many benchmarks, while being a clean, generalizable architecture (sketched in code below).

**TabPFN (few-shot tabular ML)**: Pre-trained on millions of synthetic tabular datasets, TabPFN uses in-context learning to make predictions on new small datasets (up to ~1,000 samples) in a single forward pass, with no training required. On small datasets it frequently matches or beats XGBoost given minutes of hyperparameter tuning (usage sketch below). This is a fundamentally different paradigm from standard ML.
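To make the low-friction GBDT workflow concrete, here is a minimal sketch assuming the LightGBM package: raw pandas columns, a categorical declared by dtype, no scaling or imputation. The column names and data are invented for illustration.

<syntaxhighlight lang="python">
# Minimal sketch, assuming LightGBM (pip install lightgbm).
# Data and column names are illustrative only.
import lightgbm as lgb
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, 5000),
    "revenue": rng.lognormal(3, 1, 5000),
    # Declaring the dtype is the only "preprocessing" the categorical needs.
    "region": pd.Categorical(rng.choice(["north", "south", "east"], 5000)),
})
y = rng.integers(0, 2, 5000)  # toy binary target

model = lgb.LGBMClassifier(n_estimators=200)
model.fit(df, y)              # categorical column handled natively via splits
print(model.predict(df.head()))
</syntaxhighlight>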
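The entity-embedding idea from point (2) can be sketched in a few lines of PyTorch: a high-cardinality ID is mapped through an embedding table and concatenated with the numeric features before an MLP. The class name, dimensions, and data below are illustrative assumptions, not a reference implementation.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class EntityEmbeddingMLP(nn.Module):
    """Hypothetical model: embed one high-cardinality categorical
    (e.g. a user ID) and concatenate it with numeric features."""
    def __init__(self, n_categories, embed_dim, n_numeric, hidden=64):
        super().__init__()
        # Learned dense vector per category ID.
        self.embed = nn.Embedding(n_categories, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + n_numeric, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, cat_ids, numeric):
        # cat_ids: (batch,) int64; numeric: (batch, n_numeric) float32
        x = torch.cat([self.embed(cat_ids), numeric], dim=-1)
        return self.mlp(x)

# A million-ID categorical becomes a 32-dimensional learned feature.
model = EntityEmbeddingMLP(n_categories=1_000_000, embed_dim=32, n_numeric=10)
out = model(torch.randint(0, 1_000_000, (4,)), torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 1])
</syntaxhighlight>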
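The FT-Transformer recipe can be sketched directly from the description above: a per-feature linear tokenizer for numerical columns, per-feature embedding tables for categorical ones, a prepended CLS token, and a stock transformer encoder. This is a minimal PyTorch approximation under those assumptions, with illustrative dimensions, not the authors' reference code.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class FTTransformer(nn.Module):
    """Minimal sketch of Feature Tokenizer + Transformer: one token
    per feature, a [CLS] token, and a standard transformer encoder.
    Sizes and initialization are illustrative assumptions."""
    def __init__(self, n_numeric, cat_cardinalities, d_token=64,
                 n_layers=3, n_heads=4, n_classes=2):
        super().__init__()
        # Numerical tokenizer: per-feature weight and bias (a linear map per feature).
        self.num_weight = nn.Parameter(torch.randn(n_numeric, d_token) * 0.02)
        self.num_bias = nn.Parameter(torch.zeros(n_numeric, d_token))
        # Categorical tokenizer: one embedding table per categorical feature.
        self.cat_embeds = nn.ModuleList(
            nn.Embedding(card, d_token) for card in cat_cardinalities
        )
        self.cls = nn.Parameter(torch.zeros(1, 1, d_token))
        layer = nn.TransformerEncoderLayer(
            d_model=d_token, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_token, n_classes)

    def forward(self, x_num, x_cat):
        # x_num: (batch, n_numeric) float; x_cat: (batch, n_cat) int64
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeds)], dim=1
        )
        cls = self.cls.expand(x_num.size(0), -1, -1)
        tokens = torch.cat([cls, num_tokens, cat_tokens], dim=1)
        h = self.encoder(tokens)
        return self.head(h[:, 0])  # predict from the [CLS] token

model = FTTransformer(n_numeric=5, cat_cardinalities=[10, 1000])
logits = model(torch.randn(8, 5), torch.randint(0, 10, (8, 2)))
</syntaxhighlight>

Reading the prediction off the CLS token mirrors how BERT-style encoders pool a sequence, which is what makes the architecture such a direct transplant of standard transformer machinery onto tabular inputs.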
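TabPFN's in-context-learning workflow can be shown with a small usage sketch, assuming the <code>tabpfn</code> package and its scikit-learn-style interface; exact constructor arguments may vary by version.

<syntaxhighlight lang="python">
# Minimal sketch, assuming `pip install tabpfn`; API details may
# differ between package versions.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

# 569 samples: comfortably inside TabPFN's small-dataset regime.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()      # no hyperparameters to tune
clf.fit(X_train, y_train)     # "fit" stores the data as in-context examples
preds = clf.predict(X_test)   # predictions come from forward passes, no training
print(accuracy_score(y_test, preds))
</syntaxhighlight>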