Data Structures
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Data Structures are specialized formats for organizing, processing, retrieving, and storing data in a computer. While algorithms are the "logic," data structures are the "vessels" that hold the information. Choosing the right data structure is one of the most important decisions a programmer makes; it can be the difference between an application that is lightning-fast and one that crashes. From simple lists and arrays to complex graphs and hash tables, data structures provide the structural integrity for all software systems.
Remembering
- Data Structure — A way of organizing data so that it can be used efficiently.
- Array — A collection of items stored at contiguous memory locations; indexed by number.
- Linked List — A linear collection of data elements where each element points to the next.
- Stack — A "Last-In, First-Out" (LIFO) structure (like a stack of plates).
- Queue — A "First-In, First-Out" (FIFO) structure (like a line at a store).
- Hash Table — A structure that maps "keys" to "values" using a hash function (O(1) lookup).
- Tree — A hierarchical structure with a "root" and "child" nodes (e.g., Binary Search Tree).
- Graph — A set of nodes (vertices) connected by lines (edges); used to model networks.
- Heap — A specialized tree-based structure used for priority queues.
- Set — A collection of distinct elements with no specific order.
- Pointer — A variable that stores the memory address of another variable.
- Abstract Data Type (ADT) — A mathematical model for data types (defines what it does, not how it's implemented).
- Dynamic Array — An array that can resize itself during execution (e.g., Python list).
- Trie — A specialized tree used for searching strings (e.g., for autocomplete).
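Several of the structures above ship with Python itself. A quick sketch of the built-in equivalents (the values are arbitrary examples):

<syntaxhighlight lang="python">
from collections import deque   # double-ended queue: O(1) at both ends
import heapq                    # binary min-heap layered on a plain list

arr = [10, 20, 30]              # dynamic array: Python's list resizes itself
table = {"alice": 1, "bob": 2}  # hash table: dict maps keys to values
unique = {1, 2, 3}              # set: distinct elements, no fixed order

stack = []                      # LIFO: append() pushes, pop() removes the top
stack.append("x")
assert stack.pop() == "x"

fifo = deque(["a", "b"])        # FIFO: popleft() is O(1), unlike list.pop(0)
assert fifo.popleft() == "a"

heap = [5, 1, 4]
heapq.heapify(heap)             # smallest element is always heap[0]
assert heap[0] == 1
</syntaxhighlight>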
Understanding
Data structures represent tradeoffs between Time (access speed) and Space (memory usage).
Linear vs. Non-Linear Structures:
- Arrays: Best for random access (I want the 500th item). Hard to resize or insert in the middle.
- Linked Lists: Best for frequent insertions/deletions. Slow to find a specific item because you have to "walk" from the start.
- Hash Tables: The "Magic" structure. By turning a name (key) into a number (hash), you can jump directly to the data. This is how high-performance databases work.
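A minimal sketch of that "direct jump" using Python's dict, contrasted with scanning a plain list of pairs (the names and ages are invented):

<syntaxhighlight lang="python">
# A dict hashes the key and jumps straight to the matching slot.
ages = {"alice": 30, "bob": 25}
print(ages["bob"])  # 25

# A list of pairs has no such shortcut: finding "bob" walks the list.
pairs = [("alice", 30), ("bob", 25)]
print(next(age for name, age in pairs if name == "bob"))  # 25
</syntaxhighlight>

Both lines print the same answer, but the dict lookup stays fast no matter how many entries there are, while the list scan slows down as the list grows.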
Hierarchical and Network Structures:
- Trees: Used whenever data is nested (folders on your computer, HTML on a webpage).
- Graphs: The most flexible structure. Used to model the Internet, social networks (Facebook "friends"), or road maps.
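Both shapes can be sketched with plain nested containers (the folder names and road network are invented examples):

<syntaxhighlight lang="python">
# Tree: nested folders -- each directory maps names to its children.
folders = {
    "home": {
        "docs": {"cv.pdf": None},
        "pics": {},
    }
}
print("cv.pdf" in folders["home"]["docs"])  # True

# Graph: an adjacency list -- each node maps to its neighbours.
roads = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
}
print(roads["A"])  # ['B', 'C']
</syntaxhighlight>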
The Power of Abstraction: We often talk about "Stacks" or "Queues" as Abstract Data Types. You can build a Stack using an Array *or* a Linked List; the "user" doesn't care about the implementation as long as it follows the "Last-In, First-Out" rule.
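To make the abstraction concrete, here is a sketch of a Stack ADT backed by a linked list rather than an array. Callers see only push and pop, so they never notice which implementation is underneath:

<syntaxhighlight lang="python">
class Node:
    """One link in the chain: a value plus a pointer to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Same LIFO contract as an array-backed stack, different internals."""
    def __init__(self):
        self._top = None

    def push(self, x):
        self._top = Node(x, self._top)  # new node points at the old top

    def pop(self):
        node = self._top
        self._top = node.next
        return node.value


s = LinkedStack()
s.push("a")
s.push("b")
print(s.pop())  # b -- last in, first out, exactly as the ADT promises
</syntaxhighlight>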
Applying
Implementing a Stack vs. Queue:
<syntaxhighlight lang="python">
class Stack:
    """LIFO: Last-In, First-Out"""
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)

    def pop(self):
        return self.items.pop()


class Queue:
    """FIFO: First-In, First-Out"""
    def __init__(self):
        self.items = []

    def enqueue(self, x):
        self.items.append(x)

    def dequeue(self):
        # pop(0) is O(n); collections.deque.popleft() avoids this in practice
        return self.items.pop(0)


# Use cases
s = Stack()
q = Queue()
for x in ["First", "Second", "Third"]:
    s.push(x)
    q.enqueue(x)

print(f"Stack Pop (LIFO): {s.pop()}")          # Third
print(f"Queue Dequeue (FIFO): {q.dequeue()}")  # First
</syntaxhighlight>
- Stacks are used for 'Undo' buttons; Queues for printer tasks.
- Data Structures in Action
- Social Networks → Graphs (users as nodes, friendships as edges).
- Excel / Spreadsheets → 2D Arrays.
- Web Browsers → Stacks (the 'Back' button uses a history stack).
- Dictionary / Database → Hash Tables (look up word/ID instantly).
- AI / Pathfinding → Trees and Heaps.
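The social-network case above can be sketched as an adjacency list with a breadth-first search over it (the users and friendships are made up):

<syntaxhighlight lang="python">
from collections import deque

# Hypothetical social graph: users as nodes, friendships as edges.
friends = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice"],
    "dave":  ["bob"],
}

def degrees_of_separation(graph, start, goal):
    """Breadth-first search: smallest number of hops between two users."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        user, dist = queue.popleft()
        if user == goal:
            return dist
        for friend in graph[user]:
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None  # not connected

print(degrees_of_separation(friends, "carol", "dave"))  # 3
</syntaxhighlight>

Note how the graph (a dict of lists), a queue (deque), and a set all cooperate in one short routine.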
Analyzing
Average-case time complexity of common operations:
| Structure | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) |
| Stack / Queue | O(n) | O(n) | O(1) | O(1) |
| Hash Table | N/A | O(1) | O(1) | O(1) |
| Binary Search Tree | O(log n) | O(log n) | O(log n) | O(log n) |
Worst cases can be worse: a hash table with many collisions degrades to O(n), and an unbalanced Binary Search Tree degenerates into a linked list with O(n) operations.
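The gap between O(n) search in an array and O(1) search in a hash table is easy to observe; exact timings vary by machine, but the ratio is dramatic:

<syntaxhighlight lang="python">
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Membership test for an absent value, so the list must scan everything.
list_time = timeit.timeit(lambda: -1 in as_list, number=100)
set_time = timeit.timeit(lambda: -1 in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
# The hash-based set lookup is typically orders of magnitude faster.
</syntaxhighlight>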
The Importance of Memory Layout: On modern hardware, the physical arrangement of data matters. Arrays are fast not only because indexing is simple arithmetic, but because their elements sit together in memory, making them "cache friendly." Linked lists, which scatter their nodes across memory, can be significantly slower in practice due to cache misses.
Evaluating
Evaluating a structure:
- Frequency of operations: Will we be doing more "lookups" or more "insertions"?
- Memory constraints: Can we afford the extra memory for pointers in a tree, or the "empty" space in a hash table?
- Persistence: Does the data need to be saved to disk (requiring B-Trees)?
- Concurrency: Can multiple "threads" access the structure at once without breaking it?
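On the concurrency point, Python's standard library already ships a lock-protected queue (queue.Queue). A minimal producer/consumer sketch, using None as an assumed stop sentinel:

<syntaxhighlight lang="python">
import queue
import threading

q = queue.Queue()   # internally synchronised: safe to share across threads
results = []

def worker():
    while True:
        item = q.get()
        if item is None:        # sentinel: tell the worker to stop
            break
        results.append(item * 2)
        q.task_done()

t = threading.Thread(target=worker)
t.start()
for i in range(5):
    q.put(i)
q.put(None)
t.join()
print(sorted(results))  # [0, 2, 4, 6, 8]
</syntaxhighlight>

A plain list used this way from two threads would need explicit locking; queue.Queue does that bookkeeping for you.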
Creating
Future Frontiers:
- Probabilistic Data Structures: Structures like "Bloom Filters" that use very little memory and can tell you with high (but not total) certainty whether an item is in a set; they never miss an item that is present, but occasionally claim presence of one that isn't.
- Persistent Data Structures: Structures used in functional programming (e.g., Clojure) that "remember" their previous versions without copying all the data.
- Succinct Data Structures: Pushing the mathematical limits to store data in close to the theoretical minimum amount of space (the entropy limit).
- Learned ("Neural") Data Structures: Using machine learning to "learn" an optimal index for a specific dataset.
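A Bloom filter is small enough to sketch from scratch. This toy version (sizes and hash scheme chosen arbitrarily for illustration) shows the core idea: k hash functions each set one bit, and a lookup only answers "definitely not present" or "probably present":

<syntaxhighlight lang="python">
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size   # the entire memory footprint

    def _positions(self, item):
        # Derive k bit positions by salting one hash with an index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # All bits set -> probably present; any bit clear -> definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True -- no false negatives, ever
print(bf.might_contain("zoe"))    # almost certainly False
</syntaxhighlight>

Note the tradeoff: the filter cannot enumerate or delete its members, and it occasionally returns a false positive, in exchange for a fixed, tiny memory budget.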