Distributed Systems
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Distributed Systems are the "Invisible Engines" that power the modern internet. While a traditional computer is a single box on your desk, a distributed system is a "Team" of hundreds or thousands of computers (Nodes) working together to solve a single problem. Whether you are searching Google, streaming Netflix, or sending a Bitcoin transaction, you are interacting with a distributed system. It is the science of "Scaling Up"—creating a machine that is so large and so resilient that it can serve millions of people at once and keep running even if dozens of its parts fail.
Remembering
- Distributed System — A collection of independent computers that appears to its users as a single coherent system.
- Node — A single computer or process within a distributed system.
- Scalability — The ability of a system to handle more work by adding more resources (e.g., adding more servers).
- Availability — The percentage of time a system is "Up" and responding to requests (e.g., "Five Nines" = 99.999% uptime).
- Fault Tolerance — The ability of a system to continue operating even when one or more components fail.
- Consensus — The process by which all nodes in a system agree on a single value or state (e.g., the Paxos or Raft algorithms).
- Latency — The "Delay" or time it takes for a message to travel from one node to another.
- Throughput — The amount of work (requests) a system can handle per second.
- Load Balancer — A device that acts as a "Traffic Cop," distributing incoming requests to different servers so no one server is overwhelmed.
- CAP Theorem — The fundamental rule that a distributed system can guarantee at most two of the three: Consistency, Availability, and Partition Tolerance.
Understanding
Distributed systems are understood through Coordination and Trade-offs.
1. The CAP Theorem (The "Choose Two" Rule): This is the most important law in distributed systems.
- Consistency (C): Every node sees the same data at the same time.
- Availability (A): Every request gets a response.
- Partition Tolerance (P): The system keeps working even if the network between nodes breaks.
- Because the network always breaks eventually (P), you have to choose between C and A. Do you want the answer to be "Perfectly Correct" (C) or "Fast and Always There" (A)?
2. Horizontal vs. Vertical Scaling:
- Vertical (Scale Up): Buying a "Bigger" computer. (Expensive and has a limit).
- Horizontal (Scale Out): Buying "More" cheap computers. (This is how modern systems like Google and AWS work).
3. The Fallacies of Distributed Computing: New developers often make dangerous assumptions that cause systems to fail:
- They assume the network is "Always Reliable."
- They assume "Latency is Zero."
- They assume "Bandwidth is Infinite."
- In a distributed system, you must "Design for Failure." You assume everything will break and build the software to handle it.
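To make "Design for Failure" concrete, here is a minimal sketch of a client that retries an unreliable call with exponential backoff. The `flaky_network_call` function and its 30% failure rate are hypothetical stand-ins for a real RPC or HTTP request.

<syntaxhighlight lang="python">
import random
import time

def flaky_network_call():
    # Hypothetical stand-in for a real network request; fails 30% of the time
    if random.random() < 0.3:
        raise ConnectionError("Packet lost")
    return "OK"

def call_with_retries(max_attempts=5, base_delay=0.1):
    # Exponential backoff: wait 0.1s, 0.2s, 0.4s, ... between attempts
    for attempt in range(max_attempts):
        try:
            return flaky_network_call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Out of retries: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))

print(call_with_retries())
</syntaxhighlight>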
Eventual Consistency: A compromise used by systems like Amazon or Facebook. If you update your profile, your friend in Australia might not see it for 1 second. The data is "Eventually" consistent, allowing the system to stay incredibly fast.
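A minimal sketch of that trade-off, assuming two toy replicas that keep accepting reads and writes while disconnected (choosing Availability over Consistency) and reconcile with a simple "last write wins" rule once they can talk again. Real replication protocols are far more careful about clocks and conflicts.

<syntaxhighlight lang="python">
import time

class Replica:
    """A toy replica: stores (value, timestamp) pairs and merges by 'last write wins'."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        # Stays available: accepts writes even while partitioned from its peer
        self.data[key] = (value, time.time())

    def read(self, key):
        value, _ = self.data.get(key, (None, None))
        return value

    def sync_from(self, other):
        # Eventual consistency: adopt the peer's newer writes once the partition heals
        for key, (value, ts) in other.data.items():
            if key not in self.data or ts > self.data[key][1]:
                self.data[key] = (value, ts)

us, australia = Replica(), Replica()
us.write("profile", "new photo")   # Update lands on the US replica
print(australia.read("profile"))   # None: your friend still sees stale data
australia.sync_from(us)            # Replication catches up
print(australia.read("profile"))   # 'new photo': eventually consistent
</syntaxhighlight>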
Applying
Modeling 'The Consensus Problem' (Agreement in a noisy room):

<syntaxhighlight lang="python">
import random

def reach_consensus(nodes, value):
    """
    Simplified 'Majority Rule' consensus.
    """
    votes = []
    for node in nodes:
        # 10% chance a node is 'Down' and cannot vote
        if random.random() > 0.1:
            votes.append(value)
        else:
            votes.append("ERROR")
    # If more than 50% of the nodes agree, we have consensus
    if votes.count(value) > (len(nodes) / 2):
        return f"SUCCESS: System agreed on {value} with {votes.count(value)} votes."
    else:
        return "FAILURE: System could not reach consensus (Too many nodes down)."

# A cluster of 5 servers trying to save a database entry
print(reach_consensus(["S1", "S2", "S3", "S4", "S5"], "User_Balance=100"))
</syntaxhighlight>
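This "majority wins" rule is the core intuition behind real consensus protocols like Raft and Paxos: a cluster of 2f + 1 nodes can tolerate f failures and still form a majority, which is why production clusters usually run 3 or 5 nodes.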
Distributed Landmarks:
- The 'Google File System' (GFS) Paper (2003) → The document that changed the world by showing how to build a massive storage system using thousands of "Cheap, failing computers" instead of one expensive mainframe.
- Bitcoin (2008) → The first truly "Decentralized" system that solved the consensus problem without needing a central boss (using Proof of Work).
- Netflix Chaos Monkey → A tool used by Netflix that "Randomly Kills" their own servers in production to force their developers to build systems that never crash.
- The AOL Outage (1996) → A famous 19-hour outage that showed the world how fragile distributed systems can be if they aren't designed for scale.
Analyzing
{| class="wikitable"
|+ Centralized vs. Distributed
! Feature !! Centralized (Mainframe) !! Distributed (Cloud)
|-
| Point of Failure || Single (If it dies, all die) || Multiple (If one dies, others live)
|-
| Cost || High (Special hardware) || Low (Commodity hardware)
|-
| Complexity || Low (Easy to manage) || High (Hard to sync)
|-
| Capacity || Limited by one box || Effectively Infinite
|}
The Concept of "Sharding": Analyzing how to store a database that is too big for one disk. You "Slice" the data (e.g., users A-M on Server 1, N-Z on Server 2). This allows you to store petabytes of data, but makes "Searching" much more complex.
Evaluating
Evaluating distributed systems:
- Complexity: Is the "Overhead" of managing 1,000 servers worth it for a small company? (The "Monolith vs. Microservices" debate).
- Debugging: How do you find a bug that only happens when Server A talks to Server B while Server C is slow? (The "Heisenbug" problem).
- Environment: Do massive data centers use too much electricity? (Some distributed systems now "Follow the Sun"—moving their work to whichever part of the world has the most solar power right now).
- Trust: In a "Public" distributed system (like Blockchain), how do you prevent "Byzantine" nodes from lying to the group?
Creating
Future Frontiers:
- Edge Computing: Moving the "Brain" of the internet out of data centers and into your fridge, your car, and your watch to cut latency to nearly zero.
- Quantum Networking: Using "Entanglement" to build ultra-secure distributed links across vast distances (entanglement cannot transmit data faster than light, so it promises security rather than instant sync).
- Self-Healing Systems: AI that "Watches" a system and automatically spawns new nodes or rewrites code when it detects a bottleneck.
- Interplanetary Internet: Designing distributed systems that can handle the "20-minute latency" between Earth and Mars.