Data Compression

From BloomWiki
Latest revision as of 01:49, 25 April 2026

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Data Compression is the process of reducing the size of a digital file by removing "Redundancy" and "Irrelevant Information." It is the reason we can stream movies on our phones, store thousands of photos in our pockets, and send emails across the globe in seconds. There are two main types: Lossless, where the original data is preserved perfectly (like a ZIP file), and Lossy, where we throw away information the human eye or ear can't notice (like a JPEG or MP3). By understanding the mathematical limits of information, we have learned how to "Pack" the entire world of data into smaller and smaller boxes.

Remembering

  • Data Compression — Encoding information using fewer bits than the original representation.
  • Lossless Compression — Reducing file size while allowing for perfect reconstruction of the original data.
  • Lossy Compression — Achieving high compression by permanently removing data that is deemed less important (usually based on human perception).
  • Redundancy — Parts of a message that repeat or can be predicted (e.g., "aaaaa" can be compressed to "5a").
  • Algorithm — The set of rules used to compress and decompress data (e.g., LZW, Huffman, DEFLATE).
  • Codec — (Coder-Decoder) The hardware or software that performs the compression.
  • Bitrate — The amount of data processed per unit of time (e.g., 128 kbps for an MP3).
  • Run-Length Encoding (RLE) — A simple compression method that replaces sequences of identical characters with a count and the character.
  • Dictionary Encoding — Replacing long repeating strings with a short "Index" to a dictionary.
  • Huffman Coding — An algorithm that gives shorter codes to common characters (like 'E') and longer codes to rare ones (like 'Z').
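The "Bitrate" entry above is just arithmetic: file size equals bitrate times duration, divided by 8 bits per byte. A quick sketch (the 4-minute song and the classic 128 kbps MP3 bitrate are illustrative figures; the helper name is made up here):

```python
def file_size_mb(bitrate_kbps, duration_s):
    """Estimated size in (decimal) megabytes: bits/second x seconds / 8 / 1e6."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000

# A 4-minute song at the classic 128 kbps MP3 bitrate:
print(file_size_mb(128, 240))  # → 3.84 (MB)
```

The same formula explains why video needs so much more space: a 5,000 kbps HD stream is roughly forty times the data rate of that MP3.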

Understanding

Data compression is understood through Redundancy Elimination and Perceptual Thresholds.

1. The Fight Against Redundancy (Lossless): Most data is very repetitive.

  • Pattern Recognition: If a text says "The" 1,000 times, the computer doesn't need to store "T-h-e" 1,000 times. It stores "The" once and gives it a tiny "Shortcut" code.
  • Statistical Probabilities: Huffman coding uses the fact that some symbols happen more than others. By giving the most common ones the shortest codes, the average size of the message drops.
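The statistical idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production encoder: it performs the standard greedy Huffman merge with `heapq`, representing each subtree as a symbol-to-code dictionary, and the sample sentence is arbitrary.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Greedy Huffman construction: repeatedly merge the two rarest subtrees."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix '0' onto one subtree's codes and '1' onto the other's.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "this sentence is a test sentence"
codes = huffman_codes(text)
# Frequent letters (like 'e') end up with codes no longer than rare ones (like 'a').
print(sorted(codes.items(), key=lambda kv: len(kv[1])))
```

Encoding the sentence with these codes takes far fewer bits than ASCII's fixed 8 bits per character, and decoding is unambiguous because no code is a prefix of another.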

2. The Human Cheat (Lossy): Our eyes and ears are imperfect.

  • JPEG: Your eye is great at seeing brightness but bad at seeing small changes in color. JPEG therefore discards most of the color detail (a step called chroma subsampling), and your brain "Fills it in."
  • MP3: Uses "Acoustic Masking." If there is a loud drum and a quiet flute at the same time, you can't hear the flute anyway. MP3 throws the flute data away.
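The color trick can be sketched with a toy version of chroma subsampling: store one averaged color value per 2x2 block of pixels, as 4:2:0 sampling does. This is a heavy simplification of real JPEG (the helper names and the tiny 4x4 channel are illustrative, and even dimensions are assumed):

```python
def subsample(chroma):
    """Keep one averaged color value per 2x2 block: 75% of the data is gone."""
    h, w = len(chroma), len(chroma[0])
    return [
        [(chroma[y][x] + chroma[y][x + 1]
          + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upsample(small):
    """Rebuild full resolution by repeating each value over its 2x2 block."""
    return [
        [small[y // 2][x // 2] for x in range(2 * len(small[0]))]
        for y in range(2 * len(small))
    ]

# A tiny 4x4 "color channel" of sample values.
chroma = [[10, 12, 50, 52],
          [11, 13, 51, 53],
          [80, 82, 20, 22],
          [81, 83, 21, 23]]
approx = upsample(subsample(chroma))  # close to, but not equal to, the original
```

The reconstruction is visibly "good enough" even though three quarters of the color values were never stored — exactly the perceptual bet that JPEG makes.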

3. The Shannon Limit: No matter how smart your algorithm is, you can never compress a file smaller than its "Entropy" (the amount of genuinely unpredictable information inside) without losing information.
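The limit itself is easy to compute. A sketch of Shannon's entropy formula, treating each character as an independent symbol (a simplifying assumption — real compressors also exploit patterns between symbols):

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data):
    """Shannon entropy: the theoretical minimum average bits per symbol."""
    freq = Counter(data)
    total = len(data)
    return sum((f / total) * math.log2(total / f) for f in freq.values())

print(entropy_bits_per_symbol("AAAAAAAA"))  # → 0.0 (fully predictable)
print(entropy_bits_per_symbol("ABABABAB"))  # → 1.0 (one bit per symbol)
print(entropy_bits_per_symbol("ABCDEFGH"))  # → 3.0 (eight equally likely symbols)
```

A string of pure repetition can in principle be squeezed to almost nothing, while eight equally likely symbols genuinely need 3 bits each — no algorithm can beat that.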

Artifacts: When you compress something too much (especially with lossy methods), you start to see "Blocks" in a video or "Blur" in a photo. These are called compression artifacts.

Applying

Modeling 'Run-Length Encoding' (A simple lossless algorithm):

<syntaxhighlight lang="python">
def rle_compress(data):
    """
    Compresses 'AAABBC' into '3A2B1C'
    """
    if not data:
        return ""

    compressed = []
    count = 1
    for i in range(1, len(data)):
        if data[i] == data[i - 1]:
            count += 1
        else:
            compressed.append(f"{count}{data[i - 1]}")
            count = 1
    compressed.append(f"{count}{data[-1]}")

    return "".join(compressed)

raw = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"
comp = rle_compress(raw)
print(f"Original: {len(raw)} chars")
print(f"Compressed: {comp} ({len(comp)} chars)")
print(f"Efficiency: {round((1 - len(comp)/len(raw))*100, 1)}% reduction")
</syntaxhighlight>
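To confirm the "lossless" claim, a matching decompressor can be added. This sketch assumes the count-then-character format produced above; note it only works if the raw data itself contains no digit characters.

```python
def rle_decompress(compressed):
    """Expand '3A2B1C' back into 'AAABBC' -- nothing was lost."""
    result = []
    count = ""
    for ch in compressed:
        if ch.isdigit():
            count += ch  # counts can span several digits, e.g. '12W'
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)

print(rle_decompress("3A2B1C"))  # → AAABBC
```

Round-tripping any input through compress then decompress returns the exact original string, which is the defining property of a lossless method.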

Compression Landmarks

  • The 'LZW' Algorithm (1984) → The dictionary method behind the GIF format (ZIP uses the related LZ77 family), which allowed the early internet to handle images.
  • The 'JPEG' Standard (1992) → The invention that made digital photography possible by shrinking 10MB photos into 1MB files.
  • The 'MP3' Revolution (1990s) → Changed the music industry forever by making songs small enough to "Share" (and pirate) over slow dial-up modems.
  • H.264 / HEVC → The modern video codecs that let you stream HD and 4K movies on Netflix without clogging the entire world's internet.

Analyzing

Lossless vs. Lossy
  Feature   | Lossless (ZIP/PNG)           | Lossy (MP3/JPEG)
  ----------|------------------------------|------------------------------
  Integrity | 100% perfect reconstruction  | Data is lost forever
  File Size | Moderate reduction (2x-5x)   | Massive reduction (10x-100x)
  Usage     | Text, code, medical images   | Photos, music, video
  Limit     | The entropy of the data      | The limit of human perception
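Both the "Integrity" and "Limit" rows can be demonstrated with Python's built-in zlib (a DEFLATE implementation, the same family used by ZIP and PNG): the lossless round-trip is byte-perfect, and the achievable ratio collapses as the data approaches pure randomness.

```python
import os
import zlib

# Highly repetitive data vs. pure noise, both 10,000 bytes.
text = b"the quick brown fox " * 500
rand = os.urandom(10_000)

for name, data in [("repetitive", text), ("random", rand)]:
    packed = zlib.compress(data)
    assert zlib.decompress(packed) == data  # lossless: byte-perfect round-trip
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

The repetitive input shrinks by orders of magnitude, while the random input actually grows slightly (container overhead) — a direct illustration of the entropy limit.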

The Concept of "Transcoding": Analyzing what happens when you compress an already-compressed file. This is like "Making a photocopy of a photocopy"—each time you do it, the quality drops and "Noise" increases.
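Generational loss can be simulated in a few lines. This is a toy model, not a real codec: quantization stands in for lossy encoding, and alternating between two incompatible quantization grids mimics transcoding between different formats.

```python
def quantize(samples, step):
    """A toy lossy 'codec': snap every sample to the nearest multiple of step."""
    return [round(s / step) * step for s in samples]

signal = [0.1 * i for i in range(20)]  # the pristine original
current = signal
errors = []
for generation in range(5):
    # Alternate between two incompatible grids, mimicking transcoding
    # between two different lossy codecs.
    step = 0.3 if generation % 2 == 0 else 0.7
    current = quantize(current, step)
    errors.append(max(abs(a - b) for a, b in zip(signal, current)))

print(errors)  # error versus the original after each re-encode
```

The worst-case error jumps after the first transcode and never recovers — once information is discarded, later generations can only approximate an already-degraded copy.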

Evaluating

Evaluating data compression:

  1. Quality vs. Space: At what point does a file become "Too Small" to enjoy?
  2. Processing Power: Is it worth saving 1MB of space if the computer has to work 10x harder to decompress it? (This is why phone batteries die faster when playing high-res video).
  3. Archiving: If we store all of human history in "Lossy" formats, are we losing the "Details" for future generations?
  4. Standardization: What happens if the software to decompress a file disappears? (The "Digital Dark Age").

Creating

Future Frontiers:

  1. AI Compression (Neural Codecs): Using neural networks to "Generate" a face rather than storing it, allowing for 1,000x smaller video calls.
  2. Semantic Compression: A system that only stores "What happened" (e.g., "A dog ran left") and lets your computer recreate the scene locally.
  3. Quantum Compression: Developing ways to compress "Quantum Bits" (Qubits) for the future quantum internet.
  4. Holographic Storage: Using 3D light-patterns to store data at densities 1,000x higher than current hard drives.