Data Compression
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}} Data Compression is the process of reducing the size of a digital file by removing "Redundancy" and "Irrelevant Information." It is the reason we can stream movies on our phones, store thousands of photos in our pockets, and send emails across the globe in seconds. There are two main types: '''Lossless''', where the original data is preserved perfectly (like a ZIP file), and '''Lossy''', where we throw away information the human eye or ear can't notice (like a JPEG or MP3). By understanding the mathematical limits of information, we have learned how to "Pack" the entire world of data into smaller and smaller boxes.
</div>
__TOC__
<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Data Compression''': Encoding information using fewer bits than the original representation.
* '''Lossless Compression''': Reducing file size while allowing perfect reconstruction of the original data.
* '''Lossy Compression''': Achieving high compression by permanently removing data that is deemed less important (usually based on human perception).
* '''Redundancy''': Parts of a message that repeat or can be predicted (e.g., "aaaaa" can be compressed to "5a").
* '''Algorithm''': The set of rules used to compress and decompress data (e.g., LZW, Huffman, DEFLATE).
* '''Codec''': (Coder-Decoder) The hardware or software that encodes (compresses) and decodes (decompresses) data such as audio or video.
* '''Bitrate''': The amount of data processed per unit of time (e.g., 128 kbps for an MP3).
* '''Run-Length Encoding (RLE)''': A simple compression method that replaces runs of identical characters with a count and the character.
* '''Dictionary Encoding''': Replacing long repeating strings with a short "Index" into a dictionary.
* '''Huffman Coding''': An algorithm that gives shorter codes to common characters (like 'E') and longer codes to rare ones (like 'Z').
</div>
<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Data compression is understood through '''Redundancy Elimination''' and '''Perceptual Thresholds'''.

'''1. The Fight Against Redundancy (Lossless)''': Most data is highly repetitive.
* '''Pattern Recognition''': If a text says "The" 1,000 times, the computer doesn't need to store "T-h-e" 1,000 times. It stores "The" once and replaces each later occurrence with a tiny "Shortcut" code (this is dictionary encoding).
* '''Statistical Probabilities''': Huffman coding exploits the fact that some symbols occur more often than others. By giving the most common ones the shortest codes, it lowers the average number of bits per symbol.

'''2. The Human Cheat (Lossy)''': Our eyes and ears are imperfect.
* '''JPEG''': Your eye is good at seeing changes in brightness but poor at seeing fine changes in color. JPEG typically stores color at a lower resolution than brightness (often one color sample for every 2×2 block of pixels) and discards fine detail the eye barely notices; your brain "Fills it in."
* '''MP3''': Uses "Auditory Masking" (psychoacoustics). If a loud drum and a quiet flute sound at the same time and at similar pitches, you can't hear the flute anyway, so MP3 spends few or no bits on it.

'''3. The Shannon Limit''': Shannon's source coding theorem says that no lossless algorithm, however clever, can compress data to fewer bits on average than its "Entropy" (the amount of genuine unpredictability it contains).

'''Artifacts''': When you compress something too much (especially with lossy methods), you start to see "Blocks" in a video or "Blur" in a photo. These are called compression artifacts.
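
The "Pattern Recognition" idea above is what dictionary coders such as LZW do: they learn repeated strings on the fly and replace them with short index codes. The sketch below is a minimal, illustrative LZW in Python, not a production codec. It assumes the input only uses characters with code points below 256, returns plain integer codes instead of packed bits, and lets the dictionary grow without limit (real formats cap it; GIF, for example, limits codes to 12 bits). The function names <code>lzw_compress</code> and <code>lzw_decompress</code> are hypothetical, chosen for this example.

<syntaxhighlight lang="python">
def lzw_compress(text):
    """Sketch of LZW: emit a dictionary index for each longest known prefix.

    Assumes every character has a code point below 256 (hypothetical limit
    chosen for this example); the dictionary is never capped or reset.
    """
    # Start with one dictionary entry per single character 0-255.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the match while it is still a known string.
            current = candidate
        else:
            # Emit the code for the longest known prefix...
            output.append(dictionary[current])
            # ...and learn the new, one-character-longer string.
            dictionary[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes):
    """Rebuild the text by growing the same dictionary the compressor built."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being built right now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        result.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)


text = "THE CAT AND THE HAT AND THE BAT"
codes = lzw_compress(text)
print(f"{len(text)} characters -> {len(codes)} codes")
assert lzw_decompress(codes) == text  # lossless: perfect reconstruction
</syntaxhighlight>

Because the decompressor rebuilds exactly the same dictionary as it reads the codes, the dictionary itself never has to be stored in the file. Note that each code needs more than 8 bits once the dictionary passes 256 entries, so the real saving is measured in bits, not just in the number of codes.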
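
The "Statistical Probabilities" and "Shannon Limit" points can be checked with a few lines of Python. The following is a minimal sketch, assuming a simple character-by-character model of the text; the helper names <code>shannon_entropy</code> and <code>huffman_codes</code> are hypothetical, chosen for this example. It builds a Huffman code by repeatedly merging the two least frequent symbols, then compares the average code length with the entropy H = -Σ p(x) log2 p(x).

<syntaxhighlight lang="python">
import heapq
import math
from collections import Counter


def shannon_entropy(text):
    """Average information per symbol, in bits (Shannon's H)."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def huffman_codes(text):
    """Build a Huffman code: frequent symbols get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    # The tie-breaker keeps Python from ever comparing the dicts.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


text = "this is an example of a huffman tree"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
avg_bits = len(encoded) / len(text)
print(f"Fixed-width (8-bit) size: {8 * len(text)} bits")
print(f"Huffman size: {len(encoded)} bits ({avg_bits:.3f} bits/symbol)")
print(f"Entropy limit: {shannon_entropy(text):.3f} bits/symbol")
</syntaxhighlight>

For any Huffman code, the average length lands between the entropy and the entropy plus one bit per symbol, which is why Huffman coding gets close to, but never below, the Shannon limit for this symbol-by-symbol model.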
</div>
<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''Modeling 'Run-Length Encoding' (A simple lossless algorithm):'''
<syntaxhighlight lang="python">
def rle_compress(data):
    """Compresses 'AAABBC' into '3A2B1C'."""
    if not data:
        return ""
    compressed = []
    count = 1  # length of the current run
    for i in range(1, len(data)):
        if data[i] == data[i - 1]:
            count += 1  # the run continues
        else:
            # The run ended: emit "<count><char>" and start a new run.
            compressed.append(f"{count}{data[i - 1]}")
            count = 1
    compressed.append(f"{count}{data[-1]}")  # flush the final run
    # Note: this simple format is ambiguous if the input itself contains digits.
    return "".join(compressed)


raw = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"
comp = rle_compress(raw)
print(f"Original: {len(raw)} chars")
print(f"Compressed: {comp} ({len(comp)} chars)")
print(f"Efficiency: {round((1 - len(comp) / len(raw)) * 100, 1)}% reduction")
</syntaxhighlight>

; Compression Landmarks
: '''The 'LZW' Algorithm (1984)''': The dictionary method behind the GIF image format and the Unix <code>compress</code> tool, which helped the early internet handle images. (ZIP and PNG instead use DEFLATE, which combines the related LZ77 algorithm with Huffman coding.)
: '''The 'JPEG' Standard (1992)''': Made digital photography practical by shrinking photos roughly tenfold (e.g., a 10MB image to about 1MB) with little visible loss.
: '''The 'MP3' Revolution (1990s)''': Changed the music industry forever by making songs small enough to "Share" (and pirate) over slow dial-up modems.
: '''H.264 / HEVC''': Video codecs (standardized in 2003 and 2013) that make HD and 4K streaming on services like Netflix practical without overwhelming the internet's bandwidth.
</div>
<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Lossless vs. Lossy
! Feature !! Lossless (ZIP/PNG) !! Lossy (MP3/JPEG)
|-
| Integrity || 100% perfect reconstruction || Some data is discarded permanently
|-
| File Size || Moderate reduction (typically 2x-5x) || Large reduction (typically 10x-100x)
|-
| Usage || Text, code, medical images || Photos, music, video
|-
| Limit || The entropy of the data || The limits of human perception
|}
'''The Concept of "Transcoding"''': Transcoding means decoding a file and re-encoding it, for example converting an MP3 into another lossy format or re-saving a JPEG. With lossy formats this is like "Making a photocopy of a photocopy": each generation discards more information, so quality drops and "Noise" (artifacts) increases. Re-compressing with a lossless method loses nothing, but it usually saves little space, because the redundancy has already been removed.
</div>
<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Evaluating data compression:
# '''Quality vs. Space''': At what point does a file become "Too Small" to enjoy?
# '''Processing Power''': Is it worth saving 1MB of space if the computer has to work 10x harder to decompress it? (This is one reason phone batteries drain faster when playing high-resolution video, especially when no hardware decoder is available.)
# '''Archiving''': If we store all of human history in "Lossy" formats, are we losing the "Details" for future generations?
# '''Standardization''': What happens if the software to decompress a file disappears? (The "Digital Dark Age.")
</div>
<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Future Frontiers:
# '''AI Compression (Neural Codecs)''': Using neural networks to "Generate" a face rather than storing it, potentially allowing for video calls up to 1,000x smaller.
# '''Semantic Compression''': A system that only stores "What happened" (e.g., "A dog ran left") and lets your computer recreate the scene locally.
# '''Quantum Compression''': Developing ways to compress "Quantum Bits" (Qubits) for the future quantum internet.
# '''Holographic Storage''': Using 3D light patterns to store data at densities potentially 1,000x higher than current hard drives.
[[Category:Computer Science]] [[Category:Technology]] [[Category:Mathematics]] </div>