How To: Compression © R. Craig Collins, 2005/6
Zipping a file reduces its size; this is called compression. But some files, especially images, are already compressed when they are saved.
Again, compression is about reducing file size; smaller files take up less space on hard drives, and download faster. Basically, there are two ways to compress a file: remove redundant data, or remove superfluous data.
Redundant data is repeated data; as computers are very good at detecting and repeating patterns, this is a favorite method of compression.
Example: Let's say we point a digital camera at a white board, and snap a picture.
The camera detects colors at various points, usually by dividing the image to
be captured into rows and columns, then collecting color information at the
intersections. VGA resolution is 640 points along the horizontal, and 480 rows,
for just over 307,000 pixels, or picture elements. Below is a magnified view of
a line on that white board, so you may see the pixels that make up the image.
Each pixel is represented by a series of 1s and 0s (bits) that dictate the color... some cameras will assign twenty-four 1s and 0s to each pixel, allowing up to 16.7 million colors to be represented at that single point. That means you have 640 x 480 x 24, or a total of 7,372,800 1s and 0s for that one image. That is just over 7 million bits, or about 900 KB at 8 bits to the byte! Obviously, we need to compress the image, and one way is to get rid of the redundancy. On the rows that are just white pixels, instead of saying 'white pixel,' 'white pixel,' 'white pixel' over and over, why not just tell the computer to repeat the white pixel 640 times? And if you have 400 or 500 similar rows, why not tell the computer to repeat the 'white row' 400 or 500 times? This immediately gets rid of a lot of 1s and 0s, making the file smaller. The image has not changed, just how we describe it. This is called lossless compression.
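The 'repeat the white pixel 640 times' idea is commonly called run-length encoding. Below is a minimal Python sketch of it, assuming a made-up white board image with a single black row; the function names and the test image are illustrative assumptions, not the format any particular camera or file type actually uses.

```python
from itertools import groupby

WIDTH, HEIGHT, BITS_PER_PIXEL = 640, 480, 24
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# Hypothetical image: a white board with one black line drawn across row 240.
white_row = [WHITE] * WIDTH
black_row = [BLACK] * WIDTH
image = [white_row] * 240 + [black_row] + [white_row] * 239

# Size of the plain description: one 24-bit value for every pixel.
raw_bits = WIDTH * HEIGHT * BITS_PER_PIXEL

# Step 1: describe each row as "repeat this pixel N times".
encoded_rows = [tuple(run_length_encode(row)) for row in image]
# Step 2: describe the image as "repeat this row N times".
encoded_image = run_length_encode(encoded_rows)

# Decoding reverses both steps and gives back every original pixel.
decoded_image = [run_length_decode(row) for row in run_length_decode(encoded_image)]

print("raw bits:", raw_bits)
print("one white row becomes:", encoded_rows[0])
print("row runs needed for the whole image:", len(encoded_image))
print("decoded image identical to original:", decoded_image == image)
```

Because decoding gives back exactly the same pixels, nothing has been lost; only the description got shorter.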
Another way to compress a file is to remove 'extra' information.
Example: consider 2+2=4. Is there any part of that statement that does not need to be stored? The computer can add 2+2, so why store the answer? The answer will not change the next time it is added, so the answer is superfluous, or extra.
Now, consider a picture of a rainbow. Can you really detect the 16.7 million shades, or would 256 shades get your point across? By moving from 24 bit color to 8 bit color, you remove the 'extra' color information, and with it a lot of the 1s and 0s that make up the file. The file has changed, but perhaps not in a meaningful way. This is called lossy compression.
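One simple way to picture the move from 24 bit to 8 bit color is to keep only the top few bits of each red, green and blue value. The Python sketch below assumes a 3-3-2 bit split, chosen only for illustration; formats such as gif use a palette of up to 256 colors instead, so treat the function names and the split as assumptions.

```python
def to_8bit(r, g, b):
    """Pack a 24-bit color into 8 bits: 3 bits red, 3 bits green, 2 bits blue."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def from_8bit(packed):
    """Unpack an 8-bit color back to approximate 24-bit values."""
    r = (packed >> 5) & 0b111
    g = (packed >> 2) & 0b111
    b = packed & 0b11
    # The low bits that were thrown away cannot be recovered.
    return (r << 5, g << 5, b << 6)


print("24 bit colors:", 2 ** 24)
print("8 bit colors:", 2 ** 8)

# Two slightly different shades land on the same 8-bit value.
shade_a = (200, 100, 50)
shade_b = (205, 110, 60)
print(to_8bit(*shade_a), to_8bit(*shade_b))

# Converting back does not return the original shade: that is the 'loss'.
print(from_8bit(to_8bit(*shade_a)))
```

The two nearby shades become indistinguishable after packing, which is fine for a pie chart but is exactly what causes the banding in a photograph when too much is removed.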
The image below on the left is a jpg; although it is compressed, it is still a good image. (More about jpg later.) However, on the right, that image has been stripped down to 216 colors; now, there is no smooth transition between shades. This would NOT be a good compression, as TOO much info was removed.
In images, a .bmp file is usually stored with little or no compression, and is not good for, say, Internet downloads, as the files stay large. So instead, you could use .gif if your image is limited to 256 colors, as in a pie chart. If it is a photograph, as above, use .jpg, which allows for more colors but still has good compression.
PS, the 7,372,800 1s and 0s in that picture of the white board come to about 900 KB as a 24 bit (16.7 million shades) bitmap image, or 37 KB as a monochrome bitmap image... or 5 KB as a jpg file... but since we don't need a lot of colors, it is best suited to be a gif, which comes in at a mere 1.7 KB! From 900 KB to under 2 KB without losing the meaning... that is a file more than 500 times smaller, with a matching improvement in storage space and download time.
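To check the arithmetic, here is a small Python sketch that converts the bit count to kilobytes and compares the file sizes quoted for each format. It assumes 8 bits per byte, 1,024 bytes per KB, and takes the 24 bit bitmap as the baseline; the variable names are made up for illustration.

```python
WIDTH, HEIGHT, BITS_PER_PIXEL = 640, 480, 24

raw_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
raw_bytes = raw_bits // 8          # 8 bits per byte
raw_kb = raw_bytes / 1024          # 1,024 bytes per KB

# A monochrome bitmap needs only 1 bit per pixel.
mono_kb = WIDTH * HEIGHT / 8 / 1024

# File sizes in KB as quoted in the text.
sizes_kb = {
    "24 bit bitmap": 900,
    "monochrome bitmap": 37,
    "jpg": 5,
    "gif": 1.7,
}

print(f"raw image: {raw_bits} bits = {raw_bytes} bytes = {raw_kb:.0f} KB")
print(f"monochrome, computed: {mono_kb:.1f} KB")

ratios = {}
for name, kb in sizes_kb.items():
    ratios[name] = raw_kb / kb                 # how many times smaller
    saved = (1 - kb / raw_kb) * 100            # percent of space saved
    print(f"{name}: {kb} KB, {ratios[name]:.0f}x smaller, {saved:.1f}% saved")
```

Note that a saving can never exceed 100%, so 'times smaller' is the clearer way to state the improvement.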