Friday, August 9, 2019

Data Compression

Idea of compression? 

Plenty of disk space .. doen't effect most of us.
Wrong impression... compression is used behind the scences in computer systems quite often.

Even your voice is compressed when you speak on the phone: telephone compnay can chieve a vstly superior utiliation fo their resources if they compress voice data before transporting it.

Lossless Compression: The ultimate free lunch

two very different types of compression: lossless and lossy. lossless compression algorithm cant produce dramatic savings on every file. But a good algorithm will produce substantial savings on certain common types of files.

Humans do this all the time without even thinking about it Eg. weekly calenders - each hours for 5 days - 40 hrs, . but you may says somethings like "monday and tuesday are full and I am booked from 1pm to 3pm on Thursday and Friday, but otherwise available"

basic idea is to find parts of the data that are identical to each other and use some kind of trick to describe those parts more efficiently.

AAAAAAAAAAAAAAAAAAAAABCBCBCBCBCBCBCBCBCBCAAAAAADEFDEFDEF

I may write like 21A, 10BC, 6A, 3DEF. In this case 56 characters is compressed to only 16. less than one-third. this is call Run Length Encoding. It encodes a "run" of repetitions with the "length" of that run.

Run Length Encoding

RLE is only useful for compression of specific types of data. It is used mostly in combination with other algorithms like Huffman coding. Problem with RLE is repeatitions in the data has to be adjacent. on other data in the repeated parts.. eg. ABXABYAB

Fax machines can take advantage of RLE as  faxes are black and white documents  which get converted to larger number of dots with each dots being either black or white.

Same-as-Earlier Trick

DJKSEMSIT-SM-DJKSEMSIT-ALXDF-O-DJKSEMSIT-ALXDF-DJKSEMSIT-EW-ALXDF

"-" are just put in so that we can read.. in actual sequece there is no dash.

56 characters..

think of go back to and copy.. b for back, c for copy.. b10c6 means go back 10 and copy 6

DJKSEMSIT-SM-b11c9-AlXDF-O-b16c15-b14c9-EW-b16c5

40 characters

There is one more interesting twist. FG-FG-FG-FG-FG-FG-FG-FG. 8-repretitions. FG-FG-FG-FG-b8c8 or FG-b2c14.

Shorter-Symbol Trick

Computers stored everything as a number and interpreted as a letter according to some fixed table. "a" is represented by 27, b is 28, c is 29. Then abc is stored as 272829. A is 01 and Z is 26, "a" is 27 and "z" is 52", a stroke is 80 and U stroke is 99.  "Meet your fiance there." will be 1331314600514147440032352740298200463431443166

so this message is 46 digits long.

This trick is something Humans do all the time without even thinking about it. USA - United State of America - we use 3 letter codes for a the full 24 letter phrase it stands for. How about "The sky is blue in color"? The key difference is that one of these phrases is used much more often than the other. we can save a lot more by abbreviating a frequently used phrase instead of one that is rarely used.

e and t are commonly used, let cut them down to single digit.

M eet  y o u r   f i a n c e   th ere.
1388900514147440032352740298200934844866

There is a  fatal flaw, as computer doesn't store spaces between individual letters. Let see first 5 digit 13889, it can be interpreted as 13-8-8-9 or 13-88-9 or 13-8-89.

There is no way to tell one digit code or two digit code. We have to make some sacrifice: some of our codes will be actually get longer. The ambiguoug two-digit codes that start with 8 or 9 will become three-digit codes that do not start with 8 or 9. Anything start with a 7 is a three digit code, any starting with an 8 or 9 is a single digit code. anything starting with 0,1,2,3,4,5 or 6 is the same two-digit code.

M eet  y o u r   f i a n c e    th ere. 13889005141474400323527402978200934844866

46 digits to only 41.

ZIP
Step 1 same as earlier trick, back and copy
Step 2 most common symbols and construct table
Step 3 transform again by translating into numeric codes defined in step 2

the table is stored together with the file, becomes a ZIP file. In a real ZIP file the original file is broken up into chunks and each chunk can have a different numeric code table.

Lossy Compression

For image or audio, as long as it look the same and sound the same, it doesn't really matter whether the file storing is exactly the same as the file stores that song on a compact disc. Sometimes lossy compression is used in much more extreme way such as compression is being used aggressively to ake the fiale size of the video or images very small resulting in low-quality video and images.

Leave-it-out Trick

Simple and useful trick. first understand how black and white picture are stored, grey scale and pixels and resolution.For 320 x 240 picture, number of pixels rather large, 320*240 = 76,800. Extremely simple technique: ignore, or leave out every second row or second column of pixels. It results in picture of 160 x 120. The size will be only one-quarter of the original. Reduce again and the size will be only 6% of original size.

But when the picture is decompress back to original size, computer has to guess the colors of missing pixels. Most of the visual feature may be retained but there is some definite loss of quality and details, especially in complex areas.

But computer do indeed leave out information to achieve lossy compression, but they are much more careful. JPEG strategy for  is complicated. But basic flavor is fairly straightforward. JPEG divides the whole image into small squares of 8x8 pixels. Each is compressed separately.  If the square happens to be all one color, the entire square is represented by a single number. If the square is mostly the same color, with a few very slight differences, computer can decide to represent the square by a single number, with only a small error. JPEG works well with picture. MP3 and AAC generally use the same high-level approach as JPEG.

Origin of Compression Algorithms

The same-as-earlier trick is the main compression methods used in ZIP file and it is known as LZ77 algorithm. It was invented by two Israeli computer scinetists, Abraham Lempel and Jacob Ziv, and published in 1977.

Claude Shannon the bell labs scientist who founded the fiedl of information theory withhis 1958 paper. Not only error correcting codes, but also figure importantly in the emergence of compression algorithms.

error-correcting codes and compression algorithms are two sides of the same coin. Error-correcting codes can be viewed as a principled way of adding redundancy to a message or file.
Compression algorithms do the oppostie: they remove redundancy from a messge of file.

Shannon, in his seminal 1948 paper, included a description of one of the earlier compression techniques, Shannon-Fano encoding. It is a particular way of implemneting the shorter-symbol trick. The Robert Fano's student solved the problem for achieving optimal compression for each individual symbol. The student was David Huffman, and his technique-now known as Huffman coding is another example of shorter symbol trick.


Saturday, December 31, 2011

7 segment LED by scanning
First the scanning is done with turn on time around 15 cycles @ 8MHz and off time around 2 cycles. When the scanning is done so fast different digit of 7segment display appeared messed up, even with the correct voltage (waveform) at the pin. This may be due to the delay caused by the current limiting resistor to each segment or may be due to the turn-off delay of the transistors.

The scanning timing is changed to 10 msec/ digit and the display are quite clear and bright.
Voltage Reference using TL431
TL 431 requires 1 mA current. When the current to the TL431 is less than 1ma due to the current drawn by load resistors, the reference voltage will become less than 2.5V.

Sunday, June 12, 2011

Accessing Blocked Websites

With the ease of configuration of Opera Unite Webproxy and ease of Wampserver, i will make use of my home computer as a webproxy.

Apache is only use from wampserver installation. MySQL, PHP are for my later development. CGI is implemented with active perl as in the following link.
http://www.webstuffscan.com/2006/12/21/accessing-blocked-websites-use-your-own-proxy-server-at-home/
The directory of wamp server is c:\wamp\www. Therefore, I reside the cgi script in that directory rather than htdocs.

It is ok to get through. But Opera Unite is not an SSL proxy. Therefore, I cannot get to facebook.

Google Sketch Up

After trying sketch up, i have to accept that this is the best tool i have ever used in sketching 3D. Except from the fact that a mouse is essential, ease of use weights way better than those CAD software I have been using in drawing. I will also try out other sketching tools such as BLENDER also since my experience is mostly on CAD rather than sketching.

Following video shows how efficient and easy to draw a chair with sketch up (We can also enter length, height, etc. precisely).

GAME 3D MODELS

3D models are available for purchase for the lazy and more of programmer rather than artists to develop 3D games. Models can be purchased here:
http://www.3drt.com/3dm/3dm.htm
http://www.game-stuff.com/
http://3dbud.com/
http://www.frogames.net/content-packs.html
http://dexsoft-games.com/
http://www.arteria3d.com/
http://www.garagegames.com/products/browse/artpacks
http://www.exchange3d.com/3D_catalog_game%20ready.html
http://www.turbosquid.com/
http://www.scifi-meshes.com/forums/
Google it. There are so many . . .

Followers