CodeWalrus

Development => PC, Mac & Vintage Computers => Topic started by: rwill on August 21, 2015, 05:14:04 PM

Title: Low memory data decompression
Post by: rwill on August 21, 2015, 05:14:04 PM
Hello,

just in case someone needs to compress data on a powerful machine and then decompress it fast in a low-performance, low-memory environment, I made a deflate-like (think zip) compression/decompression utility in C. The compression part is heavily unoptimized, while the decompression part uses under 1 KB of memory plus a window of the previously decoded data. The decompression functions are also optimized for simplicity; an 8- or 16-bit CPU should be enough. It emits literals 4 bits at a time or copies blocks of 8-bit data from the previously decoded data. The compression ratios are surprisingly good given the constraints I set.

I don't know how to make the source available though, as it seems I cannot attach small text files to posts I make.

Source code is here: http://pastebin.com/6RQ7JnjF

Setting WITH_LITERAL_ONLY_TREE to 0 reduces decompression memory usage further at the cost of compression ratio. The 1 KB memory figure assumes that you decompress from ROM; otherwise you need to load the compressed data from a medium into a buffer. Modify as needed and the like...
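
To give an idea of the decoder's inner loop, here is a minimal generic sketch of the back-reference copy that any decompressor of this kind performs. This is not the actual bitstream or nibble coding from the pastebin, just the mechanism of copying from the already decoded output; the names (copy_match, window) are only for illustration.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Generic LZ back-reference copy: emit `length` bytes taken from `distance`
   bytes behind the current write position in the already decoded output.
   Copying byte by byte (instead of memcpy) keeps overlapping matches
   correct, e.g. distance 1 expands a single byte into a run. */
static size_t copy_match(uint8_t *out, size_t pos, size_t distance, size_t length)
{
    while (length--) {
        out[pos] = out[pos - distance];
        pos++;
    }
    return pos;
}

int main(void)
{
    uint8_t window[32];
    size_t pos;

    memcpy(window, "ab", 2);               /* pretend the literal path wrote "ab" */
    pos = 2;
    pos = copy_match(window, pos, 2, 6);   /* back reference -> "abababab" */

    fwrite(window, 1, pos, stdout);
    putchar('\n');
    return 0;
}

In the real decoder such copies are interleaved with the literals emitted 4 bits at a time, and the window is simply the previously decoded output, which is why the extra state can stay under 1 KB.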
Title: Re: Low memory data decompression
Post by: gbl08ma on August 21, 2015, 05:45:14 PM
Just post the source to some pastebin and put the link here. And don't forget to add a license to your code.
If you have a GitHub account, you can use https://gist.github.com/, which makes posting changes to your code easy without having to use git, and ensures people can always find the latest version.

By the way, here's an interesting alternative for people interested in compressing and decompressing data using little memory: https://github.com/atomicobject/heatshrink
Title: Re: Low memory data decompression
Post by: Dream of Omnimaga on August 21, 2015, 07:09:36 PM
Heya rwill and welcome to the forums. You can insert source code inside [code][/code]  tags and it will work fine too :)

You can also attach files once you reach 1 post, but I think the site requires moderators to approve them until you reach 5 posts, thanks to that one nasty spambot from 2010 on Omni O.O
Title: Re: Low memory data decompression
Post by: rwill on August 21, 2015, 09:05:11 PM
Yeah, I noticed that I would have been able to attach a file when editing my first post. Quite the loophole this post editing...

@gbl08ma
I checked out heatshrink, but the compression ratio is not so good?

Canterbury Corpus:
best heatshrink settings: ~1036 KB
my deflate:               ~820 KB
gzip:                     ~680 KB
Title: Re: Low memory data decompression
Post by: gbl08ma on August 21, 2015, 09:41:37 PM
How did you determine the "best" heatshrink settings? Note that higher numbers do not necessarily mean better compression; you need to check all settings by trial and error to find out the best compression for a given file.

The Utilities add-in for the Casio Prizm uses heatshrink to compress files. When compressing a file, it tests compression with various settings and selects the values that give the best compression. Since it wouldn't be feasible to test every possibility (and it wouldn't make much sense, either, since at "the borders" the compression is always pretty bad), it checks a few pairs of window sizes and look-aheads:

static const unsigned char wsArray[] = {14, 14, 12, 14,  8};  /* window sizes */
static const unsigned char laArray[] = { 8,  3,  3,  5,  4};  /* look-aheads  */


So it checks (14, 8), (14, 3), (12, 3), (14, 5), (8, 4) (values are window size, look ahead).
I chose these pairs of values by writing a quick and dirty bash script that compresses a file using all the setting possibilities. I then took note of the settings corresponding to the smallest file size for each kind of file, using file types that people are most likely to store on their calculators: plain text / notes / ebooks, bitmap images, BASIC programs, RAM backups, etc.
I used the pairs that most frequently yielded the smallest file size for my test cases. The settings are then stored in the archive header, so that the decompressor knows what to use. It is built so that it can decompress files with settings different from those used by Utilities itself, so that if e.g. a desktop program knows better settings than Utilities for a given file, it can produce a smaller compressed file that is still readable by Utilities.
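
For reference, the sweep itself is only a few lines. Below is a rough C sketch against heatshrink's streaming encoder (sink / poll / finish), written from memory rather than taken from the Utilities source, and assuming heatshrink was built with HEATSHRINK_DYNAMIC_ALLOC enabled; double-check the names against heatshrink_encoder.h before relying on it.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include "heatshrink_encoder.h"

static const unsigned char wsArray[] = {14, 14, 12, 14,  8};
static const unsigned char laArray[] = { 8,  3,  3,  5,  4};

/* Compress `len` bytes with one (window, look-ahead) pair and return the
   compressed size; the output itself is discarded, only its size matters. */
static size_t compressed_size(const uint8_t *data, size_t len,
                              uint8_t window_sz2, uint8_t lookahead_sz2)
{
    heatshrink_encoder *hse = heatshrink_encoder_alloc(window_sz2, lookahead_sz2);
    uint8_t out[256];
    size_t total = 0, sunk = 0;

    if (hse == NULL) return (size_t)-1;

    while (sunk < len) {
        size_t n = 0;
        heatshrink_encoder_sink(hse, (uint8_t *)&data[sunk], len - sunk, &n);
        sunk += n;

        HSE_poll_res pres;                     /* drain whatever is ready */
        do {
            size_t out_n = 0;
            pres = heatshrink_encoder_poll(hse, out, sizeof out, &out_n);
            total += out_n;
        } while (pres == HSER_POLL_MORE);
    }

    while (heatshrink_encoder_finish(hse) == HSER_FINISH_MORE) {
        size_t out_n = 0;                      /* flush the tail */
        heatshrink_encoder_poll(hse, out, sizeof out, &out_n);
        total += out_n;
    }

    heatshrink_encoder_free(hse);
    return total;
}

int main(void)
{
    /* Placeholder input; the real code of course feeds the file being archived. */
    static const uint8_t sample[] = "abababababababab -- a very compressible sample";
    size_t best = (size_t)-1;
    unsigned best_i = 0;

    for (unsigned i = 0; i < sizeof wsArray; i++) {
        size_t sz = compressed_size(sample, sizeof sample - 1, wsArray[i], laArray[i]);
        if (sz < best) { best = sz; best_i = i; }
    }
    printf("best pair: (%u, %u) -> %zu bytes\n", wsArray[best_i], laArray[best_i], best);
    return 0;
}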
Title: Re: Low memory data decompression
Post by: Dream of Omnimaga on August 21, 2015, 09:55:47 PM
Quote from: rwill on August 21, 2015, 09:05:11 PM
Yeah, I noticed that I would have been able to attach a file when editing my first post. Quite the loophole this post editing
Well, so far spambots don't seem to know how to edit posts, so it does the job well. :P  We even encourage new members who are actual persons to use the loophole. We just don't want robots/automated accounts to attach pictures of certain male/female body parts on the forums again.
Title: Re: Low memory data decompression
Post by: rwill on August 22, 2015, 08:07:38 AM
@gbl08ma

Interesting... when I try almost all of the window size and back-reference length combinations heatshrink allows on each file of the Canterbury Corpus and pick the smallest result per file, heatshrink comes out at around 880 KB for all 11 files at best. Doing it this way takes some time though. As you already noted, picking the same settings for all files is far from optimal. Sadly I cannot test the same approach with my modified deflate or with official gzip, as these lack such tuning parameters; on the Canterbury Corpus files they still come out smaller in all cases though.

Title: Re: Low memory data decompression
Post by: Lionel Debroux on August 28, 2015, 12:34:35 PM
Another data point: PuCrunch, which is the de facto standard compression on the TI-68k series through ttpack / ttppggen, also requires very little additional memory for decompression. The fast version of the decompressor is around 500 bytes; it uses a 256-byte LUT on the stack and can decompress at 60-80 KB/s. The small/slow version of the decompressor is slightly above 200 bytes, doesn't use a LUT, and can decompress at ~20 KB/s. Both decompressors are written in pure assembly, of course. The PuCrunch compression ratio is usually on par with zlib.

There were some experiments with LZMA on the TI-68k/AMS platform. On low enough settings, LZMA can be decompressed with a small amount of additional memory, yet offers a better compression ratio than zlib / PuCrunch. However, the decompression code, derived from upstream LZMA, was both huge (~2 KB) and extremely slow: ~1-3 KB/s. Having to wait 10-20 s for a large program to launch was a complete non-starter, and it was very unlikely that even a highly optimized version of the decompression code could rival the PuCrunch decompressors in terms of either speed or size (which negated part of the upside of using a more powerful compression, all the more so since the decompression code is duplicated across all of those pesky program-specific launchers), so LZMA remained an experiment.