Low memory data decompression

Started by rwill, August 21, 2015, 05:14:04 PM

rwill

Hello,

just in case someone needs to compress data on a powerful machine and then decompress it quickly in a low-performance, low-memory environment, I made a deflate-like (think zip) compression/decompression utility in C. The compression part is heavily unoptimized, while the decompression part uses below 1 KB of memory plus a window of the previously decoded data. The decompression functions are also kept simple, so an 8- or 16-bit CPU should be enough. It compresses 4 bits at a time or copies blocks of 8-bit data from the previously decoded data. The compression ratios are surprisingly good given the constraints I set.
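
To give a rough idea of the shape of such a decoder, here is a stripped-down sketch. The bit layout (a flag bit, then either two 4-bit literal halves or a 12-bit distance plus 4-bit length) is made up for illustration and is not the layout the pastebin code actually uses; the real utility presumably also runs its 4-bit symbols through a small tree (cf. WITH_LITERAL_ONLY_TREE below):

#include <stddef.h>
#include <stdint.h>

/* Minimal bit reader over the compressed stream (e.g. sitting in ROM). */
typedef struct {
    const uint8_t *src;
    size_t bitpos;
} bitreader;

static unsigned get_bits(bitreader *br, unsigned n)
{
    unsigned v = 0;
    while (n--) {
        v = (v << 1) | ((br->src[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1u);
        br->bitpos++;
    }
    return v;
}

/* Decode until out_len bytes have been produced.  The output buffer itself
 * serves as the back-reference window, so no extra window copy is needed. */
void decode(const uint8_t *src, uint8_t *out, size_t out_len)
{
    bitreader br = { src, 0 };
    size_t n = 0;

    while (n < out_len) {
        if (get_bits(&br, 1) == 0) {
            /* Literal byte, coded here as two plain 4-bit halves. */
            unsigned hi = get_bits(&br, 4);
            unsigned lo = get_bits(&br, 4);
            out[n++] = (uint8_t)((hi << 4) | lo);
        } else {
            /* Match: copy len bytes from dist bytes back in the output. */
            unsigned dist = get_bits(&br, 12) + 1;  /* up to a 4 KB window */
            unsigned len  = get_bits(&br, 4) + 2;
            while (len-- && n < out_len) {
                out[n] = out[n - dist];
                n++;
            }
        }
    }
}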

I don't know how to make the source available, though, as it seems I cannot attach small text files to posts I make.

Source code is here: pastebin.com/6RQ7JnjF

Setting WITH_LITERAL_ONLY_TREE to 0 reduces decompression memory usage further at the cost of compression ratio. The 1 KB memory statement assumes that you decompress from ROM; otherwise you need to load the compressed data from a medium into a buffer. Modify as needed and the like...

gbl08ma

Just post the source to some pastebin and put the link here. And don't forget to add a license to your code.
If you have a GitHub account, you can use https://gist.github.com/ instead, as that makes posting changes to your code easy without having to use git, and ensures people can always find the latest version.

By the way, here's an interesting alternative for people interested in compressing and decompressing data using little memory: https://github.com/atomicobject/heatshrink

Dream of Omnimaga

Heya rwill and welcome to the forums. You can insert source code inside [code][/code]  tags and it will work fine too :)

You can also attach files once you reach 1 post, but I think the site requires moderators to approve them until you reach 5 posts, thanks to that one nasty spambot on Omni from 2010. O.O

rwill

Yeah, I noticed that I would have been able to attach a file when editing my first post. Quite the loophole this post editing...

@gbl08ma
I checked out heatshrink, but the compression ratio is not so good?

Canterbury Corpus (totals):
best heatshrink settings:  ~1036 KB
my deflate:                 ~820 KB
gzip:                       ~680 KB

gbl08ma

How did you determine the "best" heatshrink settings? Note that higher numbers do not necessarily mean better compression; you need to check all settings by trial and error to find out the best compression for a given file.

The Utilities add-in for the Casio Prizm uses heatshrink to compress files. When compressing a file, it tests compression with various settings and selects the values that give the best compression. Since it wouldn't be feasible to test every possibility (and it wouldn't make much sense, either, since at "the borders" the compression is always pretty bad), it checks a few pairs of window sizes and look-aheads:

static const unsigned char wsArray[]={14, 14, 12, 14, 8 }; /* window sizes to try             */
static const unsigned char laArray[]={8,  3,  3,  5, 4 };  /* corresponding look-ahead values */


So it checks (14, 8), (14, 3), (12, 3), (14, 5), (8, 4) (values are window size, look ahead).
I have chosen these pairs of values by developing a quick and dirty bash script which would compress a file using all the setting possibilities. I then took note of the settings corresponding to the smallest file size for each kind of file, using file types that people are most likely to store on their calculators: plain text / notes / ebooks, bitmap images, BASIC programs, RAM backups, etc.
I used the pairs that most frequently yielded the smallest file size for my test cases. The settings are then stored in the archive header, so that the decompressor knows what to use. It is built so that it can decompress files with settings different from those used by Utilities itself, so that if e.g. a desktop program knows better settings than Utilities for a given file, it can produce a smaller compressed file that is still readable by Utilities.
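
For illustration, the selection logic boils down to something like the sketch below. Here compress_with() is just a stand-in for a small wrapper around heatshrink's encoder (allocate with the given settings, then sink/poll/finish); it is not an actual heatshrink function:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper around the heatshrink encoder -- NOT part of the
 * heatshrink API.  Returns the compressed size, or (size_t)-1 on failure
 * or output overflow. */
size_t compress_with(uint8_t window_sz2, uint8_t lookahead_sz2,
                     const uint8_t *in, size_t in_len,
                     uint8_t *out, size_t out_cap);

static const unsigned char wsArray[] = {14, 14, 12, 14, 8};
static const unsigned char laArray[] = { 8,  3,  3,  5, 4};

/* Compress the input once per (window, look-ahead) pair and return the
 * index of the pair that gave the smallest result (-1 if all failed). */
int pick_best_settings(const uint8_t *in, size_t in_len,
                       uint8_t *scratch, size_t scratch_cap)
{
    int best = -1;
    size_t best_size = (size_t)-1;
    int i;

    for (i = 0; i < (int)(sizeof wsArray / sizeof wsArray[0]); i++) {
        size_t sz = compress_with(wsArray[i], laArray[i],
                                  in, in_len, scratch, scratch_cap);
        if (sz != (size_t)-1 && sz < best_size) {
            best_size = sz;
            best = i;
        }
    }
    /* The winning pair gets written into the archive header so that the
     * decompressor knows which settings to use. */
    return best;
}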

Dream of Omnimaga

Quote from: rwill on August 21, 2015, 09:05:11 PM
Yeah, I noticed that I would have been able to attach a file when editing my first post. Quite the loophole this post editing
Well, so far spambots don't seem to know how to edit posts, so it does the job well. :P We even encourage new members who are actual people to use the loophole. We just don't want robots/automated accounts to attach pictures of certain male/female body parts on the forums again.

rwill

@gbl08ma

Interesting... when I try almost all of the window and back-reference length combinations heatshrink allows on each file of the Canterbury Corpus and pick the smallest result for each file, heatshrink comes out at around 880 KB for all 11 files at best. Doing it this way takes some time, though. As you already noted, picking the same settings for all files is far from optimal. Sadly, I cannot test the same approach with my modified deflate or the official gzip, as these lack such tuning parameters. On the Canterbury Corpus files they still come out smaller in all cases, though.


Lionel Debroux

Another data point: PuCrunch, which is the de facto standard compression on the TI-68k series through ttpack / ttppggen, also requires very little additional memory for decompression. The fast version of the decompressor is around 500 bytes; it uses a 256-byte LUT on the stack and can decompress at 60-80 KB/s. The small/slow version of the decompressor is slightly above 200 bytes, doesn't use a LUT, and can decompress at ~20 KB/s. Both decompressors are written in pure assembly, of course. The PuCrunch compression ratio is usually on par with zlib's.

There were some experiments with LZMA on the TI-68k/AMS platform. At low enough settings, LZMA can be decompressed with a small amount of additional memory, yet offers a better compression ratio than zlib / PuCrunch. However, the decompression code, derived from upstream LZMA, was both huge (~2 KB) and extremely slow: ~1-3 KB/s. Having to wait 10-20 s for a large program to launch was a complete non-starter, and it was very unlikely that even a highly optimized version of the decompression code could rival the PuCrunch decompressors in either speed or size (which negated part of the upside of using a more powerful compression, all the more so because the decompression code is duplicated across all of those pesky program-specific launchers), so LZMA remained an experiment.
