超快的压缩库:density
density是超快的压缩库。它是专注于高速压缩,并尽可能保持最佳比例。density设有缓冲和数据流API,以便快速集成至任何项目。
Benchmarks
Quick bench
File used : enwik8 (100 MB)
Platform : MacBook Pro, OSX 10.10.2, 2.3 GHz Intel Core i7, 8Gb 1600 MHz DDR, SSD
Timing : using the time function, and taking the best user output after multiple runs
Note : sharc -c1 uses the chameleon algorithm, sharc -c2 the cheetah algorithm, and sharc -c3 the lion algorithm (see About the algorithms further down).
Program | Library | Compress | Decompress | Size | Ratio | Round trip |
---|---|---|---|---|---|---|
sharc -c1 | density 0.12.1 | 0.109s (914 MB/s) | 0.077s (1299 MB/s) | 61 524 502 | 61,52% | 0.186s |
lz4 -1 | lz4 r126 | 0.461s (217 MB/s) | 0.091s (1099 MB/s) | 56 995 497 | 57,00% | 0.552s |
lzop -1 | lzo 2.08 | 0.367s (272 MB/s) | 0.309s (324 MB/s) | 56 709 096 | 56,71% | 0.676s |
sharc -c2 | density 0.12.1 | 0.211s (473 MB/s) | 0.177s (565 MB/s) | 53 156 782 | 53,16% | 0.388s |
sharc -c3 | density 0.12.1 | 0.361s (277 MB/s) | 0.396s (253 MB/s) | 47 991 605 | 47,99% | 0.757s |
lz4 -3 | lz4 r126 | 1.520s (66 MB/s) | 0.087s (1149 MB/s) | 47 082 421 | 47,08% | 1.607s |
lzop -7 | lzo 2.08 | 9.562s (10 MB/s) | 0.319s (313 MB/s) | 41 720 721 | 41,72% | 9.881s |
Squash
Squash is an abstraction layer for compression algorithms, and has an extremely exhaustive set of benchmark results, including density's, available here. You can choose between system architecture and compressed file type. There are even ARM boards tested ! A great tool for selecting a compression library.
FsBench
FsBench is a command line utility that enables real-time testing of compression algorithms, but also hashes and much more. A fork with the latest density releases is available here for easy access. The original author's repository can be found here. Very informative tool as well.
Here are the results of a couple of test runs on a MacBook Pro, OSX 10.10.2, 2.3 GHz Intel Core i7, 8Gb 1600 MHz DDR, SSD :
enwik8 (100,000,000 bytes)
Codec version args C.Size (C.Ratio) E.Speed D.Speed E.Eff. D.Eff. density::chameleon 2015-03-27 61524474 (x 1.625) 926 MB/s 1372 MB/s 356e6 527e6 density::cheetah 2015-03-27 53156746 (x 1.881) 470 MB/s 559 MB/s 220e6 261e6 density::lion 2015-03-27 47991569 (x 2.084) 285 MB/s 271 MB/s 146e6 140e6 LZ4 r127 56973103 (x 1.755) 255 MB/s 1604 MB/s 109e6 690e6 LZF 3.6 very 53945381 (x 1.854) 191 MB/s 367 MB/s 88e6 168e6 LZO 2.08 1x1 55792795 (x 1.792) 289 MB/s 384 MB/s 127e6 169e6 QuickLZ 1.5.1b6 1 52334371 (x 1.911) 278 MB/s 347 MB/s 132e6 165e6 Snappy 1.1.0 56539845 (x 1.769) 245 MB/s 801 MB/s 106e6 348e6 wfLZ r10 63521804 (x 1.574) 149 MB/s 514 MB/s 54e6 187e6
silesia (211,960,320 bytes)
Codec version args C.Size (C.Ratio) E.Speed D.Speed E.Eff. D.Eff. density::chameleon 2015-03-27 133118910 (x 1.592) 1046 MB/s 1380 MB/s 388e6 513e6 density::cheetah 2015-03-27 101751474 (x 2.083) 529 MB/s 660 MB/s 275e6 342e6 density::lion 2015-03-27 89433997 (x 2.370) 304 MB/s 275 MB/s 169e6 158e6 LZ4 r127 101634462 (x 2.086) 363 MB/s 1808 MB/s 188e6 941e6 LZF 3.6 very 102043866 (x 2.077) 254 MB/s 485 MB/s 131e6 251e6 LZO 2.08 1x1 100592662 (x 2.107) 433 MB/s 603 MB/s 227e6 317e6 QuickLZ 1.5.1b6 1 94727961 (x 2.238) 368 MB/s 424 MB/s 203e6 234e6 Snappy 1.1.0 101385885 (x 2.091) 359 MB/s 1093 MB/s 187e6 570e6 wfLZ r10 109610020 (x 1.934) 195 MB/s 701 MB/s 94e6 338e6
Build
DENSITY is fully C99 compliant and can therefore be built on a number of platforms. You need a C compiler (gcc, clang ...), and a make utility.
Just cd into the density directory, then run the following command :
make
And that's it !
Output format
DENSITY outputs compressed data in a simple format, which enables file storage and optional parallelization for both compression and decompression. Inside the main header and footer, a number of blocks can be found, each having its own header and footer. Inside each block, compressed data has a structure determined by the compression algorithm used.
It is possible to add an integrity checksum to the compressed output by using the DENSITY_BLOCK_TYPE_WITH_HASHSUM_INTEGRITY_CHECK block type. The 128-bit checksum is calculated using the excellent SpookyHash algorithm, which is extremely fast and offers a near-zero performance penalty. An additional integrity check will then be automatically performed during decompression.
APIs
DENSITY features a buffer API and a stream API which are very simple to use, yet powerful enough to keep users' creativity unleashed. Please see the quick start at the bottom of this page.
About the algorithms
Copy ( DENSITY_COMPRESSION_MODE_COPY )
This is not a so-to-speak algorithm as the name implies. It embeds data inside the density block structures. It can be used to quickly add integrity checks to input data, but it has another important purpose inside each block : if data is marked as incompressible using the target algorithm, a mode reversion occurs and copy mode is instead used for the remainder of the block. On the next block the target algorithm is tried again.
Chameleon ( DENSITY_COMPRESSION_MODE_CHAMELEON_ALGORITHM )
Chameleon is a dictionary lookup based compression algorithm. It is designed for absolute speed and usually reaches a 60% compression ratio on compressible data. Decompression is just as fast. This algorithm is a great choice when main concern is speed.
Cheetah ( DENSITY_COMPRESSION_MODE_CHEETAH_ALGORITHM )
Cheetah was developed in conjunction with Piotr Tarsa (https://github.com/tarsa). It is derived from chameleon and uses swapped double dictionary lookups and predictions. It can be extremely good with highly compressible data (ratio reaching 10% or less). On typical compressible data compression ratio is about 50% or less. It is still extremely fast for both compression and decompression and is a great, efficient all-rounder algorithm.
Lion ( DENSITY_COMPRESSION_MODE_LION_ALGORITHM )
Lion is a multiform compression algorithm derived from cheetah. It goes further in the areas of dynamic adaptation and fine-grained analysis. It uses swapped double dictionary lookups, multiple predictions, shifting sub-word dictionary lookups and forms rank entropy coding. Lion provides the best compression ratio of all three algorithms under any circumstance, and is still very fast.
Quick start (a simple example using the APIs)
Using DENSITY in your application couldn't be any simpler.
First you need to include the 2 following files in your project :
- density_api.h
- density_api_data_structures.h
When this is done you can start using the DENSITY APIs :
#include "density_api.h" #define MY_TEXT "This is a simple example on how to use Density API bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla" #define BUFFER_SIZE DENSITY_MINIMUM_OUT_BUFFER_SIZE ... uint8_t *outCompressed = (uint8_t *) malloc(BUFFER_SIZE * sizeof(uint8_t)); uint8_t *outDecompressed = (uint8_t *) malloc(BUFFER_SIZE * sizeof(uint8_t)); /************** * Buffer API * **************/ density_buffer_processing_result result; result = density_buffer_compress((uint8_t *) TEXT, strlen(TEXT), outCompressed, BUFFER_SIZE, DENSITY_COMPRESSION_MODE_CHAMELEON_ALGORITHM, DENSITY_BLOCK_TYPE_DEFAULT, NULL, NULL); if(!result.state) printf("%llu bytes >> %llu bytes\n", result.bytesRead, result.bytesWritten); result = density_buffer_decompress(outCompressed, result.bytesWritten, outDecompressed, BUFFER_SIZE, NULL, NULL); if(!result.state) printf("%llu bytes >> %llu bytes\n", result.bytesRead, result.bytesWritten); /************** * Stream API * **************/ // We create the stream using the standard malloc and free functions density_stream* stream = density_stream_create(NULL, NULL); DENSITY_STREAM_STATE streamState; // Let's compress our text, using the Chameleon algorithm (extremely fast compression and decompression) if ((streamState = density_stream_prepare(stream, (uint8_t *) TEXT, strlen(TEXT), outCompressed, BUFFER_SIZE))) fprintf(stderr, "Error %i when preparing compression\n", streamState); if ((streamState = density_stream_compress_init(stream, DENSITY_COMPRESSION_MODE_CHAMELEON_ALGORITHM, DENSITY_BLOCK_TYPE_DEFAULT))) fprintf(stderr, "Error %i when initializing compression\n", streamState); if ((streamState = density_stream_compress_continue(stream))) if (streamState != DENSITY_STREAM_STATE_STALL_ON_INPUT) fprintf(stderr, "Error %i occured during compression\n", streamState); if ((streamState = density_stream_compress_finish(stream))) fprintf(stderr, "Error %i occured while finishing compression\n", streamState); printf("%llu bytes >> %llu bytes\n", *stream->totalBytesRead, *stream->totalBytesWritten); // Now let's decompress it, using the density_stream_output_available_for_use() method to know how many bytes were made available if ((streamState = density_stream_prepare(stream, outCompressed, density_stream_output_available_for_use(stream), outDecompressed, BUFFER_SIZE))) fprintf(stderr, "Error %i when preparing decompression\n", streamState); if ((streamState = density_stream_decompress_init(stream, NULL))) fprintf(stderr, "Error %i when initializing decompression\n", streamState); if ((streamState = density_stream_decompress_continue(stream))) if (streamState != DENSITY_STREAM_STATE_STALL_ON_INPUT) fprintf(stderr, "Error %i occured during decompression\n", streamState); if ((streamState = density_stream_decompress_finish(stream))) fprintf(stderr, "Error %i occured while finishing compression\n", streamState); printf("%llu bytes >> %llu bytes\n", *stream->totalBytesRead, *stream->totalBytesWritten); // Free memory density_stream_destroy(stream); free(outCompressed); free(outDecompressed);