Compression Examples

Compression is done using various subclasses of EsCompressionStream.

Deflate Streams

DeflateWriteStream compresses the bytes that flow through it using the DEFLATE algorithm. The compressed bytes are then passed on to the wrapped stream, which can be a memory, file or socket stream. Once the stream is closed, the compressed bytes can be accessed using the #contents API.

| writeStream compressedBytes |

writeStream := DeflateWriteStream on: String new writeStream.

[writeStream nextPutAll: 'Hello!'] ensure: [writeStream close].

compressedBytes := writeStream contents.

The following example will decompress the bytes back into the original String using an InflateReadStream.

| readStream originalString |

readStream := InflateReadStream on: compressedBytes.

[originalString := readStream contents] ensure: [readStream close].

All Strings and ByteArrays understand how to (de)compress themselves using the DEFLATE algorithm. This is useful if you already have all the data in memory. The following example is the same as above, but uses these Byte Collection APIs.

| compressedBytes originalString |

compressedBytes := 'Hello!' deflate.

originalString := compressedBytes inflate.

Gzip Streams

Compression using the Gzip format is done using the Gzip(Read|Write)Stream classes. The difference between GzipWriteStream and DeflateWriteStream is that GzipWriteStream writes a different header and then appends a computed cyclic redundancy check (CRC) after compressing the input data.

| writeStream compressedBytes |

writeStream := GzipWriteStream on: String new writeStream.

[writeStream nextPutAll: 'Hello!'] ensure: [writeStream close].

compressedBytes := writeStream contents.

The following example will decompress the bytes back into the original String.

| readStream originalString |

readStream := GzipReadStream on: compressedBytes.

[originalString := readStream contents] ensure: [readStream close].

All Strings and ByteArrays understand how to (de)compress themselves using the Gzip algorithm. This is useful if you already have all the data in memory. The following example is the same as above, but uses these Byte Collection APIs.

| compressedBytes originalString |

compressedBytes := 'Hello!' gzipCompress.

originalString := compressedBytes gzipDecompress.

Brotli Streams

Compression using the Brotli format is done using the Brotli(Read|Write)Stream classes. Brotli is a lossless data compression algorithm that is similar in speed to DEFLATE but offers denser compression.

| writeStream compressedBytes |

writeStream := BrotliWriteStream on: String new writeStream.

[writeStream nextPutAll: 'Hello!'] ensure: [writeStream close].

compressedBytes := writeStream contents.

The following example will decompress the bytes back into the original String.

| readStream originalString |

readStream := BrotliReadStream on: compressedBytes.

[originalString := readStream contents] ensure: [readStream close].

All Strings and ByteArrays understand how to (de)compress themselves using the Brotli algorithm. This is useful if you already have all the data in memory. The following example is the same as above, but uses these Byte Collection APIs.

| compressedBytes originalString |

compressedBytes := 'Hello!' brotliCompress.

originalString := compressedBytes brotliDecompress.

LZ4 Streams

Compression using the LZ4 format is done using the LZ4(Read|Write)Stream classes. LZ4 is a lossless data compression algorithm focused on (de)compression speed.

| writeStream compressedBytes |

writeStream := LZ4WriteStream on: String new writeStream.

[writeStream nextPutAll: 'Hello!'] ensure: [writeStream close].

compressedBytes := writeStream contents.

The following example will decompress the bytes back into the original String.

| readStream originalString |

readStream := LZ4ReadStream on: compressedBytes.

[originalString := readStream contents] ensure: [readStream close].

All Strings and ByteArrays understand how to (de)compress themselves using the LZ4 algorithm. This is useful if you already have all the data in memory. The following example is the same as above, but uses these Byte Collection APIs.

| compressedBytes originalString |

compressedBytes := 'Hello!' lz4Compress.

originalString := compressedBytes lz4Decompress.

ZStandard Streams

Compression using the ZStandard format is done using the Zstd(Read|Write)Stream classes. ZStandard is a fast lossless data compression algorithm targeting real-time compression scenarios at zlib-level or better compression ratios.

| writeStream compressedBytes |

writeStream := ZstdWriteStream on: String new writeStream.

[writeStream nextPutAll: 'Hello!'] ensure: [writeStream close].

compressedBytes := writeStream contents.

The following example will decompress the bytes back into the original String.

| readStream originalString |

readStream := ZstdReadStream on: compressedBytes.

[originalString := readStream contents] ensure: [readStream close].

All Strings and ByteArrays understand how to (de)compress themselves using the ZStandard algorithm. This is useful if you already have all the data in memory. The following example is the same as above, but uses these Byte Collection APIs.

| compressedBytes originalString |

compressedBytes := 'Hello!' zstdCompress.

originalString := compressedBytes zstdDecompress.

Zip Archive Streams

Zip(Read|Write)Streams offer a cursor-based approach to streaming over a zip archive. A ZipWriteStream writes a valid zip archive to the stream that it wraps, including all the local file entries and the central directory that ends every zip archive. The archive can be built directly in memory and streamed over a socket. A ZipReadStream reads local file entries and data from a zip archive coming from a memory, file or socket stream. Any zip utility can read the zip archives produced by ZipWriteStream.

The following example creates a valid Zip archive in memory. ZipStreams are special because they cursor over zip entries but stream over zip entry data. This means that before you use the normal Stream API (e.g. next, upToEnd), the zip stream must be positioned on a particular zip entry. The example demonstrates this cursor/stream approach.

This example shows how to create a Zip Archive with two files (‘hello1.txt’ and ‘hello2.txt’) in a common subdirectory called ‘dir’. The Zip Archive is written to a byte array in memory, but could just as easily have been written to a file or streamed across a socket. Notice that an entry is added to the stream first; the stream is then ready to accept that entry's (file) content. When you are done with the content, simply add the next entry or close the stream.

| zip |

zip := ZipWriteStream on: ByteArray new.

zip
    comment: 'My zip archive comment';
    nextPutEntry: (ZipEntry named: 'dir/hello1.txt');
    nextPutAll: 'This is the content of hello1.txt';
    nextPutEntry: (ZipEntry named: 'dir/hello2.txt');
    nextPutAll: 'This is the content of hello2.txt';
    close.

The next example shows reading the entries and content of this particular Zip Archive. Normally, one would go into a ‘nextEntry’ loop and process entries until there were no more left. However, here we know what the entries are and have added some extra assert-like statements to make it clear how this works.
First, we grab the next entry. This will cursor the stream to the appropriate place to start reading entry content. To read content, the normal Stream API is used. This means atEnd refers to the entry content, not the whole Zip Archive. If nextEntry is nil, this means that you have streamed over the complete Zip Archive.

| zip entry |

zip := ZipReadStream on: zip contents asString.

(entry := zip nextEntry) ifNil: [self halt].

entry name = 'dir/hello1.txt' ifFalse: [self halt].

zip contents = 'This is the content of hello1.txt' ifFalse: [self halt].

(entry := zip nextEntry) ifNil: [self halt].

entry name = 'dir/hello2.txt' ifFalse: [self halt].

zip contents = 'This is the content of hello2.txt' ifFalse: [self halt].

zip nextEntry ifNotNil: [self halt].
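
The ‘nextEntry’ loop mentioned above might look like the following minimal sketch. It uses only the messages shown in the previous example (nextEntry, name, contents). The variable archiveBytes is hypothetical and stands for a complete zip archive, and sending close to a ZipReadStream is assumed to behave as it does for the other read streams on this page.

```smalltalk
"Sketch only: archiveBytes is a hypothetical variable holding a complete zip archive,
 for example the contents of a closed ZipWriteStream."
| zip entry |
zip := ZipReadStream on: archiveBytes.
[
    "nextEntry answers nil once the whole archive has been streamed over."
    [(entry := zip nextEntry) notNil] whileTrue: [
        "The stream is now positioned on this entry, so #contents answers its data."
        Transcript
            show: entry name;
            show: ' -> ';
            show: zip contents size printString;
            cr]
] ensure: [zip close].
```

Each pass through the loop cursors to the next entry and then reads that entry's data with the normal Stream API, which is the same cursor/stream pattern as the explicit checks above.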

This final example demonstrates how to make use of the language encoding flag so that UTF-8 can be used for Zip comments and filenames. This can easily be done by creating a new zip file and requesting use of UTF-8 codepage as seen below.

(ZipWriteStream on: (CfsWriteFileStream openEmpty: 'ZipFile.zip'))
    useUtf8CodePage;
    nextPutEntry: (ZipEntry named: 'München.txt');
    nextPutAll: 'Hallo';
    close.

Last modified date: 03/03/2020