Patnaik's Lab

Kotlin Multiplatform — GZip Compression

This is part of a series of articles exploring Kotlin Multiplatform, React Native and C++. Here we will look at a utility many apps need: compression, specifically gzip compression.

We are implementing this for Android and iOS. If time allows, I will update this article in the future with an implementation for the browser.

The completed source code is present in this repository with tests.

https://github.com/shibasis0801/kmm_zlib_compression

Target

We will go through how to use compression from the common source set, with native implementations on Android and iOS and no external libraries.

I believe that when designing, we should first write down what the final product should look like. This helps guide our creativity towards the end result.

The final result will look like this:

data class CompressionResponse(
val base64EncodedString: String,
)

data class CompressionRequest(
val data: String
)

data class DecompressionRequest(
val base64EncodedString: String
)

data class DecompressionResponse(
val data: String
)

interface Compressor: Component {
fun compress(request: CompressionRequest): CompressionResponse?
fun decompress(request: DecompressionRequest): DecompressionResponse?
}

Interfaces are more general than the expect/actual mechanism in Kotlin Multiplatform. Expect/actual is useful when you want some functionality to be present statically (resolved at compile time). Interfaces are more useful for dynamic functionality that you can inject into your application.
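Here is a minimal sketch of the interface approach. A hypothetical `PayloadCache` depends only on `Compressor`, so the Android or iOS implementation (or a test double) can be injected when it is constructed. The article's `Component` supertype is left out because its definition isn't shown. `PayloadCache` and `FakeCompressor` are illustrative names, not part of the article's code.

```kotlin
import java.util.Base64

// The article's request/response types. `Compressor` here omits the article's
// `Component` supertype because its definition isn't shown.
data class CompressionRequest(val data: String)
data class CompressionResponse(val base64EncodedString: String)
data class DecompressionRequest(val base64EncodedString: String)
data class DecompressionResponse(val data: String)

interface Compressor {
    fun compress(request: CompressionRequest): CompressionResponse?
    fun decompress(request: DecompressionRequest): DecompressionResponse?
}

// Hypothetical consumer: it depends only on the interface, so the Android or
// iOS implementation (or a test double) is chosen by whoever constructs it.
class PayloadCache(private val compressor: Compressor) {
    private val store = mutableMapOf<String, String>()

    fun put(key: String, json: String): Boolean {
        val response = compressor.compress(CompressionRequest(json)) ?: return false
        store[key] = response.base64EncodedString
        return true
    }

    fun get(key: String): String? {
        val encoded = store[key] ?: return null
        return compressor.decompress(DecompressionRequest(encoded))?.data
    }
}

// Hypothetical test double: Base64-encodes the UTF-8 bytes without compressing them.
class FakeCompressor : Compressor {
    override fun compress(request: CompressionRequest): CompressionResponse =
        CompressionResponse(Base64.getEncoder().encodeToString(request.data.toByteArray(Charsets.UTF_8)))

    override fun decompress(request: DecompressionRequest): DecompressionResponse =
        DecompressionResponse(String(Base64.getDecoder().decode(request.base64EncodedString), Charsets.UTF_8))
}
```

With expect/actual, each platform would instead provide exactly one `actual` implementation at compile time. That makes it harder to swap in a fake for tests.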

We will be implementing this interface for both Android and iOS in this article.

Why compress?

In our experience, gzip compression is fast, taking under 10 ms in most cases. It depends on the algorithm you choose, the compression level, the size of your data, how repetitive your data is, and so on. (You need to benchmark this to know whether compression benefits your use case.)
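As a starting point for that benchmarking, here is a rough JVM-side sketch. It gzips a payload once and reports the compressed size and elapsed time. `measureGzip` and `GzipMeasurement` are hypothetical names. A single timed run is noisy, so treat the numbers as indicative only and use a proper benchmark harness (warm-up, many iterations) before drawing conclusions.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Result of one (noisy) measurement.
data class GzipMeasurement(val originalBytes: Int, val compressedBytes: Int, val elapsedNanos: Long) {
    val ratio: Double get() = compressedBytes.toDouble() / originalBytes
}

// Gzips the UTF-8 bytes of `payload` once and records size and wall-clock time.
fun measureGzip(payload: String): GzipMeasurement {
    val input = payload.toByteArray(Charsets.UTF_8)
    val start = System.nanoTime()
    val buffer = ByteArrayOutputStream()
    // Closing the gzip stream (via `use`) writes the trailer, so it is part of the timing.
    GZIPOutputStream(buffer).use { it.write(input) }
    val elapsed = System.nanoTime() - start
    return GzipMeasurement(input.size, buffer.size(), elapsed)
}
```

Running `measureGzip(json).ratio` on a few real payloads gives a first idea of whether compression is worth it for your data.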

For our use case, we needed to store large JSON payloads (2 MB) with a recurring structure, written rarely but read frequently. We could store the JSON payload directly in SQLite, but because of the payload size, queries took a long time. File I/O was another option, but we needed some additional columns to query which payload(s) to select.

Compressed payloads have great local and network I/O performance. Compression helped us bring SQL read times down from 10 ms to 1 ms.
(Repo with benchmarks for public access is work in progress)

Platform Support

Both Android and iOS support DEFLATE-based (gzip-style) compression natively. We don't need external libraries unless we want a different compression algorithm or a better implementation.

Android

Android provides gzip support through Java. You have access to the classes GZIPOutputStream and GZIPInputStream (https://developer.android.com/reference/java/util/zip/GZIPOutputStream).

As the names suggest, GZIPOutputStream wraps an OutputStream and compresses what you write to it. GZIPInputStream wraps an InputStream and decompresses what you read from it.

Why Streams?

Compression is generally used for large amounts of data, so there is always a possibility of running out of memory. To avoid this, we process the data in chunks instead of reading all of it into memory at once.
This article does not go into this aspect of compression and assumes that you can fit the object directly in memory. (Let me know if I should extend the article to add this).
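For reference only, here is a minimal sketch of the chunked approach on the JVM. `InputStream.copyTo` moves data through gzip in fixed-size buffers, so only about `bufferSize` bytes of uncompressed data are held at a time. The function names are hypothetical.

```kotlin
import java.io.InputStream
import java.io.OutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Compresses everything from `source` into `sink`, `bufferSize` bytes at a time.
// Closing the GZIPOutputStream writes the gzip trailer and also closes `sink`.
fun gzipStream(source: InputStream, sink: OutputStream, bufferSize: Int = 8 * 1024) {
    GZIPOutputStream(sink, bufferSize).use { gzip -> source.copyTo(gzip, bufferSize) }
}

// Decompresses everything from `source` into `sink`, `bufferSize` bytes at a time.
// Closing the GZIPInputStream also closes `source`.
fun gunzipStream(source: InputStream, sink: OutputStream, bufferSize: Int = 8 * 1024) {
    GZIPInputStream(source, bufferSize).use { gzip -> gzip.copyTo(sink, bufferSize) }
}
```

With a FileInputStream and FileOutputStream on either side, memory use stays roughly constant regardless of the file size.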

Compression

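import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

// Compression
val byteStream = ByteArrayOutputStream()
val content = "Some string"
// `use` closes the writer, which finishes the gzip stream and writes its trailer
GZIPOutputStream(byteStream)
    .bufferedWriter(StandardCharsets.UTF_8)
    .use { it.write(content) }
// Read the bytes only after the stream has been closed
val compressedBytes = byteStream.toByteArray()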

GZIPOutputStream wraps the ByteArrayOutputStream and writes the compressed result into it.
We supply the string input through the write method of a BufferedWriter wrapped around the gzip stream.
Closing the writer (which use does for us) finishes the gzip stream. Only after that do we convert the output into a ByteArray for further use / storage.
Because the input is a string, we also have to specify the character encoding. Compression is a numeric algorithm and works on the byte values of the characters, not on the characters themselves.
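Here is a quick illustration of why the encoding matters. The same text turns into different bytes under different charsets, and decoding with the wrong charset garbles non-ASCII characters. The helper names are hypothetical.

```kotlin
import java.nio.charset.StandardCharsets

// Byte counts of the same text in UTF-8 and ISO-8859-1. Compression only ever sees these bytes.
fun encodedSizes(text: String): Pair<Int, Int> =
    text.toByteArray(StandardCharsets.UTF_8).size to text.toByteArray(StandardCharsets.ISO_8859_1).size

// Encodes as UTF-8 but decodes as ISO-8859-1, the kind of mismatch that garbles text.
fun decodeWithWrongCharset(text: String): String =
    String(text.toByteArray(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)
```

For example, "café" is 5 bytes in UTF-8 but 4 in ISO-8859-1, and the mismatched decode turns it into "cafÃ©".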

Decompression

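import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream

// `compressedBytes` comes from the compression step above
val byteStream = ByteArrayInputStream(compressedBytes)
val string = GZIPInputStream(byteStream)
    .bufferedReader(StandardCharsets.UTF_8)
    .use { it.readText() }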

To decompress, we do something very similar.
Assuming we have the compressed bytes from the previous step, we wrap them in a ByteArrayInputStream.
GZIPInputStream then reads from this InputStream, and the UTF-8 reader turns the decompressed bytes back into text.
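Putting the two snippets together, here is a minimal sketch of a gzip round trip as plain functions. `gzip` and `gunzip` are hypothetical names. The output always starts with the two gzip magic bytes, 0x1f and 0x8b.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Compresses the UTF-8 bytes of `content` into gzip format.
fun gzip(content: String): ByteArray {
    val byteStream = ByteArrayOutputStream()
    // `use` closes the writer, which finishes the gzip stream (writes the trailer).
    GZIPOutputStream(byteStream).bufferedWriter(StandardCharsets.UTF_8).use { it.write(content) }
    return byteStream.toByteArray()
}

// Decompresses gzip bytes and decodes them as UTF-8 text.
fun gunzip(compressedBytes: ByteArray): String =
    GZIPInputStream(ByteArrayInputStream(compressedBytes))
        .bufferedReader(StandardCharsets.UTF_8)
        .use { it.readText() }
```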

Caveats

Compression / decompression can fail by throwing an exception.
We need to handle these exceptions. We also need a way to know whether a given piece of data is compressed, for example a boolean flag stored alongside it.
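One way to sketch this assumes the data is stored as raw gzip bytes. The code checks the two gzip magic bytes (0x1f 0x8b, defined by RFC 1952) before decompressing, and turns exceptions into null. The magic-byte check is only a heuristic, so the try/catch is still needed. The function names are hypothetical.

```kotlin
import java.io.ByteArrayInputStream
import java.io.IOException
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream

// gzip data always starts with 0x1f 0x8b. This does not prove the rest is valid.
fun looksGzipped(bytes: ByteArray): Boolean =
    bytes.size >= 2 && bytes[0] == 0x1f.toByte() && bytes[1] == 0x8b.toByte()

// Returns null instead of throwing when the input is not valid gzip.
fun gunzipOrNull(bytes: ByteArray): String? {
    if (!looksGzipped(bytes)) return null
    return try {
        GZIPInputStream(ByteArrayInputStream(bytes))
            .bufferedReader(StandardCharsets.UTF_8)
            .use { it.readText() }
    } catch (e: IOException) {
        // ZipException (corrupt data) and EOFException (truncated data) are both IOExceptions.
        null
    }
}
```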

iOS

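iOS provides compression and decompression through the Compression Framework (https://developer.apple.com/documentation/compression). It supports several algorithms. We use COMPRESSION_ZLIB because it is based on DEFLATE, the same algorithm gzip uses on Android. Note that the formats are not identical. Apple documents COMPRESSION_ZLIB output as raw DEFLATE (RFC 1951), without the gzip header and trailer that GZIPOutputStream writes. As a result, data compressed by one platform's code here cannot be decompressed directly by the other's. Using the same algorithm on both platforms isn't strictly necessary, but it keeps things simple.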
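Android code may need to read data compressed with COMPRESSION_ZLIB on iOS, or produce data for it. In that case GZIPInputStream/GZIPOutputStream won't work, because that format is raw DEFLATE with no gzip header. One option, assuming Apple's documented format, is `java.util.zip.Deflater` and `Inflater` constructed with `nowrap = true`, which read and write raw DEFLATE. The helper names are hypothetical. This sketch has not been tested against bytes actually produced on iOS.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.DataFormatException
import java.util.zip.Deflater
import java.util.zip.Inflater

// Raw DEFLATE (RFC 1951): nowrap = true means no zlib or gzip header/trailer.
fun deflateRaw(input: ByteArray, level: Int = Deflater.DEFAULT_COMPRESSION): ByteArray {
    val deflater = Deflater(level, true)
    try {
        deflater.setInput(input)
        deflater.finish()
        val out = ByteArrayOutputStream()
        val chunk = ByteArray(8 * 1024)
        while (!deflater.finished()) {
            val n = deflater.deflate(chunk)
            out.write(chunk, 0, n)
        }
        return out.toByteArray()
    } finally {
        deflater.end() // release native zlib memory
    }
}

fun inflateRaw(input: ByteArray): ByteArray {
    val inflater = Inflater(true)
    try {
        // The Inflater(nowrap) Javadoc asks for one extra "dummy" input byte.
        inflater.setInput(input + 0.toByte())
        val out = ByteArrayOutputStream()
        val chunk = ByteArray(8 * 1024)
        while (!inflater.finished()) {
            val n = inflater.inflate(chunk)
            if (n == 0 && !inflater.finished() && (inflater.needsInput() || inflater.needsDictionary())) {
                throw DataFormatException("Truncated or invalid raw DEFLATE data")
            }
            out.write(chunk, 0, n)
        }
        return out.toByteArray()
    } finally {
        inflater.end()
    }
}
```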

The compression and decompression functions are lower-level than their Android counterparts. With Kotlin/Native, this results in slightly more complex syntax for the equivalent functionality.

But no worries: this will help you understand interop between Kotlin/Native and native code (C, C++, Objective-C). Afterwards, you will be able to interface with other native libraries too.

For this reason I will show both Swift and Kotlin versions. The Swift version is easy to search for, and comparing the two shows how things map from Swift / Objective-C to Kotlin/Native.

To better understand the steps and how generally compression works in iOS, please go through https://developer.apple.com/documentation/accelerate/compressing_and_decompressing_data_with_buffer_compression.

Compression (Swift)

import Foundation
import Compression

func compress(_ input: String) -> Data? {
    guard let inputData = input.data(using: .utf8) else { return nil }

    let destinationBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: inputData.count * 2)
    let compressedSize = compression_encode_buffer(destinationBuffer, inputData.count * 2, [UInt8](inputData), inputData.count, nil, COMPRESSION_ZLIB)

    // 0 means the encoder failed or the output did not fit in the buffer
    guard compressedSize != 0 else {
        destinationBuffer.deallocate()
        return nil
    }

    let compressedData = Data(bytes: destinationBuffer, count: compressedSize)
    destinationBuffer.deallocate()
    return compressedData
}

Let's break down the code step by step.
First, we convert the input string into a UTF-8 encoded Data value.

guard let inputData = input.data(using: .utf8) else { return nil }

Second, we allocate a buffer to store the compressed result. The capacity is twice the input size because, in the worst case, the compressed result can be larger than the input (in which case compressing makes no sense anyway). For very small inputs, even twice the size may not be enough, and compression_encode_buffer will then return 0.
The developer should tune this factor and the handling of such abnormal cases.

It is prefixed Unsafe because it is a raw pointer, similar to an array in C. It needs manual memory management, can be reinterpreted as a different type just by casting, and has no bounds checks.

This is perfectly fine for our use case. It is also what C APIs like the Compression framework expect.

let destinationBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: inputData.count * 2)
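The ×2 factor is a heuristic, and for tiny inputs it is too small. zlib, for example, deflates the single byte "a" to 3 bytes, so a 2-byte buffer fails. A tighter alternative is zlib's compressBound() formula, under the assumption that Apple's encoder stays within the same worst case as zlib. The sketch below uses Kotlin on the JVM, with java.util.zip.Deflater standing in for compression_encode_buffer. `deflateBound` and `deflateRawInto` are hypothetical names.

```kotlin
import java.util.zip.Deflater

// zlib's compressBound() formula (zlib 1.2.x):
// sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13.
// ASSUMPTION: Apple's COMPRESSION_ZLIB encoder never exceeds this either. Apple
// does not document a bound, so keep the `== 0` check regardless.
fun deflateBound(sourceLen: Long): Long =
    sourceLen + (sourceLen shr 12) + (sourceLen shr 14) + (sourceLen shr 25) + 13

// Stand-in for compression_encode_buffer on the JVM: raw DEFLATE into a
// fixed-size buffer, returning null when the output does not fit.
fun deflateRawInto(input: ByteArray, capacity: Int): ByteArray? {
    val deflater = Deflater(Deflater.DEFAULT_COMPRESSION, true) // nowrap: raw DEFLATE
    try {
        deflater.setInput(input)
        deflater.finish()
        val out = ByteArray(capacity)
        var written = 0
        while (!deflater.finished() && written < capacity) {
            written += deflater.deflate(out, written, capacity - written)
        }
        return if (deflater.finished()) out.copyOf(written) else null
    } finally {
        deflater.end()
    }
}
```

Sizing the buffer as `deflateBound(size)` instead of `size * 2` avoids the small-input failure and uses far less memory for large inputs.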

Next, we actually perform the compression. compression_encode_buffer takes in a destination buffer, destination size, input buffer, input size, an optional scratch buffer for performance advantages and finally the compression algorithm to be used.

let compressedSize = compression_encode_buffer(
destinationBuffer,
inputData.count * 2,
[UInt8](inputData),
inputData.count,
nil,
COMPRESSION_ZLIB
)

Next, we check whether compression succeeded by looking at the returned size. compression_encode_buffer returns 0 if it fails, including when the output does not fit in the destination buffer.
On that path we must still deallocate the Unsafe buffer before returning.

guard compressedSize != 0 else {
destinationBuffer.deallocate()
return nil
}

Finally, we copy the buffer into a Data value, deallocate the buffer, and return the data to the caller.

let compressedData = Data(bytes: destinationBuffer, count: compressedSize)
destinationBuffer.deallocate()
return compressedData

Compression (Kotlin/Native)

Now we will translate the function from Swift to Kotlin. Here is the complete Kotlin/Native version. Compare it with the Swift snippet above, and then we will go through it step by step.
Don't be overwhelmed: though it looks complex, it is easy to understand when we take it one step at a time.
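import kotlinx.cinterop.*
import platform.compression.*

// On recent Kotlin versions this needs @OptIn(ExperimentalForeignApi::class, ExperimentalUnsignedTypes::class)
val compressedBytes: ByteArray? = memScoped {
    val content = "some string"
    val inputData = content.encodeToByteArray().toUByteArray()

    // Same sizing rule as the Swift version
    val capacity = inputData.size * 2
    val destinationBuffer = allocArray<UByteVar>(capacity)
    val compressedSize = compression_encode_buffer(
        destinationBuffer,
        capacity.convert(),
        inputData.toCValues(),
        inputData.size.convert(),
        null,
        COMPRESSION_ZLIB
    )

    // 0 means failure (including "did not fit"), like the Swift guard
    val written: Int = compressedSize.convert()
    if (written == 0) null else destinationBuffer.readBytes(written)
}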

It is pretty similar to the Swift version, but there are a few new guests: memScoped, allocArray, convert, etc. Let's meet them.

memScoped in Kotlin/Native opens a scope inside which you can allocate native memory. Everything allocated there is freed when the scope ends.
Here, it saves us from calling deallocate manually. The logic is the same, but it's one less thing to worry about.

allocArray allocates a native array inside the scope and returns a CPointer, the Kotlin/Native counterpart of Swift's UnsafeMutablePointer.

convert() converts between integer types of different size and signedness, for example from a Kotlin Int to the C size_t the function expects. This can also be done manually, but for now I have used the extension.

Now, let's go step by step.

Given a string input, we convert it into a ByteArray. encodeToByteArray() always encodes as UTF-8, so there is nothing to specify here. If you need a different encoding, this step needs a platform-specific API instead.

We don't pass a plain ByteArray but a UByteArray. The C function takes its buffers as uint8_t *, which Kotlin/Native maps to UByteVar, so we need unsigned bytes to match. The conversion does not change the bits. It only changes how Kotlin interprets each byte.

val content = "some string"
val inputData = content.encodeToByteArray().toUByteArray()
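Here is a plain-Kotlin illustration of that point, which also runs on the JVM. Converting to UByteArray keeps every bit; only the interpretation changes. The gzip magic byte 0x8b, for instance, reads as -117 as a Byte and 139 as a UByte. The helper names are hypothetical.

```kotlin
// Reinterprets each byte as unsigned and returns the values as Ints.
@OptIn(ExperimentalUnsignedTypes::class)
fun toUnsignedValues(bytes: ByteArray): List<Int> =
    bytes.toUByteArray().map { it.toInt() }

// Converting to UByteArray and back gives exactly the original bytes.
@OptIn(ExperimentalUnsignedTypes::class)
fun roundTripsThroughUnsigned(bytes: ByteArray): Boolean =
    bytes.toUByteArray().toByteArray().contentEquals(bytes)
```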

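Within the memScoped block, we allocate a native array to hold the compression result. capacity is its size. As in the Swift version, twice the input size is a reasonable starting point.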

val destinationBuffer = allocArray<UByteVar>(capacity)

Next, we call the algorithm. It's very similar to the Swift version, except for the toCValues() function (and convert(), which handles the integer conversions).

toCValues() copies a Kotlin UByteArray into native memory so that C can read it. (This copy could be optimised away.)

val compressedSize = compression_encode_buffer(
destinationBuffer,
capacity.convert(),
inputData.toCValues(),
inputData.size.convert(),
null,
COMPRESSION_ZLIB
)

We don't need explicit deallocation steps thanks to the memScoped block.
Finally, we read compressedSize bytes from destinationBuffer into a Kotlin ByteArray, after checking that compressedSize is not 0 (as the Swift guard does).

val bytes = destinationBuffer.readBytes(compressedSize.convert())

Decompression

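Decompression works much like compression. The main difference is the size of the destination buffer. In the worst case, compressed output is only slightly larger than the original content, but decompressed output is usually much larger than its compressed input. We need to tune the buffer size for our use case. If it is too big, we waste memory. If it is too small, compression_decode_buffer writes only the first capacity bytes and returns capacity, so the result is silently truncated instead of reported as an error.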

Here I am using a 10 MB buffer, which should be adequate for large JSON payloads. Again, tune this for your use case.

import Foundation
import Compression

func decompress(_ compressedData: Data) -> String? {
    let capacity = 10_000_000 // 10 MB
    let destinationBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)
    let decompressedSize = compression_decode_buffer(
        destinationBuffer,
        capacity,
        [UInt8](compressedData),
        compressedData.count,
        nil,
        COMPRESSION_ZLIB
    )

    // 0 means failure; a result equal to capacity means the output was probably truncated
    guard decompressedSize != 0, decompressedSize < capacity else {
        destinationBuffer.deallocate()
        return nil
    }

    let decompressedData = Data(bytes: destinationBuffer, count: decompressedSize)
    destinationBuffer.deallocate()
    return String(data: decompressedData, encoding: .utf8)
}

This should look familiar after the in-depth walkthrough of compression above.

Let us now see the Kotlin/Native version.

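val decompressedString: String? = memScoped {
    val capacity = 10_000_000 // 10 MB
    val destinationBuffer = allocArray<UByteVar>(capacity)

    // `compressedBytes: ByteArray` is the output of the compression step, not plain text
    val input = compressedBytes.toUByteArray()

    val decompressedSize = compression_decode_buffer(
        destinationBuffer,
        capacity.convert(),
        input.toCValues(),
        input.size.convert(),
        null,
        COMPRESSION_ZLIB
    )

    // 0 means failure; a result equal to capacity means the output was probably truncated
    val written: Int = decompressedSize.convert()
    if (written == 0 || written == capacity) null
    else destinationBuffer.readBytes(written).decodeToString()
}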

Remember that both functions take their buffers as C uint8_t arrays. That is why we convert to UByteArray and use toCValues() in both directions.
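Both decompression snippets have a subtlety worth guarding against. When the output doesn't fit, compression_decode_buffer returns the buffer size instead of failing, so a result exactly equal to capacity should be treated as possibly truncated. The sketch below shows the same fixed-buffer idea on the JVM, with java.util.zip.Inflater in raw (`nowrap`) mode standing in for compression_decode_buffer. `inflateRawInto` is a hypothetical name. It allocates one spare byte to tell an exact fit apart from an overflow.

```kotlin
import java.util.zip.Inflater

// Decodes raw DEFLATE into at most `capacity` bytes. Returns null if the output
// would not fit or the input ends early, instead of returning a truncated prefix.
// Corrupt input makes Inflater throw DataFormatException.
fun inflateRawInto(compressed: ByteArray, capacity: Int): ByteArray? {
    val inflater = Inflater(true) // nowrap = true: raw DEFLATE, like COMPRESSION_ZLIB
    try {
        // The Inflater(nowrap) Javadoc asks for one extra "dummy" input byte.
        inflater.setInput(compressed + 0.toByte())
        // One spare byte lets us tell "exactly fits" apart from "did not fit".
        val out = ByteArray(capacity + 1)
        var written = 0
        while (!inflater.finished() && written < out.size) {
            val n = inflater.inflate(out, written, out.size - written)
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) return null
            written += n
        }
        return if (inflater.finished() && written <= capacity) out.copyOf(written) else null
    } finally {
        inflater.end()
    }
}
```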

Let me know in the comments if this part is unclear, I will edit and extend the article.

Kotlin Multiplatform

Back to the main story, we want to implement the following interface.

data class CompressionResponse(
val base64EncodedString: String,
)

data class CompressionRequest(
val data: String
)

data class DecompressionRequest(
val base64EncodedString: String
)

data class DecompressionResponse(
val data: String
)

interface Compressor: Component {
fun compress(request: CompressionRequest): CompressionResponse?
fun decompress(request: DecompressionRequest): DecompressionResponse?
}

Android Implementation

import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream
import kotlin.io.encoding.Base64
import kotlin.io.encoding.ExperimentalEncodingApi

@OptIn(ExperimentalEncodingApi::class)
class AndroidCompressor: Compressor {
    override fun compress(request: CompressionRequest): CompressionResponse? {
        return try {
            val byteStream = ByteArrayOutputStream()
            // Closing the writer (via `use`) finishes the gzip stream before we read the bytes
            GZIPOutputStream(byteStream)
                .bufferedWriter(StandardCharsets.UTF_8)
                .use { it.write(request.data) }

            val compressedBytes = byteStream.toByteArray()
            CompressionResponse(Base64.Default.encode(compressedBytes))
        } catch (e: Exception) { null }
    }

    override fun decompress(request: DecompressionRequest): DecompressionResponse? {
        return try {
            val compressedBytes = Base64.Default.decode(request.base64EncodedString)

            val byteStream = ByteArrayInputStream(compressedBytes)
            val string = GZIPInputStream(byteStream)
                .bufferedReader(StandardCharsets.UTF_8)
                .use { it.readText() }

            DecompressionResponse(string)
        } catch (e: Exception) { null }
    }
}

iOS Implementation

import kotlinx.cinterop.*
import platform.compression.*
import kotlin.io.encoding.Base64
import kotlin.io.encoding.ExperimentalEncodingApi

@OptIn(ExperimentalForeignApi::class, ExperimentalUnsignedTypes::class, ExperimentalEncodingApi::class)
class DarwinCompressor(private val capacity: Long = 10_000_000 /* 10 MB, to be tuned */): Compressor {
    override fun compress(request: CompressionRequest): CompressionResponse? {
        return try {
            memScoped {
                val inputData = request.data.encodeToByteArray()
                val destinationBuffer = allocArray<UByteVar>(capacity)

                val newSize = compression_encode_buffer(
                    destinationBuffer, capacity.convert(),
                    inputData.toUByteArray().toCValues(), inputData.size.convert(),
                    null,
                    COMPRESSION_ZLIB
                )

                // 0 means the encoder failed or the output did not fit in `capacity`
                val written: Int = newSize.convert()
                if (written == 0) null
                else CompressionResponse(Base64.Default.encode(destinationBuffer.readBytes(written)))
            }
        } catch (e: Exception) { null }
    }

    override fun decompress(request: DecompressionRequest): DecompressionResponse? {
        return try {
            memScoped {
                val input = Base64.Default.decode(request.base64EncodedString)
                val destinationBuffer = allocArray<UByteVar>(capacity)

                val oldSize = compression_decode_buffer(
                    destinationBuffer, capacity.convert(),
                    input.toUByteArray().toCValues(), input.size.convert(),
                    null,
                    COMPRESSION_ZLIB
                )

                // 0 means failure; a result equal to `capacity` means the output was probably truncated
                val written: Int = oldSize.convert()
                if (written == 0 || written.toLong() == capacity) null
                else DecompressionResponse(destinationBuffer.readBytes(written).decodeToString())
            }
        } catch (e: Exception) { null }
    }
}

Apart from what we have already discussed, this also uses Base64 encoding.
Compression produces arbitrary raw bytes, and most such byte sequences are not valid text. Our Compressor interface exchanges Strings. Putting raw bytes into a String (or a TEXT column, or a JSON payload) would corrupt them, and the data could no longer be decompressed.

Base64 encoding maps every 3 bytes to 4 ASCII characters. This gives us a platform-independent text format for storage / transmission, at the cost of about 33% more space.
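Here is a small JVM sketch of the difference. Pushing raw gzip bytes through a UTF-8 String loses data, while a Base64 round trip is lossless and uses 4 characters per 3 bytes. The helper names are hypothetical. The sketch uses java.util.Base64 (on Android this needs API level 26+) instead of the kotlin.io.encoding.Base64 used above; both produce standard RFC 4648 Base64.

```kotlin
import java.util.Base64

// Pushes raw bytes through a UTF-8 String and back. Invalid UTF-8 sequences
// are replaced with U+FFFD while decoding, so arbitrary bytes usually don't survive.
fun survivesUtf8StringRoundTrip(bytes: ByteArray): Boolean =
    String(bytes, Charsets.UTF_8).toByteArray(Charsets.UTF_8).contentEquals(bytes)

// Base64 encodes every 3 bytes as 4 ASCII characters, so the round trip is lossless.
fun survivesBase64RoundTrip(bytes: ByteArray): Boolean =
    Base64.getDecoder().decode(Base64.getEncoder().encodeToString(bytes)).contentEquals(bytes)

// Length of padded Base64 output for `byteCount` input bytes (about 33% larger).
fun base64Length(byteCount: Int): Int = 4 * ((byteCount + 2) / 3)
```

For example, a 2,000,000-byte compressed payload becomes 2,666,668 Base64 characters.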