Working with binary data
Crandas offers the ability to store and manipulate binary data as bytes.
In general, strings are more flexible but bytes are more efficient and some operations are only possible on bytes.
For example, bytes can be used with bitwise operators or cryptographic functions such as encryption and hashing.
Helper functions are also available for converting between bytes and strings.
A bytes column can be uploaded using the bytes datatype in Python. For example, all the following columns contain the same data:
import crandas as cd
byte_data = [b"AAP", b"Noot", b"Mies"]
encoded_data = ["AAP".encode(), "Noot".encode(), "Mies".encode()]
hex_data = [bytes.fromhex("414150"), bytes.fromhex("4e6f6f74"), bytes.fromhex("4d696573")]
df = cd.DataFrame({"bytes": byte_data, "encoded": encoded_data, "hex": hex_data})
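As a quick local check in plain Python (no crandas or engine connection involved), the sketch below confirms that the three spellings produce identical bytes objects. The name `all_equal` is only illustrative.

```python
# Plain-Python check: literal, encoded and hex-decoded values are the same bytes.
byte_data = [b"AAP", b"Noot", b"Mies"]
encoded_data = [s.encode() for s in ["AAP", "Noot", "Mies"]]
hex_data = [bytes.fromhex(h) for h in ["414150", "4e6f6f74", "4d696573"]]

all_equal = byte_data == encoded_data == hex_data
print(all_equal)  # True
```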
Bitwise operators
The bitwise operators &, |, ^ and ~ are supported on bytes columns.
They apply the specified operation to each underlying bit individually.
For the binary operators, both operands need to have the same length.
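To make the semantics concrete, here is a minimal plain-Python sketch of what these operators compute on a single pair of values. It is a local analogue, not the crandas implementation. The helper names (`bytes_and`, `bytes_or`, `bytes_xor`, `bytes_not`) are hypothetical and chosen for illustration only.

```python
import operator


def _binary_op(op, a: bytes, b: bytes) -> bytes:
    """Apply a bitwise operator byte by byte; operands must be equally long."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(op(x, y) for x, y in zip(a, b))


def bytes_and(a: bytes, b: bytes) -> bytes:
    return _binary_op(operator.and_, a, b)


def bytes_or(a: bytes, b: bytes) -> bytes:
    return _binary_op(operator.or_, a, b)


def bytes_xor(a: bytes, b: bytes) -> bytes:
    return _binary_op(operator.xor, a, b)


def bytes_not(a: bytes) -> bytes:
    # Flip every bit of every byte.
    return bytes(x ^ 0xFF for x in a)


a = bytes.fromhex("f0f0")
b = bytes.fromhex("3c3c")
print(bytes_and(a, b).hex())  # 3030
print(bytes_or(a, b).hex())   # fcfc
print(bytes_xor(a, b).hex())  # cccc
print(bytes_not(a).hex())     # 0f0f
```

Raising an error on mismatched lengths mirrors the same-length requirement stated above; how crandas itself reports such a mismatch is not covered here.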
Cryptographic functions
Beyond the basic operations, bytes columns can be used to perform encryption and hashing.
Encryption
To encrypt data directly using a block cipher, the crandas.crypto.cipher module can be used as follows:
import crandas.crypto.cipher as cipher
tab_key = cd.DataFrame({"key": [bytes.fromhex("000102030405060708090a0b0c0d0e0f")]})
tab_data = cd.DataFrame({"data": [bytes.fromhex("00112233445566778899aabbccddeeff"), bytes.fromhex("ffeeddccbbaa99887766554433221100")]})
# Encryption
aes_128 = cipher.AES_128(tab_key["key"].as_value()) # Currently AES-128, AES-192 and AES-256 are supported
aes_128.encrypt(tab_data["data"])
# Decryption is currently not supported
Hashing
To hash data, either with a hash function directly or in HMAC mode, the crandas.crypto.hash module can be used:
import crandas.crypto.hash as hash
tab_key = cd.DataFrame({"key": [bytes.fromhex("bcb25f81807bcb5995c2f663eaeb02f1248de8f3")]})
tab_data = cd.DataFrame({"data": [b"John Doe", b"Foo Bar"]})
# Regular hash
h = hash.RIPEMD_160 # Currently only RIPEMD-160 is supported
h.digest(tab_data["data"])
# HMAC
h = hash.HMAC_RIPEMD_160(tab_key["key"].as_value())
res = h.digest(tab_data["data"])
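For intuition about the difference between a regular hash and HMAC, the following local sketch uses Python's standard hashlib and hmac modules. Note the assumption: RIPEMD-160 is not available in every Python/OpenSSL build, so SHA-256 is swapped in as a stand-in here. The digests therefore will not match what crandas returns; the sketch only illustrates the unkeyed versus keyed distinction.

```python
import hashlib
import hmac

# Same key and rows as in the crandas example above.
key = bytes.fromhex("bcb25f81807bcb5995c2f663eaeb02f1248de8f3")
rows = [b"John Doe", b"Foo Bar"]

# Regular hash: depends only on the data (SHA-256 is a stand-in, see above).
plain_digests = [hashlib.sha256(row).digest() for row in rows]

# HMAC: depends on both the secret key and the data.
keyed_digests = [hmac.new(key, row, hashlib.sha256).digest() for row in rows]

for row, plain, keyed in zip(rows, plain_digests, keyed_digests):
    print(row, plain.hex(), keyed.hex())
```

Anyone can recompute the regular hash of a known input, whereas reproducing the HMAC value requires the key.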
Conversion from and to strings
Various conversions from and to strings are possible.
Encoding ASCII strings as bytes is possible using string_col.encode().
Conversion from and to lowercase hexadecimal strings can be done using string_col.from_hex() and bytes_col.to_hex().
Finally, base64 encoding and decoding is supported using string_col.b64decode() and bytes_col.b64encode().
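The same three conversions can be tried locally on a single value with the Python standard library. This is a plain-Python analogue intended to show what each encoding looks like, not the crandas column API. The variable names are illustrative.

```python
import base64

text = "Noot"

# ASCII string -> bytes
as_bytes = text.encode("ascii")

# bytes <-> lowercase hexadecimal string
hex_string = as_bytes.hex()
from_hex = bytes.fromhex(hex_string)

# bytes <-> base64 string
b64_string = base64.b64encode(as_bytes).decode("ascii")
from_b64 = base64.b64decode(b64_string)

print(hex_string)  # 4e6f6f74
print(b64_string)  # Tm9vdA==
print(from_hex == as_bytes == from_b64)  # True
```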
While strings are generally more powerful than bytes, using bytes is considerably more efficient in both memory and computation. Whenever a table contains string data that does not require string-specific functionality, it might make sense to convert it to bytes before uploading it to the engine.
After learning the things we can do with bytes, we will learn how to work with dates in crandas in the next section.