.. _bytes:

Working with binary data
########################

Crandas offers the ability to store and manipulate binary data as ``bytes``. In general, strings are more flexible, but bytes are more efficient, and some operations are only possible on bytes. For example, bytes can be used to perform bitwise operations or cryptographic functions such as encryption and hashing. There are also helper functions for converting between bytes and strings.

A bytes column can be uploaded using the ``bytes`` datatype in Python. For example, all of the following columns contain the same data:

.. code:: python

    import crandas as cd

    byte_data = [b"AAP", b"Noot", b"Mies"]
    encoded_data = ["AAP".encode(), "Noot".encode(), "Mies".encode()]
    hex_data = [bytes.fromhex("414150"), bytes.fromhex("4e6f6f74"), bytes.fromhex("4d696573")]

    df = cd.DataFrame({"bytes": byte_data, "encoded": encoded_data, "hex": hex_data})

Bitwise operators
=================

The bitwise operators ``&``, ``|``, ``^`` and ``~`` are supported on bytes columns. They apply the specified operation to each underlying bit individually. For the binary operators (``&``, ``|`` and ``^``), both operands need to have the same length.

Cryptographic functions
=======================

Beyond the basic operations, bytes columns can be used to perform encryption and hashing.

Encryption
----------

To encrypt data directly using a block cipher, the :mod:`crandas.crypto.cipher` module can be used as follows:

.. code:: python

    import crandas as cd
    import crandas.crypto.cipher as cipher

    tab_key = cd.DataFrame({"key": [bytes.fromhex("000102030405060708090a0b0c0d0e0f")]})
    tab_data = cd.DataFrame({"data": [bytes.fromhex("00112233445566778899aabbccddeeff"), bytes.fromhex("ffeeddccbbaa99887766554433221100")]})

    # Encryption
    aes_128 = cipher.AES_128(tab_key["key"].as_value())  # Currently AES-128, AES-192 and AES-256 are supported
    aes_128.encrypt(tab_data["data"])

    # Decryption is currently not supported

Hashing
-------

To hash data, either with a hash function directly or in HMAC mode, the :mod:`crandas.crypto.hash` module can be used:

.. code:: python

    import crandas as cd
    import crandas.crypto.hash as hash

    tab_key = cd.DataFrame({"key": [bytes.fromhex("bcb25f81807bcb5995c2f663eaeb02f1248de8f3")]})
    tab_data = cd.DataFrame({"data": [b"John Doe", b"Foo Bar"]})

    # Regular hash
    h = hash.RIPEMD_160  # Currently only RIPEMD-160 is supported
    h.digest(tab_data["data"])

    # HMAC
    h = hash.HMAC_RIPEMD_160(tab_key["key"].as_value())
    res = h.digest(tab_data["data"])

Conversion from and to strings
==============================

Various conversions from and to strings are possible. ASCII strings can be encoded as bytes using ``string_col.encode()``. Conversion from and to lowercase hexadecimal strings can be done using ``string_col.from_hex()`` and ``bytes_col.to_hex()``. Finally, base64 encoding and decoding are supported using ``string_col.b64decode()`` and ``bytes_col.b64encode()``.

While strings are generally more powerful than bytes, using bytes is considerably more efficient in both memory and computation. Whenever a table contains string data that does not require string-specific functionality, it might make sense to convert it to bytes before uploading it to the engine.

Now that we have seen what can be done with bytes, we will learn how to work with dates in crandas in the :ref:`next section`.
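
For reference, the bitwise operators and string conversions described on this page can be illustrated on ordinary Python ``bytes`` objects. The sketch below uses only the standard library and does not call crandas. The helper names ``bitwise`` and ``invert`` are hypothetical, and the assumption that the crandas column operations produce the same values per row is ours, not something stated above.

.. code:: python

    import base64
    import operator


    def bitwise(op, left, right):
        """Apply a binary bitwise operator byte by byte (hypothetical helper)."""
        if len(left) != len(right):
            # Mirrors the rule that both operands need to have the same length
            raise ValueError("both operands need to have the same length")
        return bytes(op(a, b) for a, b in zip(left, right))


    def invert(value):
        """Flip every bit, like the unary ``~`` operator (hypothetical helper)."""
        return bytes(b ^ 0xFF for b in value)


    left = bytes.fromhex("0f0f")
    right = bytes.fromhex("00ff")

    and_result = bitwise(operator.and_, left, right)  # hex 000f
    or_result = bitwise(operator.or_, left, right)    # hex 0fff
    xor_result = bitwise(operator.xor, left, right)   # hex 0ff0
    not_result = invert(left)                         # hex f0f0

    # String conversions on a single value
    data = "AAP".encode("ascii")                          # ASCII string -> bytes
    hex_string = data.hex()                               # bytes -> lowercase hex
    from_hex = bytes.fromhex(hex_string)                  # hex string -> bytes
    b64_string = base64.b64encode(data).decode("ascii")   # bytes -> base64 string
    from_b64 = base64.b64decode(b64_string)               # base64 string -> bytes

Computing the expected values locally in this way can be handy for checking small test cases before working with the corresponding columns.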