- Using Python for hex, strings, bytes and integers
- Conversion between hex and integers for calculation of long numbers.
- Number base conversions
- Hex characters in a string, in a byte array, etc.
- Decoding:
- Encoding (the reverse process):
- Refs:
- Convert Hex to ASCII in Python
- Convert Hex to ASCII in Python Using the decode() Method
- Convert Hex to ASCII in Python Using the codecs.decode() Method
Using Python for hex, strings, bytes and integers
I’ve taken an interest recently in cryptography, strings, signing strings and using prime numbers for cryptography. I had to keep searching online for the best way to do a transform of my data for:
to my results. This was frustrating.
So, I’ll start at the start and end at the end. It all starts and ends with strings, usually via arrays of bytes and strings of hex. But how can we operate on these and easily transform them in a sensible way?
In the case I was looking at, I would have a signature (say from a cookie) that was both base64 encoded and url quoted. This is done so that the binary representation can be sent across the wire (internet) without too much fuss.
I’d start by unquoting the url encoding on the string (to remove, for example the %2D type characters).
Going into the Python REPL and doing something like the following:
>>> from urllib import parse
>>> parse.unquote("AAA%2DBBB")
'AAA-BBB'
… returns a string. We can tell as there are single quotes around the result — with nothing else.
What if we wanted to base64 encode this string (‘AAA-BBB’)?
Trying the following fails:
>>> import base64
>>> base64.b64encode('AAA-BBB')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
>>>
The reason is that base64.b64encode() expects a bytes-like object (such as a byte array), not a str.
There are several ways to achieve this.
The easiest is to put a b out the front of our string to tell the interpreter that this string should be used as an array of bytes, like so:
>>> base64.b64encode(b'AAA-BBB')
b'QUFBLUJCQg=='
Notice that the result is also an array of bytes, with the b at the front of the string.
To confirm this, we can get the type of the result and display its name:
>>> test = base64.b64encode(b'AAA-BBB')
>>> type(test).__name__
'bytes'
Using the b at the start is ok if we are using a string that has been hard-coded; but in most instances we would need to convert a string to a byte array programmatically.
For this, there is the .encode() string method. As an example, consider the following:
>>> test_string = 'AAA-BBB'
>>> test_bytes = test_string.encode()
>>> base64.b64encode(test_bytes)
b'QUFBLUJCQg=='
That works as expected. Great. But… what if we need to convert back from a byte array to a string? …I hear you ask. Well, it turns out there’s an accompanying .decode() method on bytes objects for just that case.
>>> test_bytes = b'QUFBLUJCQg=='
>>> test_string = test_bytes.decode()
>>> type(test_string).__name__
'str'
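To tie the two directions together, here is a small sketch of a full round trip: str to bytes, bytes to base64, and back again. The variable names are my own, and I have spelled out "utf-8" explicitly (it is the default for .encode() and .decode() in Python 3) so it is obvious which encoding is in play.

```python
import base64

original = "AAA-BBB"

# str -> bytes -> base64 bytes -> base64 str
as_bytes = original.encode("utf-8")
b64_bytes = base64.b64encode(as_bytes)
b64_text = b64_bytes.decode("ascii")  # base64 output is always plain ASCII

# base64 str -> bytes -> str
decoded_bytes = base64.b64decode(b64_text)
round_tripped = decoded_bytes.decode("utf-8")

print(b64_text)
print(round_tripped)

# Characters outside ASCII take more than one byte in UTF-8,
# so the str length and the bytes length can differ.
accented = "café"
print(len(accented), len(accented.encode("utf-8")))
```

The last two lines are there as a reminder that a string and its byte array are not the same thing, even when they happen to print the same for plain ASCII text.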
So now we’re brushed-up on strings and bytes, let’s delve into strings and byte arrays of hex values. This is where it gets a bit trickier.
If we start again with our original string AAA-BBB and we want to convert that to its hexadecimal representation, it turns out there are a few ways to do this.
The simplest way is to call .encode() on the string, and then .hex() on the byte array. The result is a string of hexadecimal values.
>>> test_string = 'AAA-BBB'
>>> hex_string = test_string.encode().hex()
>>> type(hex_string).__name__
'str'
>>> hex_string
'4141412d424242'
We can see here 41 repeated and 42 repeated, the ASCII codes (in hex) for A and B respectively, with 2d in the middle for the hyphen (-). I find the hex-string representation of a string of bytes very useful to work with. Provided the string is not too long, it helps with inspecting and understanding any patterns in the bytes you are working with. When working with forms of cryptography that depend on numbers (and most do to some extent), this is especially useful.
Another way to get a hex-string is to iterate all the characters of the string, convert them to ordinals then convert them to hex strings then remove all the 0x ‘s at the start and join them back together:
>>> ''.join(hex(ord(c))[2:] for c in 'AAA-BBB')
'4141412d424242'

or, using a format specifier instead of slicing off the 0x:

>>> ''.join("{:02x}".format(ord(c)) for c in 'AAA-BBB')
'4141412d424242'
This approach is less neat but more explicit of what is happening.
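One caveat with the character-by-character approach: hex() does not zero-pad, so any character with a code below 0x10 (a newline, say) comes out as a single hex digit and the result no longer lines up with the bytes. The sketch below compares the two variants; the function names hex_unpadded and hex_padded are made up for this illustration.

```python
def hex_unpadded(text):
    # hex(10) is '0xa', so slicing off '0x' leaves a single digit
    return "".join(hex(ord(c))[2:] for c in text)


def hex_padded(text):
    # '{:02x}' always produces two hex digits per character code below 0x100
    return "".join("{:02x}".format(ord(c)) for c in text)


sample = "A\nB"  # the newline has character code 0x0a

print(hex_unpadded(sample))   # one digit short
print(hex_padded(sample))     # two digits per character
print(sample.encode().hex())  # what the bytes method gives
```

For plain letters like AAA-BBB all three agree, which is why the problem is easy to miss.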
There is also a helper function in Python 3, hexlify in the binascii module, that takes a byte array and returns its hex representation, also as a byte array (call .decode() to get a str):
>>> import binascii
>>> binascii.hexlify(b'AAA-BBB')
b'4141412d424242'
>>> binascii.hexlify(b'AAA-BBB').decode()
'4141412d424242'
and its complement unhexlify, which takes a hex string (or a byte array of hex digits) and outputs a byte array.
>>> binascii.unhexlify('4141412d424242')
b'AAA-BBB'
>>> binascii.unhexlify('4141412d424242').decode()
'AAA-BBB'
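As a quick sanity check, the binascii helpers and the built-in bytes methods should agree with each other. This is only a sketch with variable names of my own choosing:

```python
import binascii

data = b"AAA-BBB"

# Two routes from bytes to a hex string
hex_via_method = data.hex()
hex_via_binascii = binascii.hexlify(data).decode("ascii")

# Two routes from a hex string back to bytes
bytes_via_method = bytes.fromhex(hex_via_method)
bytes_via_binascii = binascii.unhexlify(hex_via_binascii)

print(hex_via_method == hex_via_binascii)
print(bytes_via_method == bytes_via_binascii == data)
```

The main practical difference is the return type: bytes.hex() gives a str straight away, while hexlify gives bytes that still need a .decode().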
If we want to go the other way, and convert a hex-string or byte array to its string representation there are also many ways to do this. I’ll quickly go through these ways:
hex_string = '1234ABCD'
byte_array = b'1234ABCD'
From a byte array to string, we can use byte_array.decode()
From a string of hex to a byte array of characters, we can use bytes.fromhex(hex_string)
From a string of hex to a string of characters, we can use bytes.fromhex(hex_string).decode()
Note that for this particular hex_string the last .decode() will fail with a UnicodeDecodeError, as the bytes 0xAB and 0xCD do not form valid UTF-8 and so cannot be converted back to characters.
A string representation of a series of hex values may not be printable, and may not even convert neatly to an encoding such as UTF-8. In many cases, this is not an issue as the string is a binary representation of something else anyway and is not dissimilar to a fixed-length array of characters without a null-terminator. But I digress. This is partly why base64 is commonly used across the wire.
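To make the failure concrete, here is a small sketch of what happens with that hex string and a few ways of coping with it. The choice of fallbacks (errors="replace", latin-1, base64) is mine, picked for illustration rather than as a recommendation.

```python
import base64

hex_string = "1234ABCD"
raw = bytes.fromhex(hex_string)

# Strict UTF-8 decoding fails because 0xAB cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors="replace" swaps undecodable bytes for U+FFFD (lossy).
replaced = raw.decode("utf-8", errors="replace")

# latin-1 maps every byte value to a character, so it never fails,
# but the result is not meaningful text.
as_latin1 = raw.decode("latin-1")

# base64 gives a printable, reversible representation of the same bytes.
as_base64 = base64.b64encode(raw).decode("ascii")

print(decode_failed)
print(repr(replaced))
print(repr(as_latin1))
print(as_base64)
```

Only the base64 and latin-1 forms can be turned back into the original bytes; the "replace" form throws information away.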
Conversion between hex and integers for calculation of long numbers.
As a final step, we may want to use this hex string representation as an integer value. This is probably the most straightforward step: we pass our string to int() and specify that it is in base 16, for example:
>>> int('4141412d424242', 16)
18367621674189378
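Going back the other way, and skipping the hex string altogether, can be sketched with int.from_bytes() and int.to_bytes(). I have assumed big-endian byte order here, since that is what matches reading the hex string left to right.

```python
hex_string = "4141412d424242"

# hex string -> int -> hex string
number = int(hex_string, 16)
back_to_hex = format(number, "x")

# bytes -> int directly, without the intermediate hex string
raw = bytes.fromhex(hex_string)
number_from_bytes = int.from_bytes(raw, byteorder="big")

# int -> bytes: work out how many bytes are needed from the bit length
byte_length = (number.bit_length() + 7) // 8
raw_again = number.to_bytes(byte_length, byteorder="big")

print(number)
print(back_to_hex)
print(raw_again)
```

Be aware that an integer does not remember leading zero bytes, so if the original data started with 00 you need to carry the expected length separately.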
Number base conversions
Numbers in Python can be represented in many ways. A plain number literal in Python uses base 10, such that 10 is the number ten. Putting 0x in front makes the literal hexadecimal, so 0x10 is the number sixteen. If we have a hexadecimal value as a string that we want to convert to a decimal integer, we can use the int('hex', 16) approach shown above.
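A few of these conversions, sketched with the built-ins only (the variable names are mine):

```python
ten = 10        # decimal literal
sixteen = 0x10  # hexadecimal literal: one sixteen and zero ones

# int -> text in another base
as_hex = hex(255)  # includes the '0x' prefix
as_bin = bin(10)   # includes the '0b' prefix
as_oct = oct(8)    # includes the '0o' prefix
padded_hex = format(255, "04x")  # no prefix, zero-padded to 4 digits

# text in another base -> int
from_hex = int("ff", 16)
from_prefixed_hex = int("0xff", 16)  # the prefix is allowed when base is 16
from_bin = int("1010", 2)

print(ten, sixteen)
print(as_hex, as_bin, as_oct, padded_hex)
print(from_hex, from_prefixed_hex, from_bin)
```

Whatever notation is used to write it, the value stored is just an int; the base only matters when reading or printing it.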
Hex characters in a string, in a byte array, etc.
Values in a string can also be represented by their hexadecimal equivalent by escaping the value with \x , as shown.
string = '\x4a\x82\xfd\xfe\xff\x00'
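The same escapes mean slightly different things in a str and in a bytes literal, which is easy to trip over. In the sketch below, text and raw are my own names; the values are the ones from the line above.

```python
text = '\x4a\x82\xfd\xfe\xff\x00'  # six characters (code points)
raw = b'\x4a\x82\xfd\xfe\xff\x00'  # six bytes

print(len(text), len(raw))

# Indexing a str gives a one-character str; indexing bytes gives an int.
print(repr(text[0]), raw[0])

# latin-1 maps code points 0-255 one-to-one onto byte values.
print(text.encode('latin-1') == raw)

# UTF-8 uses two bytes for each code point from 0x80 to 0x7ff,
# so the encoded form is longer than the original six characters.
print(len(text.encode('utf-8')))

print(raw.hex())
```

So if the intent is "these exact byte values", the b'...' form (or bytes.fromhex) is the safer way to write it.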
There is a continuum of data types when handling data between a website and its use and analysis:
Decoding:
- url-encoded, base 64 encoded string
- base 64 encoded string
- non-localization-encoded binary string
- byte array of characters
- string of hexadecimal values (0-9, a-f)
- hexadecimal representation
- integer representation
Encoding (the reverse process):
- integer representation
- hexadecimal representation
- string of hexadecimal values (0-9, a-f)
- byte array of characters
- non-localization-encoded binary string
- base 64 encoded string
- url-encoded, base 64 encoded string
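The two lists above can be sketched as a pair of functions. Everything here is an illustration under my own assumptions: the function names decode_to_int and encode_from_int are made up, the sample value is just AAA-BBB from earlier, and the length parameter exists only because an integer does not remember leading zero bytes.

```python
import base64
from urllib import parse


def decode_to_int(quoted_b64):
    """URL-quoted base64 text -> integer, one step per list item."""
    b64_text = parse.unquote(quoted_b64)  # undo the URL quoting
    raw = base64.b64decode(b64_text)      # base64 text -> bytes
    hex_string = raw.hex()                # bytes -> string of hex digits
    return int(hex_string, 16)            # hex digits -> integer


def encode_from_int(number, length):
    """Integer -> URL-quoted base64 text; length is the byte count."""
    hex_string = format(number, "x").zfill(length * 2)  # pad to 2 digits/byte
    raw = bytes.fromhex(hex_string)                     # hex digits -> bytes
    b64_text = base64.b64encode(raw).decode("ascii")    # bytes -> base64 text
    return parse.quote(b64_text, safe="")               # quote '=', '+', '/'


quoted = "QUFBLUJCQg%3D%3D"  # base64 of b'AAA-BBB', then URL-quoted
number = decode_to_int(quoted)
restored = encode_from_int(number, 7)

print(number)
print(restored)
```

Passing safe="" to parse.quote is a choice made here so that / is quoted as well; by default it is left alone.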
Refs:
Convert Hex to ASCII in Python
- Convert Hex to ASCII in Python Using the decode() Method
- Convert Hex to ASCII in Python Using the codecs.decode() Method
This tutorial will look into various methods to convert a hexadecimal string to an ASCII string in Python. Suppose we have a string written in hexadecimal form, 68656c6c6f, and we want to convert it into the ASCII character string hello: in ASCII, h is 68 in hexadecimal, e is 65, l is 6c and o is 6f.
We can convert a hexadecimal string to an ASCII string in Python using the following methods:
Convert Hex to ASCII in Python Using the decode() Method
The string.decode(encoding, errors) method in Python 2 takes an encoded string as input and decodes it using the encoding scheme specified in the encoding argument. The errors parameter specifies the error-handling scheme to use in case of an error; it can be strict, ignore, or replace.
Therefore, to convert a hex string to an ASCII string, we need to set the encoding parameter of the string.decode() method as hex . The below example code demonstrates how to use the string.decode() method to convert a hex to ASCII in Python 2.
string = "68656c6c6f"
string.decode("hex")
In Python 3, the bytearray.decode(encoding, errors) method takes a byte array as input and decodes it using the encoding scheme specified in the encoding argument.
To decode a string in Python 3, we first need to convert the string to a byte array and then use the bytearray.decode() method to decode it. The bytearray.fromhex(string) method can be used to convert the string into a byte array first.
The below example code demonstrates how to use the bytearray.decode() and bytearray.fromhex(string) method to convert a hex string to ASCII string in Python 3:
string = "68656c6c6f"
byte_array = bytearray.fromhex(string)
byte_array.decode()
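The same idea can be wrapped in a small helper using the immutable bytes type. This is only a sketch: hex_to_ascii is a made-up name, and decoding as "ascii" rather than the default UTF-8 is my choice, so that anything outside ASCII fails loudly.

```python
def hex_to_ascii(hex_string):
    # bytes.fromhex raises ValueError on odd-length or non-hex input;
    # .decode("ascii") raises UnicodeDecodeError for byte values above 0x7f.
    return bytes.fromhex(hex_string).decode("ascii")


print(hex_to_ascii("68656c6c6f"))
print(hex_to_ascii("68 65 6c 6c 6f"))  # spaces between byte pairs are skipped

try:
    hex_to_ascii("ff")
    non_ascii_rejected = False
except UnicodeDecodeError:
    non_ascii_rejected = True

print(non_ascii_rejected)
```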
Convert Hex to ASCII in Python Using the codecs.decode() Method
The codecs.decode(obj, encoding, errors) function is similar to the decode() method. It takes an object as input and decodes it using the encoding scheme specified in the encoding argument. The errors argument specifies the error-handling scheme to be used in case of an error.
In Python 2, codecs.decode() returns a string as output, and in Python 3, with the hex codec, it returns a bytes object. The below example code demonstrates how to convert a hex string to ASCII using codecs.decode() and convert the returned bytes to a string using str().
import codecs

string = "68656c6c6f"
binary_str = codecs.decode(string, "hex")
print(str(binary_str, 'utf-8'))
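For completeness, the hex codec works in both directions. The sketch below passes bytes in rather than a str, which is a choice made here to keep the types explicit; the variable names are my own.

```python
import codecs

hex_bytes = b"68656c6c6f"

# hex digits -> raw bytes -> text
decoded = codecs.decode(hex_bytes, "hex")
text = decoded.decode("utf-8")

# raw bytes -> hex digits (the reverse direction)
encoded = codecs.encode(b"hello", "hex")

print(decoded)
print(text)
print(encoded)
```

Both calls return bytes, so a final .decode() is still needed whenever a str is wanted.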