- How to Read Binary File in Python – Detailed Guide
- Read binary file byte by byte
- Python Read Binary File into Byte Array
- Python read binary file into numpy array
- Read binary file Line by Line
- Read Binary File Fully in One Shot
- Python Read Binary File and Convert to Ascii
- Read binary file into dataframe
- Read binary file skip header
- Read binary file using Pickle
- Conclusion
How to Read Binary File in Python – Detailed Guide
Binary files are files that are not plain text, for example an image file. Like every file, they are stored on disk as a sequence of bytes, but those bytes do not represent characters in a text encoding, so they cannot be opened in the default text mode and read as text.
You can read a binary file by opening it in binary mode using open('filename', 'rb').
When working on problems such as image classification in machine learning, you may need to open image files in binary mode and read their bytes to prepare input for a model. In binary mode, Python makes no attempt to decode the bytes into characters. In contrast, when you open a file in the default text mode, the bytes are decoded into a string based on the file encoding.
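To make the difference between the two modes concrete, here is a minimal sketch. It writes a small sample file to a temporary directory and reads it back in both modes. The file name sample.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample data: "h\u00e9llo" is "hello" with an accented e,
# which takes two bytes in UTF-8.
sample = "h\u00e9llo".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "wb") as f:
        f.write(sample)

    # Binary mode: read() returns the raw bytes, with no decoding.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode: read() decodes the bytes into a str using the given encoding.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, len(raw))    # bytes object, 6 bytes
print(type(text).__name__, len(text))  # str object, 5 characters
```

The byte count and the character count differ because the accented character occupies two bytes on disk but is a single character once decoded.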
If You’re in a Hurry
You can open the file with the open() function, adding b to the mode string to open it in binary mode, and then read the file’s bytes.
open('filename', 'rb') opens the file for reading in binary mode.
r – Opens the file for reading.
b – Opens the file in binary mode. No attempt is made to decode the bytes into a string.
The example below reads the file one byte at a time and prints each byte.
try:
    # A raw string keeps the backslashes in the Windows path literal.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        byte = f.read(1)
        while byte:
            # Do stuff with byte.
            print(byte)
            byte = f.read(1)
except IOError:
    print('Error While Opening the file!')
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn how to read binary files in different ways.
Read binary file byte by byte
In this section, you’ll learn how to read a binary file byte by byte and print each byte. This is the simplest way to inspect a file’s contents, although calling read() once per byte is slow for large files.
The file is opened using the open() function with the mode “rb”, which means the file is opened for reading in binary mode. In this mode, the bytes are not decoded into a string. They are returned as bytes objects.
The example below shows how the file is read byte by byte using the file.read(1) method.
The argument 1 means that at most one byte is read by each read() call. At the end of the file, read() returns an empty bytes object, b'', which ends the loop.
try:
    # A raw string keeps the backslashes in the Windows path literal.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        byte = f.read(1)
        while byte:
            # Print the current byte before reading the next one.
            print(byte)
            byte = f.read(1)
except IOError:
    print('Error While Opening the file!')
Output:
b'\xff' b'\xd8' b'\xff' b'\xe0' b'\x00' b'\x10' b'J' b'F' b'I' b'F' b'\x00' b'\x01' b'\x01' b'\x00' b'\x00' b'\x01' b'\x00' b'\x01' b'\x00' b'\x00' b'\xff' b'\xed' b'\x00' b'|' b'P' b'h' b'o' b't' b'o' b's' b'h' b'o' b'p' b' ' b'3' b'.' b'0' ... b'\xc6' b'\xb3' b'\xff' b'\xd9'
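Calling read(1) in a loop works, but it makes one call per byte. If that becomes a bottleneck, a common alternative is to read fixed-size chunks. The following is a minimal sketch, not part of the original example: the helper name iter_chunks, the chunk size, and the throwaway sample file are all assumptions made for illustration.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=4096):
    """Yield successive blocks of at most chunk_size bytes from a file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # b"" signals the end of the file
                break
            yield chunk


# Demonstration on a throwaway file (name and contents are made up).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(10)))

    # Consume the generator while the temporary file still exists.
    chunks = list(iter_chunks(path, chunk_size=4))

for chunk in chunks:
    print(chunk)
```

The last chunk can be shorter than chunk_size, so code that processes the chunks should not assume a fixed length.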
Python Read Binary File into Byte Array
In this section, you’ll learn how to read a binary file into a byte array.
First, the file is opened in the “rb” mode.
A byte array called mybytearray is created using the bytearray() constructor.
Then the file is read one byte at a time using f.read(1), and each byte is appended to the byte array using the += operator. The example reads only the first few bytes of the file.
Finally, you can print the bytearray to display the bytes that were read.
try:
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        mybytearray = bytearray()
        # Read the first six bytes, one byte per read() call.
        for _ in range(6):
            mybytearray += f.read(1)
        print(mybytearray)
except IOError:
    print('Error While Opening the file!')
Output:
bytearray(b'\xff\xd8\xff\xe0\x00\x10')
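If you want the whole file in a byte array rather than a few bytes, you can pass the result of a single read() call to bytearray(). The sketch below assumes a small throwaway file whose name and contents are made up. It also shows the main reason to choose a bytearray over bytes, which is that it can be modified in place.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file holding six bytes.
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0\x00\x10")

    # Read everything at once and copy it into a mutable bytearray.
    with open(path, "rb") as f:
        data = bytearray(f.read())

first_byte = data[0]  # indexing a bytearray returns an int (0 to 255)
data[0] = 0x00        # a bytearray can be changed in place, unlike bytes
print(first_byte)
print(data)
```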
Python read binary file into numpy array
In this section, you’ll learn how to read the binary file into a NumPy array.
First, import the NumPy library with import numpy as np.
Then create a data type for unsigned bytes using np.dtype('B').
Next, open the binary file in reading mode.
Now, create the NumPy array using the np.fromfile() function.
Its arguments are the file object and the byte data type. This creates a one-dimensional NumPy array in which each element is one byte of the file.
import numpy as np
# 'B' is the type code for an unsigned byte (uint8).
dtype = np.dtype('B')
try:
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        numpy_data = np.fromfile(f, dtype)
    print(numpy_data)
except IOError:
    print('Error While Opening the file!')
The bytes are read into the NumPy array and then printed.
Read binary file Line by Line
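In this section, you’ll learn how to read a binary file line by line.
You can read all the lines at once using the readlines() method of the file object. In binary mode, a line is everything up to and including the next newline byte (b'\n'), so for files such as images the line boundaries are arbitrary.
Each line is stored as an item in a list, which you can iterate over to access each line of the file.
The rstrip() method removes trailing whitespace bytes, including the newline, from the end of each line before it is printed. It does not touch the beginning of the line. Note that it also strips data bytes that happen to be whitespace, so avoid it when every byte matters.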
# The with statement closes the file once the lines have been read.
with open(r"c:\temp\Binary_File.jpg", "rb") as f:
    lines = f.readlines()
for line in lines:
    print(line.rstrip())
Output:
b'\x07\x07\x07\x07' b'' b'' b'' b'' b'' b'\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06' b'\x06\x06'
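The empty b'' entries in the output hint at a pitfall: rstrip() with no argument removes every trailing whitespace byte, not only the newline. The sketch below illustrates this on a made-up payload (the file name and bytes are assumptions) and compares it with rstrip(b"\n"), which removes only the newline.

```python
import os
import tempfile

# Hypothetical payload: the first line ends with a form feed (\x0c),
# the second with a space. Both are data bytes that count as whitespace.
payload = b"abc\x0c\n\x00\x01 \n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as f:
        f.write(payload)

    with open(path, "rb") as f:
        lines = f.readlines()  # split after each b"\n"

# rstrip() drops all trailing whitespace bytes, including data bytes.
stripped_all = [line.rstrip() for line in lines]
# rstrip(b"\n") drops only the trailing newline.
stripped_newline = [line.rstrip(b"\n") for line in lines]

print(stripped_all)
print(stripped_newline)
```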
Read Binary File Fully in One Shot
In this section, you’ll learn how to read a binary file in one shot.
You can do this by calling file.read() with no argument or with -1. Both read the file from the current position to the end, as shown below.
try:
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        # -1 (the default) reads everything up to the end of the file.
        binarycontent = f.read(-1)
    print(binarycontent)
except IOError:
    print('Error While Opening the file!')
Output:
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xed\x00|Photoshop 3.0\x008BIM\x04\x04\x00\x00\x00\x00\x00\x1c\x02(\x00ZFBMD2300096c010000fe0e000032160000051b00003d2b000055300000d6360000bb3c0000ce4100008b490000\x00\xff\xdb\x00C\x00\x03\x03\x03\x03\x03\x03\x05\x03\x03\x05\x07\x05\x05\x05\x07\n\x07\x07\x07\x07\n\x0c\n\n\n\n\n\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06\n\x06\x06\n\x18\x11\x0e\x11\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18
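As a side note, the standard library’s pathlib module offers a shortcut for the same operation. The sketch below uses a made-up sample file to show that Path.read_bytes() returns the same bytes as opening the file and calling read().

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file holding four bytes.
    path = Path(tmp_dir) / "sample.bin"
    path.write_bytes(b"\xff\xd8\xff\xe0")

    # Explicit open and read, as in the example above.
    with open(path, "rb") as f:
        via_read = f.read()

    # read_bytes() opens the file, reads it fully, and closes it.
    via_pathlib = path.read_bytes()

print(via_read == via_pathlib)
```

Both approaches load the whole file into memory, so they suit files that comfortably fit in RAM.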
Python Read Binary File and Convert to Ascii
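In this section, you’ll learn how to read a binary file and convert its bytes to ASCII text using the binascii library. The conversion uses uuencoding, which represents arbitrary bytes using only printable ASCII characters.
Read the file in binary mode as explained in the previous sections.
Next, use the binascii.b2a_uu(bytes) function. It converts the bytes into one line of uuencoded ASCII characters and returns it as a bytes object. It accepts at most 45 bytes per call, which is why the example reads only 45 bytes.
Then you can print the result to check the ASCII characters.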
import binascii
try:
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        # b2a_uu() accepts at most 45 bytes per call.
        mybytes = f.read(45)
    data_bytes2ascii = binascii.b2a_uu(mybytes)
    print("Binary String to Ascii")
    print(data_bytes2ascii)
except IOError:
    print("Error While opening the file!")
Output:
Binary String to Ascii
b'M_]C_X 02D9)1@ ! 0 0 ! #_[0!\\4&AO=&]S:&]P(#,N, X0DE-! 0 \n'
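Because uuencoding is reversible, you can check the conversion by decoding the line again with binascii.a2b_uu(). The sketch below does this on a short made-up byte string and also shows binascii.hexlify(), another binascii function that yields an ASCII-only representation and has no 45-byte limit. The sample bytes are an assumption chosen for illustration.

```python
import binascii

# Hypothetical sample: the first ten bytes of a typical JPEG file.
data = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Encode to one uuencoded line, then decode it back.
uu_line = binascii.b2a_uu(data)
restored = binascii.a2b_uu(uu_line)

# Hexadecimal text: two ASCII characters per byte.
hex_text = binascii.hexlify(data).decode("ascii")

print(uu_line)
print(restored == data)
print(hex_text)
```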
Read binary file into dataframe
In this section, you’ll learn how to read a binary file into a pandas dataframe.
First, read the binary file into a NumPy array, because pandas has no dedicated function that reads raw bytes into a dataframe directly.
Once you have the NumPy array, you can create a dataframe from it.
Pass the NumPy array to pd.DataFrame(). The resulting dataframe has one row per byte read from the binary file.
import numpy as np
import pandas as pd
try:
    # 'B' is the type code for an unsigned byte (uint8).
    dt = np.dtype('B')
    data = np.fromfile(r"c:\temp\Binary_File.jpg", dtype=dt)
    # One column, one row per byte.
    df = pd.DataFrame(data)
    print(df)
except IOError:
    print("Error while opening the file!")
Output:
         0
0      255
1      216
2      255
3      224
4        0
...    ...
18822    0
18823  198
18824  179
18825  255
18826  217
[18827 rows x 1 columns]
This is how you can read a binary file using NumPy and use that NumPy array to create the pandas dataframe.
You can also use the NumPy array to load the bytes into a dictionary.
Read binary file skip header
In this section, you’ll learn how to read a binary file while skipping its header line. Some binary files begin with an ASCII header.
Skipping the header is useful when reading such files.
You can call the readlines() method of the file object and slice the returned list with [1:], which keeps the lines from index 1 onward.
The ASCII header line at index 0 is ignored.
with open(r"c:\temp\Binary_File.jpg", "rb") as f:
    # Slice off the first line (index 0) and keep the rest.
    lines = f.readlines()[1:]
for line in lines:
    print(line.rstrip())
Output:
b'\x07\x07\x07\x07' b'' b'' b'' b'' b'' b'\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06' b'\x06\x06' b"\x93\x80\x18\x98\xc9\xdc\x8bm\x90&'\xc5U\xb18\x81\xc7y\xf0\x80\x00\x14\x1c\xceQd\x83\x13\xa0\xbf-D9\xe0\xae;\x8f\\LK\xb8\xc3\x8ae\xd4\xd1C\x10\x7f\x02\x02\xa6\x822K&D\x9a\x04\xd4\xc8\xfbC\x87\xf2\x8d\xdcN\xdes)rq\xbbI\x92\xb6\xeeu8\x1d\xfdG\xabv\xe8q\xa5\xb6\xb56\xe0\xa1\x06\x84n#\xf0\x1c\x86\xb0\x83\xee\x99\xe7\xc6\xaaN\xafY\xdf\xd9\xcfe\xd5\x84" b'\xd9\x0b\xc2\x1b0\xa1Q\x17\x88\xb4et\x81u8\xed\xf5\xe8\xd9#c\t\xf9\xc0\xa7\x06\xa2/=
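Slicing the result of readlines() splits the binary payload wherever a newline byte happens to occur. An alternative sketch, assuming a made-up file format with a single ASCII header line, is to consume only the header with readline() and read the rest untouched, or to jump over a header of known size with seek(). The header text, payload, and file name below are all hypothetical.

```python
import os
import tempfile

# Hypothetical file layout: one ASCII header line, then binary data.
header = b"SAMPLE-FORMAT v1\n"
payload = b"\x00\x01\n\x02\xff"  # contains a newline byte on purpose

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as f:
        f.write(header + payload)

    # Header of unknown length: consume one line, then read the rest.
    with open(path, "rb") as f:
        header_line = f.readline()
        body = f.read()

    # Header of known length: skip that many bytes with seek().
    with open(path, "rb") as f:
        f.seek(len(header))
        body_after_seek = f.read()

print(header_line)
print(body)
```

Either way the payload stays in one piece, including the newline byte inside it.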
Reading Binary File Using Pickle
In this section, you’ll learn how to read binary files in Python using the pickle module.
This is tricky because pickle.load() can only read files that were written with pickle. Any other binary file, such as an image, fails with an “invalid load key” error, as the example below shows.
Hence, this method is not recommended for reading arbitrary binary files.
import pickle
# A JPEG is not a pickle stream, so pickle.load() raises UnpicklingError here.
with open(r"c:\temp\Binary_File.jpg", "rb") as file_to_read:
    loaded_dictionary = pickle.load(file_to_read)
print(loaded_dictionary)
Output:
---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
      7 file_to_read = open("E:\Vikram_Blogging\Stack_Vidhya\Python_Notebooks\Read_Binary_File_Python\Binary_File.jpg", "rb")
      8
----> 9 loaded_dictionary = pickle.load(file_to_read)
     10
     11 print(loaded_dictionary)
UnpicklingError: invalid load key, '\xff'.
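For contrast, here is a minimal sketch of the case where pickle does work: the file was written with pickle.dump() in the first place. The dictionary contents and the file name record.pkl are made up for illustration. The second part feeds a few JPEG-like bytes to pickle.loads() to reproduce the kind of error shown above.

```python
import os
import pickle
import tempfile

# Hypothetical object to store.
record = {"name": "sample", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.pkl")

    # Write the object as a pickle stream, then load it back.
    with open(path, "wb") as f:
        pickle.dump(record, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Bytes that did not come from pickle cannot be loaded.
try:
    pickle.loads(b"\xff\xd8\xff\xe0")
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(loaded)
print(error_message)
```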
Conclusion
Reading binary files is a common task. For example, reading the bytes of an image file is useful when you work on image classification problems: you read the image file in binary mode and use its bytes as input for the model.
In this tutorial, you’ve learned different ways to read binary files in Python and the libraries that support them.
If you have any questions, feel free to comment below.