filehash 0.2.dev1
Module and command-line tool that wraps around hashlib and zlib to facilitate generating checksums / hashes of files and directories.
License: MIT License (MIT)
Python module to facilitate calculating the checksum or hash of a file. Tested against Python 2.7.x, Python 3.6.x, Python 3.7.x, Python 3.8.x, Python 3.9.x, Python 3.10.x, PyPy 2.7.x and PyPy3 3.7.x. Currently supports Adler-32, BLAKE2b, BLAKE2s, CRC32, MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.
(Note: BLAKE2b and BLAKE2s are only supported on Python 3.6.x and later.)
FileHash class
The FileHash class wraps around the hashlib (provides hashing for MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512) and zlib (provides checksums for Adler-32 and CRC32) modules and contains the following methods:
- hash_file(filename) - Calculate the file hash for a single file. Returns a string with the hex digest.
- hash_files(filenames) - Calculate the file hash for multiple files. Returns a list of tuples where each tuple contains the filename and the calculated hash.
- hash_dir(path, pattern='*') - Calculate the file hashes for an entire directory. Returns a list of tuples where each tuple contains the filename and the calculated hash.
- cathash_files(filenames) - Calculate a single hash for multiple files. Files are sorted by their individual hash values and then traversed in that order to generate a combined hash value. Returns a string with the hex digest.
- cathash_dir(path, pattern='*') - Calculate a single hash for an entire directory of files. Files are sorted by their individual hash values and then traversed in that order to generate a combined hash value. Returns a string with the hex digest.
- verify_sfv(sfv_filename) - Reads the specified SFV (Simple File Verification) file and calculates the CRC32 checksum for the files listed, comparing the calculated CRC32 checksums against the specified expected checksums. Returns a list of tuples where each tuple contains the filename and a boolean value indicating if the calculated CRC32 checksum matches the expected CRC32 checksum. To find out more about SFV files, see the Simple file verification entry in Wikipedia.
- verify_checksums(checksum_filename) - Reads the specified file and calculates the hashes for the files listed, comparing the calculated hashes against the specified expected hashes. Returns a list of tuples where each tuple contains the filename and a boolean value indicating if the calculated hash matches the expected hash.
For the checksum file, the file is expected to be a plain text file where each line holds one entry: the hex digest, a space, an asterisk and then the filename.
This is the format used by programs such as the sha1sum family of tools for generating checksum files. Here is an example generated by sha1sum:
    f7ef3b7afaf1518032da1b832436ef3bbfd4e6f0 *lorem_ipsum.txt
    03da86258449317e8834a54cf8c4d5b41e7c7128 *lorem_ipsum.zip
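The checksum-file format shown above can also be checked with the standard library alone. The following is a minimal sketch, not filehash's own implementation: the helper name `verify_checksum_file` is hypothetical, the asterisk is treated as optional, and filenames are resolved relative to the directory of the checksum file, which is an assumption made to keep the example self-contained.

```python
import hashlib
import os


def verify_checksum_file(checksum_filename, algorithm="sha1", chunk_size=4096):
    """Return a list of (filename, matches) tuples for a sha1sum-style file.

    Hypothetical helper for illustration; not part of the filehash API.
    """
    # Assumption: listed files live next to the checksum file.
    base_dir = os.path.dirname(os.path.abspath(checksum_filename))
    results = []
    with open(checksum_filename, "r") as checksum_file:
        for line in checksum_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split "<hexdigest> *<filename>" at the first space.
            expected, _, name = line.partition(" ")
            name = name.strip()
            if name.startswith("*"):
                name = name[1:]  # drop the binary-mode marker
            hasher = hashlib.new(algorithm)
            with open(os.path.join(base_dir, name), "rb") as f:
                # Read in chunks so large files are never fully in memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    hasher.update(chunk)
            results.append((name, hasher.hexdigest() == expected.lower()))
    return results
```

Calling `verify_checksum_file("hashes.sha1")` would then yield one tuple per listed file, with `False` for any file whose contents no longer match the recorded digest.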
The FileHash constructor has two optional arguments:
- hash_algorithm='sha256' - Specifies the hashing algorithm to use. See filehash.SUPPORTED_ALGORITHMS for the list of supported hash / checksum algorithms. Defaults to SHA256.
- chunk_size=4096 - Integer specifying the chunk size to use (in bytes) when reading the file. This comes in useful when processing very large files to avoid having to read the entire file into memory all at once. Default chunk size is 4096 bytes.
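To illustrate how an algorithm name and a chunk size work together, here is a minimal sketch of chunked hashing built directly on hashlib and zlib. It is not the library's source: the function name `hash_file_sketch` is hypothetical, and formatting CRC32 and Adler-32 values as eight uppercase hex digits is an assumption.

```python
import hashlib
import zlib


def hash_file_sketch(filename, hash_algorithm="sha256", chunk_size=4096):
    """Hash a file chunk by chunk so it is never fully loaded into memory.

    Hypothetical helper for illustration; not part of the filehash API.
    """
    if hash_algorithm in ("crc32", "adler32"):
        # zlib checksums are running integers: feed the previous value back in.
        update = zlib.crc32 if hash_algorithm == "crc32" else zlib.adler32
        value = update(b"")
        with open(filename, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                value = update(chunk, value)
        # Assumption: present the 32-bit checksum as 8 uppercase hex digits.
        return "{:08X}".format(value & 0xFFFFFFFF)

    # Everything else is delegated to hashlib by name (md5, sha1, sha256, ...).
    hasher = hashlib.new(hash_algorithm)
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

The chunk size only affects memory use and the number of reads; the resulting digest is the same for any positive chunk size.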
Example usage
The library can be used as follows:
    >>> import os
    >>> from filehash import FileHash
    >>> md5hasher = FileHash('md5')
    >>> md5hasher.hash_file("./testdata/lorem_ipsum.txt")
    '72f5d9e3a5fa2f2e591487ae02489388'
    >>> sha1hasher = FileHash('sha1')
    >>> sha1hasher.hash_dir("./testdata", "*.zip")
    [FileHashResult(filename='lorem_ipsum.zip', hash='03da86258449317e8834a54cf8c4d5b41e7c7128')]
    >>> sha512hasher = FileHash('sha512')
    >>> os.chdir("./testdata")
    >>> sha512hasher.verify_checksums("./hashes.sha512")
    [VerifyHashResult(filename='lorem_ipsum.txt', hashes_match=True), VerifyHashResult(filename='lorem_ipsum.zip', hashes_match=True)]
    >>> crc32hasher = FileHash('crc32')
    >>> crc32hasher.verify_sfv("./lorem_ipsum.sfv")
    [VerifyHashResult(filename='lorem_ipsum.txt', hashes_match=True), VerifyHashResult(filename='lorem_ipsum.zip', hashes_match=True)]
chkfilehash command line tool
A command-line tool called chkfilehash is also included with the filehash package. Here is an example of how the tool can be used:
    $ chkfilehash -a sha512 -c hashes.sha512
    lorem_ipsum.txt: OK
    lorem_ipsum.zip: OK
    $ chkfilehash -a crc32 lorem_ipsum.zip
    7425D3BE *lorem_ipsum.zip
Run the tool without any parameters or with the -h / --help switch to get a usage screen.
License
This is released under an MIT license. See the LICENSE file in this repository for more information.
Create MD5 Hash of a file in Python
As a Python enthusiast, I’m always on the lookout for handy tools and techniques that can streamline my development process. One such technique that I find particularly useful is generating MD5 hashes of files. Whether you’re ensuring data integrity, verifying file integrity during transmission, or simply looking to add an extra layer of security, MD5 hashes can be invaluable. In this blog post, I’m excited to guide you through the process of creating an MD5 hash of a file in Python. So, let’s dive in and unlock the power of file hashing!
MD5 is a standardized one-way function that takes input data of any form and maps it to a fixed-size output (128 bits, usually written as 32 hexadecimal characters), irrespective of the size of the input.
Though it was designed as a cryptographic hash function, it has been found to suffer from serious vulnerabilities, most notably practical collision attacks.
The hash function always generates the same output hash for the same input. This means that you can use the hash to validate files, text or any other data after passing it across a network or storing it. An MD5 hash can act as a stamp for checking whether the data is still intact.
Input String | Output Hash |
---|---|
hi | 49f68a5c8493ec2c0bf489821c21fc3b |
debugpointer | d16220bc73b8c7176a3971c7f73ac8aa |
computer science is amazing! I love it. | f3c5a497380310d828cdfc1737e8e2a3 |
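
The determinism described above is easy to check with hashlib. This is a small sketch; the helper name `md5_of_text` is made up for illustration, and it assumes strings are encoded as UTF-8 before hashing.

```python
import hashlib


def md5_of_text(text):
    """Return the MD5 hex digest of a string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = md5_of_text("hi")
second = md5_of_text("hi")

print(first == second)                         # True: same input, same hash
print(len(first))                              # 32 hex characters, whatever the input size
print(md5_of_text("hi") == md5_of_text("Hi"))  # False: any change alters the hash
```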
If you are looking for the MD5 hash of a string rather than a file, that is covered in a separate post.
Create MD5 hash of a file in Python
An MD5 hash can be created using Python's standard-library module hashlib.
Incorrect Way to create MD5 Hash of a file in Python
Note that you cannot create a hash of a file by just hashing the name of the file, like this:
    # this is NOT correct
    import hashlib
    print(hashlib.md5("filename.jpg".encode('UTF-8')).hexdigest())
03e6eda992afdeda6b2acaed17722515
The above value is NOT the MD5 hash of the file; it is the MD5 hash of the string filename.jpg itself.
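To make the difference concrete, the sketch below hashes a file's name and the same file's contents and shows that the two digests differ. The helper names `hash_of_name` and `hash_of_contents` are hypothetical, and the temporary file only exists to keep the example self-contained.

```python
import hashlib
import os
import tempfile


def hash_of_name(path):
    """MD5 of the path string itself (what the incorrect approach computes)."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def hash_of_contents(path):
    """MD5 of the bytes stored in the file."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


# Create a small throwaway file to hash.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
    tmp.write(b"not really an image")
    path = tmp.name

print(hash_of_name(path))      # depends only on the path text
print(hash_of_contents(path))  # depends only on the bytes in the file
os.remove(path)
```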
Correct Way to create MD5 Hash of a file in Python
You have to read the contents of the file to create the MD5 hash of the file itself. This is simple: read the contents of the file and hash them.
First import hashlib, then open the file in binary mode so that read() returns bytes, and pass those bytes to hashlib.md5(). The hexdigest() method returns the hash as a hexadecimal string.
    import hashlib
    file_name = 'filename.jpg'
    # Open in binary mode: md5() needs bytes, not str
    with open(file_name, "rb") as f:
        data = f.read()
    md5hash = hashlib.md5(data).hexdigest()
    print(md5hash)
MD5 Hash of Large Files in Python
There is one problem with the code above: it reads the whole file into memory at once. If the file is large, say a 10 GB log file, a traffic dump or a game like FIFA, computing its MD5 hash this way would probably chew up your memory.
Here is a memory-optimised way of computing the MD5 hash, where we read the file in chunks of 4096 bytes (this can be customised to suit your requirements, the memory of your system and the size of your file). The chunks are processed sequentially, and the hash is updated with each one. If the file consists of 1000 such chunks, hash_md5 is updated 1000 times.
At the end we return the hexdigest value of hash_md5, which is the hash encoded as a hexadecimal string.
    import hashlib

    # A utility function that can be used in your code
    def compute_md5(file_name):
        hash_md5 = hashlib.md5()
        with open(file_name, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_md5.update(chunk)
        return hash_md5.hexdigest()
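On Python 3.11 and later, the standard library can do the chunked reading itself through hashlib.file_digest. The sketch below uses it when available and otherwise falls back to the manual loop; the function name `compute_md5_portable` and the `chunk_size` parameter are made up for illustration.

```python
import hashlib


def compute_md5_portable(file_name, chunk_size=4096):
    """MD5 of a file without loading it fully into memory."""
    with open(file_name, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib reads the file object in chunks for us.
            return hashlib.file_digest(f, "md5").hexdigest()
        # Older Pythons: read and hash one chunk at a time.
        hash_md5 = hashlib.md5()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
        return hash_md5.hexdigest()
```

Both branches produce the same digest, so callers do not need to know which Python version they are running on.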
Compare and Verify MD5 hash of a file using python
You may need to verify the MD5 hash on the server or at some other point in your code.
To verify the MD5 hash, compute the MD5 hash of the file you received.
Then compare the original MD5 value generated by the source with the MD5 value that you computed.
    import hashlib
    file_name = 'filename.jpg'
    # MD5 value supplied by the source of the file
    original_md5 = '5d41402abc4b2a76b9719d911017c592'
    # Open in binary mode: md5() needs bytes, not str
    with open(file_name, "rb") as f:
        data = f.read()
    md5_returned = hashlib.md5(data).hexdigest()
    if original_md5 == md5_returned:
        print("MD5 verified.")
    else:
        print("MD5 verification failed.")
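For large files, the same comparison can be combined with chunked reading. The following is a minimal sketch; the function name `verify_md5` is hypothetical, and normalising the expected value (trimming whitespace, lowercasing) is an assumption about how the reference hash might be supplied.

```python
import hashlib


def verify_md5(file_name, expected_md5, chunk_size=4096):
    """Return True if the file's MD5 matches expected_md5."""
    hash_md5 = hashlib.md5()
    with open(file_name, "rb") as f:
        # Hash the file chunk by chunk to keep memory use constant.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
    # hexdigest() is lowercase, so normalise the expected value to match.
    return hash_md5.hexdigest() == expected_md5.strip().lower()
```

A call such as `verify_md5('filename.jpg', original_md5)` then returns a boolean that the calling code can act on, instead of printing a message.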
The process of MD5 creation and verification is easy, as we discussed above. Happy Coding!
NOTE: Please do not use MD5 to hash passwords for storage in your databases. Plain SHA-256 or SHA-512 is not a good fit for that either, because fast hashes are easy to brute-force; prefer a dedicated password-hashing function such as bcrypt, scrypt, Argon2 or PBKDF2.
We’ve reached the end of our journey through the world of file hashing using MD5 in Python. I hope this exploration has empowered you with the knowledge and skills to incorporate this powerful technique into your own projects. The ability to generate MD5 hashes of files not only enhances data security but also provides a means to validate the integrity of files. As you continue your Python coding adventures, remember the importance of data integrity and the role that MD5 hashes can play in achieving it. Keep coding, keep exploring, and keep harnessing the power of Python!