How to find the MIME type of a file in Python?
Let’s say you want to save a bunch of files somewhere, for instance in BLOBs, and serve them via a web page so that the client automatically opens the correct application or viewer.

Assumption: the browser decides which application or viewer to use from the MIME type (the Content-Type header) in the HTTP response. Based on that assumption, in addition to the bytes of the file, you also want to save the MIME type.

How would you find the MIME type of a file? I’m currently on a Mac, but this should also work on Windows. Does the browser add this information when posting the file to the web page? Is there a neat Python library for finding this information? A web service or (even better) a downloadable database?
The python-magic method suggested by toivotuo is outdated. Python-magic’s current trunk is on GitHub and, based on the readme there, finding the MIME type is done like this:
    # For MIME types
    import magic
    mime = magic.Magic(mime=True)
    mime.from_file("testdata/test.pdf")  # 'application/pdf'
Thanks for the comment! Please note that "above" is a difficult concept on Stack Overflow, since answers are grouped by votes and ordered randomly within each group. I am guessing you refer to @toivotuo’s answer.
Yeah, I didn’t have enough "points" to create comments at the time of writing this reply. But I probably should have written it as a comment, so that @toivotuo could have edited his answer.
    rpm -qf /usr/lib/python2.7/site-packages/magic.py -i
    URL : darwinsys.com/file
    Summary : Python bindings for the libmagic API

    rpm -qf /usr/bin/file -i
    Name : file
    URL : darwinsys.com/file

The python-magic from darwinsys.com/file, which ships with Fedora Linux, works as @toivotuo said, and it seems more mainstream.
Beware that the Debian/Ubuntu package called python-magic is different from the pip package of the same name. Both are imported with import magic but have incompatible contents. See stackoverflow.com/a/16203777/3189 for more.
As I commented on toivotuo’s answer, it is not outdated! You are talking about a different library. Can you please remove or replace that statement in your answer? It currently makes finding the best solution really difficult.
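Since two incompatible packages both install a module named magic, code that has to run on several machines may want to cope with either one. The following is a minimal sketch, not an official recipe: it assumes the pip package exposes from_file and the bindings shipped with file expose open and MAGIC_MIME, as the snippets in this thread show. The function name detect_mime and the fallback to the standard library are assumptions made for illustration.

```python
import mimetypes

try:
    import magic  # either the pip package or the bindings shipped with file(1)
except ImportError:
    magic = None  # neither flavour is importable on this machine


def detect_mime(path):
    """Return a MIME type string for path, or None if it cannot be guessed."""
    if magic is not None and hasattr(magic, "from_file"):
        # API of the pip package "python-magic"
        return magic.from_file(path, mime=True)
    if magic is not None and hasattr(magic, "open"):
        # API of the bindings distributed with the file utility
        m = magic.open(magic.MAGIC_MIME)
        m.load()
        # MAGIC_MIME output may look like "application/pdf; charset=binary"
        return m.file(path).split(";")[0].strip()
    # Fallback: guess from the file extension only.
    return mimetypes.guess_type(path)[0]
```

The fallback only looks at the extension, so its answer is weaker than a content-based one; whether that is acceptable depends on how much you trust the file names.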
The mimetypes module in the standard library will determine/guess the MIME type from a file extension.
If users are uploading files, the HTTP POST will contain the MIME type of the file alongside the data. For example, Django makes this data available as an attribute of the UploadedFile object.
While @cerin is right that file extensions are not reliable, I’ve just discovered that the accuracy of python-magic (as suggested in the top answer) is even lower, as confirmed by github.com/s3tools/s3cmd/issues/198. So mimetypes seems a better candidate to me.
This seems to be very easy
    >>> from mimetypes import MimeTypes
    >>> import urllib
    >>> mime = MimeTypes()
    >>> url = urllib.pathname2url('Upload.xml')
    >>> mime_type = mime.guess_type(url)
    >>> print mime_type
    ('application/xml', None)
Update: in Python 3 this is more convenient:
    import mimetypes
    print(mimetypes.guess_type("sample.html"))
For Python 3.x, replace import urllib with from urllib import request, and then use request instead of urllib.
A more reliable way than using the mimetypes library is to use the python-magic package.
    import magic
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    m.file("/tmp/document.pdf")
This would be equivalent to using file(1).
On Django one could also make sure that the MIME type matches that of UploadedFile.content_type.
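Because this approach is described as equivalent to file(1), the same result can also be obtained by calling the utility directly when no Python bindings are available. A minimal sketch, assuming the file executable is on PATH and supports the --brief and --mime-type options; the function name mime_from_file_command is hypothetical.

```python
import shutil
import subprocess


def mime_from_file_command(path):
    """Ask the file(1) utility for the MIME type of path.

    Returns None when the `file` executable is not on PATH
    (for example on a stock Windows installation).
    """
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if file(1) reports a failure
    )
    return result.stdout.strip()
```

Starting a process per file is slow for large batches, so this is mainly useful for one-off checks or as a cross-check against another method.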
@DarenThomas: As mentioned in mammadori’s answer, this answer is not outdated and is distinct from Simon Zimmermann’s solution. If you have the file utility installed, you can probably use this solution. It works for me with file-5.32. On Gentoo you also have to have the python USE flag enabled for the file package.
13 years later.
Most of the answers on this page for python 3 were either outdated or incomplete.
To get the mime type of a file I use:
    import mimetypes

    # guess_type always returns a (type, encoding) tuple, so test the type itself
    mime_type, encoding = mimetypes.guess_type("https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf")
    if mime_type:
        print("Mime Type:", mime_type)
    else:
        print("Cannot determine Mime Type")
    # Mime Type: application/pdf
mimetypes.guess_type(url, strict=True)
Guess the type of a file based on its filename, path or URL, given by url. URL can be a string or a path-like object.
The return value is a tuple (type, encoding) where type is None if the type can’t be guessed (missing or unknown suffix) or a string of the form 'type/subtype', usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to encode (e.g. compress or gzip). The encoding is suitable for use as a Content-Encoding header, not as a Content-Transfer-Encoding header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively.
The optional strict argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When strict is True (the default), only the IANA types are supported; when strict is False, some additional non-standard but commonly used MIME types are also recognized.
Changed in version 3.8: Added support for url being a path-like object.
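The (type, encoding) tuple described above maps naturally onto response headers. The following is a small sketch of that mapping; the helper name content_headers and the application/octet-stream default for unknown types are assumptions chosen for illustration, not part of the mimetypes module.

```python
import mimetypes


def content_headers(filename, default="application/octet-stream"):
    """Build HTTP headers from the (type, encoding) tuple of guess_type."""
    mime_type, encoding = mimetypes.guess_type(filename)
    # type is None when the suffix is missing or unknown
    headers = {"Content-Type": mime_type or default}
    if encoding is not None:
        # e.g. "gzip" for a .gz suffix; it describes the wrapper, not the payload
        headers["Content-Encoding"] = encoding
    return headers


for name in ("report.pdf", "backup.tar.gz", "README"):
    print(name, content_headers(name))
```

Note that the guess is based purely on the name: nothing is opened or downloaded, which is why a URL works as input just as well as a local path.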
filetype 1.2.0
Infer file type and MIME type of any file/buffer. No external dependencies.
License: MIT License (MIT)
Tags: file, libmagic, magic, infer, numbers, magicnumbers, discovery, mime, type, kind
Project description
A small, dependency-free Python package that infers file type and MIME type by checking the magic number signature of a file or buffer.
This is a Python port of the filetype Go package.
Features
- Simple and friendly API
- Supports a wide range of file types
- Provides file extension and MIME type inference
- File discovery by extension or MIME type
- File discovery by kind (image, video, audio…)
- Pluggable: add new custom type matchers
- Fast, even processing large files
- Only the first 261 bytes, representing the maximum file header, are required, so you can just pass a list of bytes
- Dependency free (just Python code, no C extensions, no libmagic bindings)
- Cross-platform file recognition
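To make the idea of magic-number inference concrete, here is a minimal standalone sketch of the technique. It is not the filetype package's code or API: the signature table, the names SIGNATURES, sniff_mime and sniff_file, and the choice of formats are all assumptions made for illustration, using well-known file signatures.

```python
# Hypothetical signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]

HEADER_SIZE = 261  # the header length quoted in the feature list


def sniff_mime(data):
    """Return the MIME type whose signature starts the buffer, else None."""
    header = bytes(data[:HEADER_SIZE])
    for magic_bytes, mime in SIGNATURES:
        if header.startswith(magic_bytes):
            return mime
    return None


def sniff_file(path):
    """Read only the header of the file at path and sniff it."""
    with open(path, "rb") as fh:
        return sniff_mime(fh.read(HEADER_SIZE))
```

Because only the header is inspected, the cost is independent of file size. Container formats are the main limitation of such a naive table: a .docx or .epub file also starts with the ZIP signature and would be reported as application/zip here.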
Installation
API
Examples
Simple file type checking
Supported types
Image
Video
Audio
Archive
- br — application/x-brotli
- rpm — application/x-rpm
- dcm — application/dicom
- epub — application/epub+zip
- zip — application/zip
- tar — application/x-tar
- rar — application/x-rar-compressed
- gz — application/gzip
- bz2 — application/x-bzip2
- 7z — application/x-7z-compressed
- xz — application/x-xz
- pdf — application/pdf
- exe — application/x-msdownload
- swf — application/x-shockwave-flash
- rtf — application/rtf
- eot — application/octet-stream
- ps — application/postscript
- sqlite — application/x-sqlite3
- nes — application/x-nintendo-nes-rom
- crx — application/x-google-chrome-extension
- cab — application/vnd.ms-cab-compressed
- deb — application/x-deb
- ar — application/x-unix-archive
- Z — application/x-compress
- lzo — application/x-lzop
- lz — application/x-lzip
- lz4 — application/x-lz4
- zstd — application/zstd
Document
- doc — application/msword
- docx — application/vnd.openxmlformats-officedocument.wordprocessingml.document
- odt — application/vnd.oasis.opendocument.text
- xls — application/vnd.ms-excel
- xlsx — application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- ods — application/vnd.oasis.opendocument.spreadsheet
- ppt — application/vnd.ms-powerpoint
- pptx — application/vnd.openxmlformats-officedocument.presentationml.presentation
- odp — application/vnd.oasis.opendocument.presentation
Font
- woff — application/font-woff
- woff2 — application/font-woff
- ttf — application/font-sfnt
- otf — application/font-sfnt
Determining a file’s extension in Python
What is the most correct way to check a file’s extension in Python? Say I receive a file name and want to check whether its extension is .py or not.
If you need to determine the file type from its contents, use libmagic and a Python wrapper of your choice.
    >>> import os
    >>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
    >>> filename
    '/path/to/somefile'
    >>> file_extension
    '.ext'
I receive a file name and want to check whether its extension is .py
filename.endswith('.py') returns whether filename (a string containing the file name) ends with '.py'.
    >>> from pathlib import Path
    >>> Path('my/library/setup.py').suffix == '.py'
    True
If you need to find all the extensions, you can use the .suffixes attribute:
    >>> Path('my/library.tar.gz').suffixes
    ['.tar', '.gz']
"Most correct" means using the simplest readable code that works.
An important difference arises when a directory name is given with a trailing slash: the behavior of pathlib.Path then differs from that of os.path.splitext() or str.endswith():
    >>> import os
    >>> os.path.splitext('pypy/rlib/rsre__gen.py/')[1] == '.py'
    False # not True.
    >>> 'pypy/rlib/rsre__gen.py/'.endswith('.py')
    False # not True.
    >>> Path('pypy/rlib/rsre__gen.py/').suffix == '.py'
    True
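The suffix check can be wrapped in a small helper. The sketch below adds case-insensitive matching, which is an assumption on top of the answer (the question does not say whether SCRIPT.PY should count), and the name has_extension is hypothetical. It only looks at the final suffix, so multi-part names such as .tar.gz match on .gz.

```python
from pathlib import PurePath


def has_extension(filename, *extensions):
    """Case-insensitive check of the final suffix, e.g. '.py'."""
    wanted = {ext.lower() for ext in extensions}
    return PurePath(filename).suffix.lower() in wanted


print(has_extension("my/library/setup.py", ".py"))   # True
print(has_extension("SCRIPT.PY", ".py"))             # True
print(has_extension("my/library.tar.gz", ".tar"))    # False, the suffix is '.gz'
print(has_extension("notes.py.bak", ".py"))          # False, the suffix is '.bak'
```

PurePath is used instead of Path because the check never touches the file system.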
>>> Path('Я.и.моя.самая любимая.кошка.jpg.wtf.tar.gz').suffixes => ['.и', '.моя', '.самая любимая', '.кошка', '.jpg', '.wtf', '.tar', '.gz'] ... hmm, not great.
Note that pathlib has been part of the standard Python distribution only since version 3.4. For older versions it can be installed as a separate package.
You can use the magic library:
    import magic
    mime = magic.Magic(mime=True)
    mime.from_file("testdata/test.pdf")  # 'application/pdf'
    magic.from_file('iceland.jpg')  # 'JPEG image data, JFIF standard 1.01'
    magic.from_file('iceland.jpg', mime=True)  # 'image/jpeg'
    magic.from_file('greenland.png')  # 'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'
    magic.from_file('greenland.png', mime=True)
Your answer is about the MIME type of the file’s contents, whereas the question is only about the suffix in the file name.
The question asks whether its extension is . , and I myself landed here from Google while searching for how to determine a file’s extension, and this library helps determine the file type. So I think the answer is to the point.
    extension = '/path/to/somefile.ext'.split('.')[-1]
    print(extension)
Fastest of all, without any arrays or suffixes:
ext = somefilename[somefilename.rfind(".") + 1:]
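The string-based one-liners and os.path.splitext agree on ordinary names but diverge on edge cases such as names without a dot, dotfiles, and dots in directory names. A short comparison, with the helper name extension_variants made up for this illustration:

```python
import os.path


def extension_variants(name):
    """Return the extension of name as computed by three approaches."""
    by_split = name.split(".")[-1]
    by_rfind = name[name.rfind(".") + 1:]
    by_splitext = os.path.splitext(name)[1]
    return by_split, by_rfind, by_splitext


for sample in ("archive.tar.gz", "README", ".bashrc", "/path.d/file"):
    print(repr(sample), extension_variants(sample))
```

For README the two string approaches return the whole name, because there is no dot to split on and rfind returns -1, while splitext returns an empty string. For /path.d/file they return 'd/file', since they do not distinguish directory names from file names.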
    # Rather manual, but it is the first thing that comes to mind
    import os

    t = input('Enter the path to the folder: ')
    docs, music, other = [], [], []
    for root, dirs, files in os.walk(t):
        for file in files:
            if file[-4::] == '.doc' or file[-5::] == '.docx':
                docs.append(os.path.join(file))
            elif file[-4::] == '.mp3' or file[-4::] == '.wav':
                music.append(os.path.join(file))
            # Insert any other formats here
            else:
                other.append(os.path.join(file))

    print('\nDOCUMENTS:\n')
    for el in docs:
        print(el)
    print('\nMUSIC:\n')
    for el in music:
        print(el)
    print('\nOTHER:\n')
    for el in other:
        print(el)
Better not to do it this way. One careless move, i.e. one miscounted index (for example, accidentally writing file[-4::]=='.docx' instead of -5), and the extension will not be detected. If you already know about os.path, it makes sense to use os.path.splitext from the accepted answer.
If you are checking it as a string, it is better to use endswith: if file.endswith('.doc') or file.endswith('.docx'):
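Following the two comments above, the sorting script can be written without any manual index arithmetic. This is one possible sketch, not the original author's code: the CATEGORIES table and the names categorize and scan are hypothetical, and lowercasing the name before matching is an added assumption. It relies on str.endswith accepting a tuple of suffixes.

```python
import os

# Hypothetical category table; extend it with any formats you need.
CATEGORIES = {
    "documents": (".doc", ".docx"),
    "music": (".mp3", ".wav"),
}


def categorize(filename):
    """Return the category name for filename, or 'other'."""
    lowered = filename.lower()
    for category, extensions in CATEGORIES.items():
        # str.endswith accepts a tuple and matches any of its suffixes
        if lowered.endswith(extensions):
            return category
    return "other"


def scan(folder):
    """Walk folder and group file names by category."""
    groups = {name: [] for name in (*CATEGORIES, "other")}
    for _root, _dirs, files in os.walk(folder):
        for file in files:
            groups[categorize(file)].append(file)
    return groups
```

Calling scan on a folder path returns a dictionary such as {'documents': [...], 'music': [...], 'other': [...]}, which can then be printed section by section as in the original script.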