Python url file size

Contents

How can I get the size of a file from a link without downloading it in Python?
3 Answers
Python Adventures
kissgyorgy / download_file.py

How can I get the size of a file from a link without downloading it in Python?

I have a list of links whose sizes I am trying to get, in order to determine how much computing resources each file requires. Is it possible to get the file size with just a GET request or something similar?

3 Answers

If you are using Python 3, you can do this with urlopen from urllib.request:

from urllib.request import urlopen

link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
site = urlopen(link)
meta = site.info()
print(meta)

Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes

The Content-Length header gives the size of your file in bytes.
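For a list of links like the one in the question, the header lookup can be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: the names parse_content_length and size_from_get and the 10-second timeout are illustrative assumptions. It closes the response without reading the body and returns None when the server sends no usable Content-Length.

```python
from urllib.request import urlopen


def parse_content_length(headers):
    """Return Content-Length as an int, or None if missing or not a number."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def size_from_get(url, timeout=10):
    """Open `url`, look only at the response headers, and return the size in bytes."""
    # urlopen sends a GET, but the body is not read unless read() is called;
    # leaving the with-block closes the connection.
    with urlopen(url, timeout=timeout) as response:
        return parse_content_length(response.headers)
```

Calling size_from_get on each link in turn, for example in a dict comprehension, would give a mapping from URL to size in bytes, with None marking servers that did not report a length.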

You need to use the HEAD method. The example uses requests (pip install requests).

#!/usr/bin/env python
# display URL file size without downloading

import sys
import requests

# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)

size = response.headers.get('content-length', -1)

# size in megabytes (Python 2, 3)
print('{}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))

See also "How do you send a HEAD HTTP request in Python 2?" if you need a solution based on the standard library.
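One caveat with the script above: when the header is missing, the default of -1 is printed as a tiny negative size. A hedged sketch of a formatter that makes the missing case explicit is shown here. The name format_size is an assumption, and it divides by 1 << 20 (1,048,576 bytes), the usual binary-megabyte convention.

```python
def format_size(content_length):
    """Format a Content-Length value (str, int, or None) in megabytes."""
    if content_length is None:
        return "unknown size"
    size_in_bytes = int(content_length)
    # 1 << 20 is 1,048,576 bytes, i.e. one binary megabyte (MiB).
    return "{:.2f} MB".format(size_in_bytes / (1 << 20))


print(format_size("578220087"))  # 551.43 MB
print(format_size(None))         # unknown size
```

With requests, this would be called as format_size(response.headers.get('content-length')), leaving out the -1 default.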

To do this, use the HTTP HEAD method, which retrieves only the header information for a URL and does not download the content the way an HTTP GET request does.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is given in the Content-Length header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887', method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'

Note that if the remote server does not support HEAD requests, you can still achieve something similar by using the stream=True parameter of the Python requests library, as in stackoverflow.com/a/44299915, and then closing each request right after you have received its headers.
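The fallback described in this comment could be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked answer: the function name size_with_fallback, the 10-second timeout, and the choice to try HEAD first are all hypothetical.

```python
import requests


def size_with_fallback(url, timeout=10):
    """Return the size in bytes reported for `url`, or None if no length is sent."""
    # First try HEAD, which asks for the headers only.
    head = requests.head(url, allow_redirects=True, timeout=timeout)
    if head.ok and "Content-Length" in head.headers:
        return int(head.headers["Content-Length"])

    # Fallback: a GET with stream=True returns once the headers have arrived.
    # The body is not downloaded unless it is read, and leaving the
    # with-block closes the connection.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None
```

A None result means neither request reported a length, which can happen when the server sends the body with chunked transfer encoding.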

Python Adventures

You have a URL and you want to get some info about it. For instance, you want to figure out the content type (text/html, image/jpeg, etc.) of the URL, or the file size without actually downloading the given page.

#!/usr/bin/env python

import urllib

def get_url_info(url):
    d = urllib.urlopen(url)
    return d.info()

url = 'http://'+'www'+'.geos.ed.ac.uk'+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_url_info(url)

Output:

Date: Mon, 18 Oct 2010 18:58:07 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6
X-Powered-By: Zope (www.zope.org), Python (www.python.org)
Last-Modified: Thu, 08 Nov 2007 09:56:19 GMT
Content-Length: 103984
Accept-Ranges: bytes
Connection: close
Content-Type: image/jpeg

That is, the size of the image is 103,984 bytes and its content type is indeed image/jpeg.

In the code, d.info() returns a dictionary-like object, so extracting a specific field is very easy:

#!/usr/bin/env python

import urllib

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

url = 'http://'+'www'+'.geos.ed.ac.uk'+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_content_type(url)    # image/jpeg
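The snippets in this post use Python 2 (urllib.urlopen and the print statement). A rough Python 3 equivalent might look like the sketch below. Here summarize_headers is a hypothetical helper, and the HEAD request and 10-second timeout are assumptions rather than what the post does.

```python
from urllib.request import Request, urlopen


def get_url_info(url, timeout=10):
    """Return the response headers for `url` without downloading the body."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers


def summarize_headers(headers):
    """Return (content_type, size_in_bytes); size is None if the header is absent."""
    length = headers.get("Content-Length")
    size = int(length) if length is not None else None
    # get_content_type() falls back to "text/plain" when no Content-Type is set.
    return headers.get_content_type(), size
```

Typical use would be content_type, size = summarize_headers(get_url_info(url)).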

Update (20121202)

>>> import requests
>>> from pprint import pprint
>>> url = 'http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg'
>>> r = requests.head(url)
>>> pprint(r.headers)

kissgyorgy / download_file.py

import requests

# The cookie values are missing from the source; the empty dict is a placeholder.
r = requests.get("http://download.thinkbroadband.com/10MB.zip", cookies={})

with open('10MB.zip', 'wb') as f:
    f.write(r.content)

import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8) * (len(status) + 1)
    print status,

f.close()
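The gist above targets Python 2 (urllib2 and the print statement). A possible Python 3 version of the same block-by-block download is sketched here, assuming urllib.request in place of urllib2. The function names format_progress and download are illustrative, and the sketch swaps the backspace trick for a carriage return and tolerates a missing Content-Length, which the gist does not.

```python
from urllib.request import urlopen


def format_progress(downloaded, total):
    """Return a progress string; the percentage is omitted if total is unknown."""
    if total:
        return "%10d [%3.2f%%]" % (downloaded, downloaded * 100.0 / total)
    return "%10d bytes" % downloaded


def download(url, file_name, block_size=8192):
    """Stream `url` into `file_name` in blocks and return the bytes written."""
    with urlopen(url) as response, open(file_name, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        downloaded = 0
        while True:
            block = response.read(block_size)
            if not block:
                break
            out.write(block)
            downloaded += len(block)
            # "\r" returns the cursor to the start of the line so the next
            # status overwrites this one.
            print(format_progress(downloaded, total), end="\r")
    print()
    return downloaded
```

Reading in fixed-size blocks keeps memory use flat regardless of file size, unlike r.content in the requests snippet, which holds the whole file in memory.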
