Python requests file size

Get file size using python-requests, while only getting the header

A HEAD request is like a GET request that only downloads the headers. Note that it’s up to the server to actually honor your HEAD request. Some servers will only respond to GET requests, so you’ll have to send a GET request and just close the connection instead of downloading the body. Other times, the server just never specifies the total size of the file.
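
For servers that ignore HEAD, a minimal sketch of the fallback described above: send a GET with stream=True, read the Content-Length header, and close the connection without downloading the body (the URL below is just a placeholder).

import requests

url = "https://example.com/file.bin"  # placeholder URL
with requests.get(url, stream=True) as response:  # body is not downloaded yet
    size = response.headers.get("Content-Length")  # may be None if the server omits it
    print(size)
# leaving the with-block closes the connection without reading the body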

Use requests.get(url, stream=True).headers['Content-length']

stream=True means that when the function returns, only the response headers have been downloaded; the response body has not.

Both requests.get and requests.head can get you the headers, but there are advantages to using get:

  1. get is more flexible: if you want to download the response body after inspecting the length, you can simply access the content property or use an iterator that downloads the content in chunks (see the sketch after this list).
  2. The HTTP spec says the headers sent in response to a HEAD request SHOULD be identical to those sent in response to a GET request, but that is not always the case.
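
A minimal sketch of that pattern, assuming an arbitrary placeholder URL: inspect Content-Length first, then stream the body in chunks only if you decide to download it.

import requests

url = "https://example.com/file.bin"  # placeholder URL
response = requests.get(url, stream=True)
length = response.headers.get("Content-Length")
print("Reported size:", length)

downloaded = 0
for chunk in response.iter_content(chunk_size=8192):  # body is fetched here, chunk by chunk
    downloaded += len(chunk)
print("Actually downloaded:", downloaded, "bytes")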

Here is an example of getting the length of an MIT OpenCourseWare video:

import requests

MitOpenCourseUrl = "http://www.archive.org/download/MIT6.006F11/MIT6_006F11_lec01_300k.mp4"
resHead = requests.head(MitOpenCourseUrl)
resGet = requests.get(MitOpenCourseUrl, stream=True)

resHead.headers['Content-length']   # output: 169
resGet.headers['Content-length']    # output: 121291539


Get the size of a file from URL in Python

In this tutorial, we will learn how to get the size of a file from a URL in Python. Before getting to the actual code, let us go over a few prerequisites.

If you want to get the size of a file in your local storage, you can follow this one: How to get the size of a file in Python

The urllib module

The urllib module is used to access and handle URL (Uniform Resource Locator) related data. Opening a URL and accessing, retrieving, and downloading data are some of the things urllib can do. In this tutorial, we will use the urllib.request module to access file data. This module provides the classes and functions needed for URL operations. One of those functions is urlopen(). As the name suggests, it opens a URL and fetches its data. To access urllib.request, simply import it.
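
As a quick illustration (the URL below is just a placeholder), urlopen() returns a response object whose headers can be inspected without reading the body:

import urllib.request

url = "https://example.com/file.bin"  # placeholder URL
with urllib.request.urlopen(url) as response:
    # Content-Length comes from the response headers; it may be None if the server omits it
    print(response.getheader("Content-Length"))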

Requests module

Another way to approach this problem is to use the requests module. It is one of the most popular, easy-to-use third-party libraries in Python, used to make all kinds of HTTP/1.1 requests. To get started with this module, install it using:
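
pip install requests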

And then import it in your code.

The head() method requests only the headers for the URL, giving access to the header details of the file. This is very useful when you only need the status and basic details of the file and not its contents.

Getting the size of a file from URL

Problem statement: Write a Python program to get the size of a file from a URL.

Steps/Algorithm:

  1. Import the urllib module.
  2. Paste the required URL.
  3. Get the size of the file using the length attribute of the response object.

Program/Code:

import urllib.request  # importing the module

file = urllib.request.urlopen("https://speed.hetzner.de/100MB.bin")  # just a dummy file
print(file.length)  # fetching its length

Python returns the size of the file in bytes.
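
If a more human-readable figure is wanted, the byte count from the snippet above can be converted, for example to mebibytes (this conversion is an addition, not part of the original tutorial):

size_in_bytes = file.length  # 'file' is the response object from the snippet above
print("%.2f MiB" % (size_in_bytes / (1024 * 1024)))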

Steps/Algorithm:

  1. Import the requests module.
  2. Paste the URL.
  3. Get the header details.
  4. Print it.

Program/Code:

import requests  # importing the requests module

url = "https://speed.hetzner.de/100MB.bin"  # just a dummy file URL
info = requests.head(url)  # fetching the header information
print(info.headers)  # printing the details

The 'Content-Length' header gives the size of the file in bytes.
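
To pull that single value out of the headers returned above and use it as a number, something like the following works (assuming the server actually sends the header):

size = int(info.headers['Content-Length'])  # raises KeyError if the server omits the header
print("File size: %d bytes" % size)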



Get file size using python-requests, while only getting the header

I have looked at the requests documentation, but I can’t seem to find anything. How do I only request the header, so I can assess filesize?

Answers

>>> import requests
>>> response = requests.head('http://example.com')
>>> response.headers


To get the resultant URL after you’ve been redirected, you can read r.url.

r = requests.get('https://youtu.be/dQw4w9WgXcQ')
print(r.url)  # https://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=youtu.be

r.history is for URLs prior to the final one, so it’s only returning your original URL because you were only redirected once.
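
For illustration (continuing the example above), r.history is a list of the intermediate Response objects, so each hop's status and URL can be inspected:

for hop in r.history:  # one entry per redirect that was followed
    print(hop.status_code, hop.url)
print("Final URL:", r.url)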

It looks like the server receives your requests and acts upon them but fails to respond in time (3 s is a pretty low timeout; a load spike or paging operation can easily make the server miss it unless it employs special measures). I’d suggest that you:

  • process requests asynchronously (e.g. spawn threads; Asynchronous Requests with Python requests discusses ways to do this with requests) and do not use timeouts (TCP has its own timeouts, let it fail instead).
  • reuse the connection(s) (TCP has quite a bit of overhead for establishing and tearing down connections) or use UDP instead; see the Session sketch after this list.
  • include some "hints" (IDs, timestamps, etc.) to prevent the server from adding duplicate records. (I’d call this one a workaround, as the real problem is that you’re not making sure your request was processed.)
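
A minimal sketch of connection reuse with requests.Session (URL and payload are placeholders); the session keeps the underlying TCP connection alive across requests:

import requests

with requests.Session() as session:  # one connection pool, reused across requests
    for i in range(10):
        payload = {"id": i}  # placeholder payload
        response = session.post("https://example.com/endpoint", data=payload)
        print(response.status_code)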

From the server side, you may want to:

  • Respond ASAP and act upon the info later. Do not let pending action prevent answering further requests.

The code in the question executes all POST requests in series, making it no faster than using requests in a single thread. But unlike requests, asyncio makes it possible to parallelize them in the same thread:

import asyncio   # added: needed to run the coroutines
import aiohttp   # added: async HTTP client used below

async def make_account():
    url = "https://example.com/sign_up.php"
    async with aiohttp.ClientSession() as session:
        post_tasks = []
        # prepare the coroutines that post
        async for x in make_numbers(35691, 5000000):  # make_numbers is an async generator defined in the question
            post_tasks.append(do_post(session, url, x))
        # now execute them all at once
        await asyncio.gather(*post_tasks)

async def do_post(session, url, x):
    async with session.post(url, data={
        "terms": 1,
        "captcha": 1,
        "email": "user%s@example.com" % str(x),  # the domain is a placeholder; the original value was obfuscated
        "full_name": "user%s" % str(x),
        "password": "123456",
        "username": "auser%s" % str(x),
    }) as response:
        data = await response.text()
        print("-> Created account number %d" % x)
        print(data)

The above code will attempt to send all the POST requests at once. Despite the intention, it will be throttled by aiohttp.ClientSession's TCP connector, which allows a maximum of 100 simultaneous connections by default. To increase or remove this limitation, you must set a custom connector on the session.
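
A minimal sketch of raising that limit (the value 500 is arbitrary; limit=0 disables the cap entirely):

import aiohttp

async def make_account():
    connector = aiohttp.TCPConnector(limit=500)  # default is 100; limit=0 means unlimited
    async with aiohttp.ClientSession(connector=connector) as session:
        ...  # issue the POST coroutines with this session, as shown above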


Python requests: getting the size (in bytes) of a requested file (mp4)

I am currently trying to download a video using Python requests, and I would like to know its size first.

import requests print("STARTING PROGRAM. ") req = requests.get("https://www.source.com/source.mp4") 
for chunk in req.iter_content(): count+=1 print("FOUND %d CHUNKS" %(count)) 

But this took quite a while, since I am downloading a 24-minute mp4. Is there a better way to do this?


Answers (2)

import requests

response = requests.head("https://www.source.com/source.mp4")
print(response.headers)

You should then get something called content-length, which is what you need. Or, alternatively, just print the size:

print(response.headers['content-length'])

The second answer does essentially the same thing:

import requests

response = requests.head('https://www.source.com/source.mp4')
size = response.headers['content-length']

Some websites do not set this value. You can see this when your browser just spins instead of showing a proper progress bar. Beware: you cannot trust the headers 100%.
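
A defensive sketch for that case (the URL is the same placeholder as above): read the header with .get() and fall back to streaming and counting the bytes if it is missing:

import requests

url = "https://www.source.com/source.mp4"  # placeholder URL
response = requests.get(url, stream=True)
size = response.headers.get("Content-Length")

if size is not None:
    print("Server reports %s bytes" % size)
else:
    # no Content-Length header: count the bytes while streaming
    total = sum(len(chunk) for chunk in response.iter_content(chunk_size=8192))
    print("Downloaded %d bytes" % total)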


