Request python get file

Contents

How to Download a File in Python
Download a File in Python Over HTTP
Download a File in Python From an API
Closing Thoughts on How to Download a File in Python
How to Download Files in Python
Requests Library
Making Requests
Making a GET request
Downloading files from web using Python?
1. Import module
2. Get the link or url
3. Save the content with name.
Example
Result
Get filename from an URL

How to Download a File in Python

Did you know you can download a file programmatically in Python? I will show you how to fetch and save a file in Python. Fetching files this way is a common part of web scraping and an essential step in many data-related projects.

Web scraping is the process of collecting data from a website. While it can be done manually by a user, it usually refers to an automated method of data collection with the help of a web crawler.

You can do all of this programmatically in Python. By the end of this article, you will know how to download any kind of file in Python, including PDFs, images, videos, and pages. The process is similar between different types of files.

To get the most out of this article, it is good to have a basic understanding of programming in Python. Also, to save time and accelerate your learning, I encourage you to check our Python programming track.

To download a file in Python, we need to fetch it and save it. This process can be done by calling an API or with just a regular web URL pointing to a GIF you like.

Download a File in Python Over HTTP

In our first example, we will fetch and save a picture of a dog. The place.dog website offers random pictures of dogs you can use as placeholders for your next project. If you refresh the page, it generates another dog picture.

We will use the requests library, which makes HTTP requests simpler than using the built-in urllib library. You may have to install the requests library with the following command:

pip install requests

Then, we import requests, set the url variable to our target URL, send a GET request, and check its status. The following are the different classes of response status you may encounter when sending a GET request:

1xx Informational. It indicates that a request has been received and the client should continue to make requests for the data payload.
2xx Successful. It indicates a requested action has been received, understood, and accepted. It helps you verify the data exists before working on it.
3xx Redirection. It indicates the client must take additional action to complete the request, such as using a proxy or a different endpoint to access the resources.
4xx Client Error. It indicates problems with the client, for example, disallowed methods, authorization issues, forbidden access, or attempts to access resources that do not exist.
5xx Server Error. It indicates problems with the server providing the API.
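As a small illustration of the list above, the first digit of a status code is enough to tell which class it belongs to. The helper name status_class is made up for this sketch and is not part of the requests library.

```python
def status_class(status_code: int) -> str:
    """Return the name of the response class for an HTTP status code."""
    classes = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    # The first digit of a three-digit status code selects the class.
    return classes.get(status_code // 100, "Unknown")


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```

In everyday code you rarely need such a helper: a requests response also offers response.ok and response.raise_for_status() for a quick success check.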

Let’s write a request to fetch a file in Python.

>>> import requests
>>> url = 'https://place.dog/300/200'
>>> # fetch file
>>> response = requests.get(url, allow_redirects=True)
>>> # Get response status
>>> response.status_code
200

The 200 status code indicates the request is successful and the data exists. From there, we continue to the next step and save a file in Python with the help of the write() method.
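The saving step is not shown at this point, so the following is only a minimal sketch of what it might look like, not the original code. It assumes the image bytes are available as response.content from the request above; the function name save_content is hypothetical.

```python
def save_content(content: bytes, filename: str) -> int:
    """Write raw bytes to filename and return the number of bytes written."""
    # "wb" opens the file for writing in binary mode, which is what image
    # data needs; the with-block closes the file when the write is done.
    with open(filename, "wb") as file:
        return file.write(content)
```

With the response fetched earlier, the call would be save_content(response.content, 'dog1.jpg').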

Now, the file has been saved as dog1.jpg and contains a picture of a dog.

For a good refresher on the write() method to save a file in Python, check my article on how to write to file in Python here.

Download a File in Python From an API

Now, let’s explore how to fetch and save a file in Python by calling an API and parsing the JSON file. In contrast to what we have done previously, we will save the file with pathlib.

Much of the data available through web APIs is in JSON (JavaScript Object Notation) format. JSON is also used to store information in databases and is the most common data format you’ll find when working with modern REST APIs. JSON data is built from two structures: unordered name-value pairs (known as dictionaries, hash tables, objects, or keyed lists, depending on the programming language) and ordered lists of values (arrays, lists, or vectors).

JSON can be difficult for humans to read and use directly. To resolve this problem, Python has different libraries that help us read the JSON data fetched from the web. Among them is the built-in json module, which converts JSON components into native Python objects. The following table shows the conversion mapping between JSON and Python:

JSON	Python
object	dict
array	list or tuple
string	str
number	int or float
true	True
false	False
null	None
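The mapping in the table can be checked with the built-in json module. The JSON text below is invented purely for this illustration; note that when decoding, a JSON array becomes a Python list.

```python
import json

# Hypothetical JSON document covering every row of the table.
text = (
    '{"name": "fox", "tags": ["red", "small"], "age": 3, '
    '"weight": 5.2, "wild": true, "tame": false, "owner": null}'
)

# json.loads converts JSON text into native Python objects.
data = json.loads(text)

for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```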

You have to deal with JSON data often when working with REST APIs. You can find more information about JSON in our course on How to Read and Write JSON Files in Python.

The requests library has many features, but we only need the GET request and the json() method for the following example. As we have done previously, the first step is to import the requests library. Then, we send a GET request to the API endpoint we want to access. The API returns a response object that includes the JSON data. We are only interested in the JSON data, which is returned by the response’s json() method.

>>> import requests
>>> url = "https://randomfox.ca/floof"
>>> # fetch file
>>> response = requests.get(url, allow_redirects=True)
>>> # get json data
>>> data = response.json()
>>> print(data)

The JSON output is parsed into a Python dictionary. We extract the URL of the image as follows:

>>> img = data['image']
>>> print(img)
https://randomfox.ca/images/2.jpg

Next, we want to save the image. As mentioned previously, we use pathlib, an object-oriented module for handling filesystem paths. One of its advantages is its better portability between operating systems. You can find more information about pathlib in my article on how to rename files.

To save the picture of our fox, we will use the Path.write_bytes(data) method to open the path in binary/bytes mode and write data to it.

>>> # import Path class from pathlib
>>> from pathlib import Path
>>> # define filename
>>> filename = Path('fox.jpg')
>>> # fetch file
>>> response = requests.get(img)
>>> # save file
>>> filename.write_bytes(response.content)

Our file has now been saved as fox.jpg. We just saw how to extract the URL in the API response by inspecting the JSON data.

Closing Thoughts on How to Download a File in Python

We have now learned how to download a file in Python over HTTP and from an API. I encourage you to play with the code and fetch files from different APIs.

There is a lot more to learn about JSON, which is a widespread and handy format to store data. You can find more about it and Python programming with our Python programming track.

Last but not least, it is always a good idea to reflect on your Python programming skills. To help you with this process, check out my article on Things That Can Help You Write Better Python Code and browse our content on LearnPython.com. Keep learning every day!

Source

How to Download Files in Python

Esther Vaati Last updated Dec 29, 2022

Python provides several ways to download files from the internet. This can be done over HTTP using the urllib package or the requests library. This tutorial will discuss how to use these libraries to download files from URLs using Python.
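For the urllib route mentioned above, a minimal standard-library sketch might look like the following. The function name download is chosen for this example only, and the code does no error handling.

```python
import shutil
from urllib.request import urlopen


def download(url: str, filename: str) -> None:
    """Fetch url with urllib and stream the body into filename."""
    # urlopen returns a file-like response object; copyfileobj copies it
    # in chunks, so the whole body never has to sit in memory at once.
    with urlopen(url) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```

A call such as download('https://www.facebook.com/favicon.ico', 'facebook.ico') would then save the icon next to the script.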

Requests Library

The requests library is one of the most popular libraries in Python. Requests allows you to send HTTP/1.1 requests without the need to manually add query strings to your URLs or form-encode your POST data.

With the requests library, you can perform a lot of functions including:

adding form data
adding multipart files
accessing the response data in Python

Making Requests

The first thing you need to do is install the library, and it’s as simple as:

pip install requests

To test if the installation has been successful, you can do a very easy test in your Python interpreter by simply typing:

import requests

If the installation has been successful, there will be no errors.

Making a GET request

Making requests is very easy, as illustrated below.

req = requests.get("https://www.google.com")

The above command will get the Google web page and store the response in the req variable. We can then go on to read other attributes as well.

For instance, to know if fetching the Google web page was successful, we will query the status_code.

req = requests.get("https://www.google.com")
req.status_code

Source

Downloading files from web using Python?

Python provides different modules, such as urllib and requests, to download files from the web. I am going to use Python’s requests library to efficiently download files from URLs.

Let’s look at the step-by-step procedure for downloading files from URLs using the requests library:

1. Import module

import requests

2. Get the link or url

url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)

3. Save the content with name.

open('facebook.ico', 'wb').write(r.content)

This saves the file as facebook.ico.

Example

import requests

url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)
open('facebook.ico', 'wb').write(r.content)

Result

We can see that the file (an icon) has been downloaded to our current working directory.

But we may need to download different kinds of files from the web, such as images, text, or video. So let’s first get the type of data the URL is linking to:

>>> r = requests.get(url, allow_redirects=True)
>>> print(r.headers.get('content-type'))
image/png

However, there is a smarter way, which involves fetching just the headers of a URL before actually downloading it. This allows us to skip files that weren’t meant to be downloaded.
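The is_downloadable function called in the next snippet is not defined anywhere in this text. A sketch of how it might work, assuming that anything served as text or HTML counts as a page and not a file, is given below; the helper content_type_is_downloadable and the choice to return False when the header is missing are assumptions of this example.

```python
def content_type_is_downloadable(content_type):
    """Guess from a Content-Type value whether the URL points to a file."""
    if not content_type:
        # No header to decide on: this sketch is conservative and says no.
        return False
    content_type = content_type.lower()
    # Web pages are served as text/html; treat any text or HTML type
    # as "a page" and everything else as "a file to download".
    return "text" not in content_type and "html" not in content_type


def is_downloadable(url):
    """Send a HEAD request and inspect the Content-Type header."""
    # Imported here so the helper above can be used without requests.
    import requests

    # HEAD asks for the headers only; the body is not transferred.
    response = requests.head(url, allow_redirects=True)
    return content_type_is_downloadable(response.headers.get("content-type"))
```

Splitting the header check from the network call keeps the decision logic easy to try out with plain strings.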

>>> print(is_downloadable('https://www.youtube.com/watch?v=xCglV_dqFGI'))
False
>>> print(is_downloadable('https://www.facebook.com/favicon.ico'))
True

To restrict the download by file size, we can get the file size from the content-length header and act on it as required. Header values are strings, so the value has to be converted to an integer before it is compared.

contentLength = header.get('content-length', None)
if contentLength and int(contentLength) > 2e8:  # 200 MB approx
    return False
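The fragment above is meant to sit inside a function such as is_downloadable. As a self-contained variant, the size check could be sketched like this; the name within_size_limit and the decision to accept responses without a content-length header are assumptions of this example. The headers object returned by requests is case-insensitive, while a plain dict, as used for a quick try-out, needs the lowercase key.

```python
def within_size_limit(headers, max_bytes=200_000_000):
    """Return True unless Content-Length reports more than max_bytes."""
    content_length = headers.get("content-length")
    if content_length is None:
        # Servers may omit the header; this sketch lets such responses through.
        return True
    # Header values are strings, so convert before comparing.
    return int(content_length) <= max_bytes


print(within_size_limit({"content-length": "1024"}))
print(within_size_limit({"content-length": "300000000"}))
```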

Get filename from an URL

To get the filename, we can parse the URL. Below is a sample routine which fetches the last string after the final slash (/).

url = "http://www.computersolution.tech/wp-content/uploads/2016/05/tutorialspoint-logo.png"
if '/' in url:
    print(url.rsplit('/', 1)[1])
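Splitting on the last slash keeps any query string or fragment in the result. A slightly more careful variant, sketched here with the standard library only, parses the URL first; the function name filename_from_url is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the last path segment of url, or None if there is none."""
    # urlparse separates the path from the query string and fragment,
    # so "?download=1" or "#top" never ends up in the filename.
    path = urlparse(url).path
    # unquote turns escapes such as %20 back into plain characters.
    name = PurePosixPath(unquote(path)).name
    return name or None


print(filename_from_url(
    "http://www.computersolution.tech/wp-content/uploads/2016/05/tutorialspoint-logo.png"
))
```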

The code above gives the filename from the URL. However, in many cases the filename is not present in the URL, for example http://url.com/download. In such a case, we need to get the Content-Disposition header, which contains the filename information.

import re

import requests


def getFilename_fromCd(cd):
    """Get filename from content-disposition"""
    if not cd:
        return None
    fname = re.findall('filename=(.+)', cd)
    if len(fname) == 0:
        return None
    # strip surrounding quotes, e.g. filename="logo.png"
    return fname[0].strip('"\' ')


url = 'http://google.com/favicon.ico'
r = requests.get(url, allow_redirects=True)
filename = getFilename_fromCd(r.headers.get('content-disposition'))
if filename is None:
    # no Content-Disposition header: fall back to the last URL segment
    filename = url.rsplit('/', 1)[1]
with open(filename, 'wb') as f:
    f.write(r.content)

Used together, the URL-parsing code and the program above will give you a filename most of the time.

Source