- Force file download instead of opening in browser Using HTTP Header and Flask
- The story of Content-Disposition:
- Add Content-Disposition HTTP Header in Flask
- HTML5 Attribute Alternative
- Downloading Files In Python Using Requests Module
- Introduction
- Not all URLs pointing to downloadable resources
- Define a function to verify a downloadable resource
- Checking Content-Type of the request header
- Restricting the file size of the downloading resource
- Getting the file name from the URL
- rfc6266-content-disposition 0.0.6
- Usage
- Receiver
- Sender
- Security
- Testing
- References
- License
- g2p/rfc6266
Force file download instead of opening in browser Using HTTP Header and Flask
HTTP header fields are used for communication between the browser and the web server. They specify configuration and cookies that lay a foundation for the modern Internet.
An example of a request header field: «user-agent» (which operating system and browser version the client is using).
user-agent: Mozilla/5.0 (Linux; Android 6.0.1; SM-G532G Build/MMB29T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.83 Mobile Safari/537.36
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7
An example of Response Header field: «cache-control» (The cache policy).
cache-control: private, max-age=0
cache-control: public, max-age=30672000
The story of Content-Disposition:
Modern browsers such as Chrome and Firefox handle an HTTP response in one of two main ways: they show the content inline (open it in the browser), or they download it as an attachment and save it to the local disk. To tell the browser which of the two to do, the web server adds a «Content-Disposition» field to the HTTP response header.
If the «Content-Disposition» field is not specified, «inline» is used as the default (open in browser), which is equivalent to:
Content-Disposition: inline
To force file download, specify «Content-Disposition» HTTP Header field as below:
Content-Disposition: attachment
To override the attachment’s name, specify filename within «Content-Disposition». If filename is not specified, the browser falls back to the original file name, usually taken from the URL.
Content-Disposition: attachment; filename="test.txt"
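The header values above can also be assembled programmatically. The following is a minimal sketch, not a full RFC 6266 implementation: the helper name build_content_disposition is hypothetical, and it assumes that non-ASCII names should be sent in the percent-encoded filename* form described by RFC 5987.

```python
from urllib.parse import quote


def build_content_disposition(filename=None, inline=False):
    """Return a Content-Disposition header value (hypothetical helper)."""
    disposition = "inline" if inline else "attachment"
    if filename is None:
        return disposition
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII name: use the RFC 5987 extended form, percent-encoded UTF-8.
        return "{}; filename*=UTF-8''{}".format(disposition, quote(filename, safe=""))
    # ASCII name: quoted-string form, escaping backslashes and double quotes.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return '{}; filename="{}"'.format(disposition, escaped)


print(build_content_disposition())
print(build_content_disposition("test.txt"))
print(build_content_disposition("résumé.pdf"))
```

The first two calls reproduce the plain attachment value and the filename="test.txt" value shown above; the third shows what a non-ASCII name might look like under the stated assumption.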
Add Content-Disposition HTTP Header in Flask
If you are using Flask as your website’s framework, it is easy to set any HTTP header field. You can cover a range of URLs sharing the ‘/download/’ prefix and add the Content-Disposition HTTP header to every response served under it.
from flask import Flask, request

app = Flask(__name__)

@app.after_request
def after_request(response):
    # Force a download for every response served under /download/
    if request.path.startswith('/download/'):
        response.headers['Content-Disposition'] = 'attachment'
    return response
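The same prefix rule can be expressed one layer below Flask, as WSGI middleware. This is only a sketch using the standard library: ForceDownloadMiddleware, demo_app and call are hypothetical names introduced for illustration, and demo_app stands in for a real application.

```python
from wsgiref.util import setup_testing_defaults


class ForceDownloadMiddleware:
    """Add 'Content-Disposition: attachment' to responses under a URL prefix."""

    def __init__(self, app, prefix="/download/"):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(self.prefix):
                # Replace any existing Content-Disposition header.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "content-disposition"]
                headers.append(("Content-Disposition", "attachment"))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def demo_app(environ, start_response):
    """Stand-in WSGI application used only for this demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app, path):
    """Invoke a WSGI app for the given path and return its response headers."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    b"".join(app(environ, start_response))
    return captured["headers"]


wrapped = ForceDownloadMiddleware(demo_app)
print(call(wrapped, "/download/report.txt"))
print(call(wrapped, "/index"))
```

Since a Flask application exposes its WSGI callable as app.wsgi_app, a wrapper of this kind could in principle be installed with app.wsgi_app = ForceDownloadMiddleware(app.wsgi_app), although the after_request hook is the simpler choice inside Flask.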
HTML5 Attribute Alternative
The HTML5 specification offers an alternative: adding the download attribute to a link forces the browser to download the linked resource.
Downloading Files In Python Using Requests Module
This post shows how to download a resource from a given URL using the requests module. Other modules can do the same job, but I focus on requests here and leave the other methods for you to discover. Let’s get started.
Introduction
Below is a simple snippet to download Google’s logo in the Google search page via the link https://www.google.co.uk/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png
import requests

url = "https://www.google.co.uk/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"
r = requests.get(url, allow_redirects=True)
# r.content holds the whole response body as bytes
with open("google.ico", "wb") as f:
    f.write(r.content)
The file named google.ico is saved into the current working directory. Easy, right? In practice, we face more difficult situations, which I will show you now.
Not all URLs pointing to downloadable resources
In the real world, you will almost certainly run into resources that are protected and not meant to be downloaded. For example, YouTube videos are protected to prevent bulk downloading. People develop browser extensions or standalone applications to download YouTube videos; however, Google detects such activity and has increasingly protected its data. It is therefore important to check whether the resource of interest can be downloaded before fetching its content. The snippet below shows how to check that based on the Content-Type field of the response header for the requested URL.
import requests

def extract_content_type(_url):
    # Fetch the URL and return the Content-Type of the response
    r = requests.get(_url, allow_redirects=True)
    return r.headers.get("Content-Type")

url = "https://www.google.co.uk/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"
print(extract_content_type(url))
url = "https://www.youtube.com/watch?v=ylk5AYyOcGI"
print(extract_content_type(url))
The output of the script above looks like
image/png
text/html; charset=utf-8
The extract_content_type function returns the MIME type of the remote file as a string. In the example above, the first URL returns the expected value, whereas for the YouTube URL we would expect a video type rather than text/html. In other words, when the content type of a response is text/html, we would download a plain text or HTML document rather than a file of a well-known MIME type such as image/png or video/mp4.
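A value such as text/html; charset=utf-8 carries both a MIME type and parameters, so it can help to split the two before comparing. The sketch below is deliberately simplified and assumes that parameter values contain no semicolons; parse_content_type is a hypothetical helper name.

```python
def parse_content_type(value):
    """Split a Content-Type value into (mime_type, params). Simplified sketch."""
    if not value:
        return None, {}
    parts = [part.strip() for part in value.split(";")]
    mime_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, val = part.split("=", 1)
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = val.strip().strip('"')
    return mime_type, params


print(parse_content_type("image/png"))
print(parse_content_type("text/html; charset=utf-8"))
```

With the two header values from the output above, the first call yields the bare image/png type and the second separates text/html from its charset parameter.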
Define a function to verify a downloadable resource
As explained in the previous section, it is necessary to check whether a resource can be downloaded before downloading it.
Checking Content-Type of the request header
The function below can do what we need by checking the content type from the header.
import requests

def is_downloadable(_url):
    """Does the URL point to a downloadable resource?"""
    # A HEAD request fetches only the headers, not the body
    h = requests.head(_url, allow_redirects=True)
    header = h.headers
    # Default to '' so a missing Content-Type header does not raise
    content_type = header.get('content-type', '')
    if 'text' in content_type.lower():
        return False
    if 'html' in content_type.lower():
        return False
    return True
Applied to the two URLs from the previous examples, this function returns False for the YouTube URL and True for the Google logo link.
Restricting the file size of the downloading resource
We might place another restriction on the resource, for example downloading only files no larger than 100 MB. The code below does this by inspecting the content-length field of the response header.
content_length = header.get('content-length', None)
# Header values are strings, so convert before comparing
if content_length and int(content_length) > 1e8:  # 100 MB approx
    return False
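The two checks can be combined into one function that works on a plain mapping of headers, which makes it easy to try without any network access. This is a sketch under assumptions: headers_look_downloadable and its max_bytes parameter are hypothetical names, and treating an unparsable content-length as not downloadable is a design choice made here, not something the checks above prescribe.

```python
def headers_look_downloadable(headers, max_bytes=100_000_000):
    """Apply the content-type and content-length checks to a header mapping."""
    # Normalise header names, since HTTP header names are case-insensitive.
    lowered = {key.lower(): value for key, value in headers.items()}

    content_type = lowered.get("content-type", "").lower()
    if "text" in content_type or "html" in content_type:
        return False

    content_length = lowered.get("content-length")
    if content_length is not None:
        try:
            if int(content_length) > max_bytes:
                return False
        except ValueError:
            # Assumption: a malformed length is treated as a failed check.
            return False
    return True


print(headers_look_downloadable({"Content-Type": "image/png", "Content-Length": "5969"}))
print(headers_look_downloadable({"Content-Type": "text/html; charset=utf-8"}))
print(headers_look_downloadable({"Content-Type": "video/mp4", "Content-Length": "200000000"}))
```

With requests, the mapping could come from requests.head(url, allow_redirects=True).headers. Servers are not required to send content-length, so a missing value passes this check.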
Getting the file name from the URL
Finally, to obtain the file name of the resource, we can use the Content-Disposition field of the response header.
import re
import requests

def get_filename_from_url(_url):
    """Get the filename from Content-Disposition"""
    r = requests.get(_url, allow_redirects=True)
    cd = r.headers.get('content-disposition')
    if not cd:
        return None
    filename = re.findall('filename=(.+)', cd)
    if len(filename) == 0:
        return None
    # Strip the surrounding quotes of a quoted filename
    return filename[0].strip('"')
Parsing the file name out of the URL itself, in conjunction with the above method of reading it from the Content-Disposition header, will work for most cases.
Voilà! If you have any feedback, please don’t hesitate to leave a comment in the comment box below.
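The combination of header lookup and URL parsing might look like the sketch below. The function names and the download.bin default are hypothetical, and the regular expression only handles the plain filename= form, not the encoded filename*= form.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit


def filename_from_header(content_disposition):
    """Extract a plain filename parameter from a Content-Disposition value."""
    if not content_disposition:
        return None
    match = re.search(r'filename="?([^";]+)"?', content_disposition, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


def filename_from_url(url):
    """Use the last path segment of the URL as the file name."""
    path = urlsplit(url).path
    return posixpath.basename(unquote(path)) or None


def pick_filename(url, content_disposition=None, default="download.bin"):
    """Prefer the header, fall back to the URL, then to a default name."""
    # The result comes from the server, so treat it as untrusted input.
    return filename_from_header(content_disposition) or filename_from_url(url) or default


logo = "https://www.google.co.uk/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"
print(pick_filename(logo))
print(pick_filename(logo, 'attachment; filename="test.txt"'))
```

For the logo URL used earlier, the URL fallback gives googlelogo_color_272x92dp.png, while a header carrying filename="test.txt" takes precedence.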
rfc6266-content-disposition 0.0.6
This module parses and generates HTTP Content-Disposition headers. These headers are used when getting resources for download; they provide a hint of whether the file should be downloaded, and of what filename to use when saving.
Usage
Receiver
parse_headers builds a ContentDisposition object from the Content-Disposition header and (as a fallback) the document location. Shortcuts work with response objects from httplib2 and the requests library.
Important attributes of ContentDisposition are is_inline, filename_unsafe, filename_sanitized.
Sender
build_header builds a header value from a filename.
Security
The Content-Disposition filename should be used with caution. Do not let the sender overwrite an arbitrary filesystem location, pick arbitrary extensions or filenames with special meaning, pick filenames containing unusual or misleading characters, etc. Read RFC 6266 section 4.3 for more details.
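The cautions above can be illustrated with a small standalone sanitizer. This is a sketch and not how rfc6266 computes filename_sanitized: the function name, the allowed-extension list and the default name are all assumptions made for the example.

```python
import posixpath
import re


def sanitize_filename(name, allowed_extensions=(".png", ".jpg", ".txt", ".pdf"),
                      default="download"):
    """Reduce a sender-supplied file name to a conservative local name."""
    # Drop any directory components, whichever separator the sender used.
    name = posixpath.basename(name.replace("\\", "/"))
    # Replace unusual or misleading characters with underscores.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Avoid hidden files and the special names '.' and '..'.
    name = name.lstrip(".")
    stem, ext = posixpath.splitext(name)
    if not stem:
        stem = default
    # Keep the extension only if it is on the allow-list.
    if ext.lower() not in allowed_extensions:
        ext = ""
    return stem + ext


print(sanitize_filename("../../etc/passwd"))
print(sanitize_filename("my file?.txt"))
print(sanitize_filename("evil.exe"))
```

An allow-list of extensions is stricter than a block-list; a receiver may instead prefer to choose the extension from the Content-Type it actually received.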
Testing
To test in the current Python implementation:
To test compatibility across Python releases:
rfc6266 is currently tested under Python 2.7, Python 2.6, Python 3.3, Python 3.2, and PyPy (1.7).
References
- RFC 6266 specifies the Content-Disposition header
- RFC 5987 specifies a way to encode non-ascii filenames
- TC 2231 is a test suite for Content-Disposition headers
Content-Disposition header support for Python
License
LGPL-3.0, GPL-3.0 licenses found
g2p/rfc6266