Python requests get local file

Python How to Download a File from a URL

To download a file from a URL using Python, use the requests.get() method. For example, let’s download Instagram’s icon:

import requests

URL = "https://instagram.com/favicon.ico"
response = requests.get(URL)

# Save the raw bytes; the with-block closes the file when done
with open("instagram.ico", "wb") as f:
    f.write(response.content)

This is an example for someone who is looking for a quick answer. If the code above doesn’t work or doesn’t make sense, please keep reading.

More Detailed Steps

To download a file from a URL using Python, follow these three steps:

  1. Install the requests module and import it into your project.
  2. Use requests.get() to download the data behind the URL.
  3. Write the downloaded data to a file on your system using open().

Let’s download Instagram’s icon using Python. The icon is available at https://instagram.com/favicon.ico.

First, install the requests module by opening up a command line window and running:

pip install requests

Then, you can use it to download the icon behind the URL:

# 1. Import the requests library
import requests

URL = "https://instagram.com/favicon.ico"

# 2. Download the data behind the URL
response = requests.get(URL)

# 3. Write the response content into a new file called instagram.ico
with open("instagram.ico", "wb") as f:
    f.write(response.content)

As a result of running this piece of code, the Instagram icon appears in the current working directory, that is, the folder you ran the program from (often the same folder as your program file).
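Without stream=True, requests reads the whole body into memory as response.content. The snippet above also saves whatever the server sends back, even an error page. Below is a minimal sketch that streams the body to disk in chunks and raises on HTTP errors. The helper name download_file, the 8 KiB chunk size and the 10-second timeout are illustrative assumptions, not part of the requests API.

```python
import requests


def download_file(url, dest_path, chunk_size=8192, timeout=10):
    """Stream the resource at `url` into `dest_path` and return the path.

    `download_file`, `chunk_size` and `timeout` are illustrative choices.
    """
    # stream=True defers downloading the body until we iterate over it
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Raise requests.HTTPError on 4xx/5xx instead of saving an error page
        response.raise_for_status()
        with open(dest_path, "wb") as f:
            # iter_content yields the body piece by piece
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest_path
```

Usage: download_file("https://instagram.com/favicon.ico", "instagram.ico"). Because raise_for_status() runs before the output file is opened, a failed request leaves no partial file behind.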

Other Ways to Download a File in Python

There are other modules that make downloading files possible in Python.


In addition to the requests library, two commonly used ones are wget and urllib.

How to Download a File Using the wget Module

Before you can download files using wget, you need to install the wget module.

Open up a command line window and run:

pip install wget

Then follow these two steps to download a file:

  1. Import the wget module into your project.
  2. Use wget.download() to download a file from a specific URL and save it on your machine.

As an example, let’s get the Instagram icon using wget:

import wget

URL = "https://instagram.com/favicon.ico"

# wget.download() saves the file and returns the name it was saved under
filename = wget.download(URL, "instagram.ico")

As a result of running the code, you can see an Instagram icon appear in the folder you ran your program from (the current working directory).

How to Download a File Using the urllib Module in Python

Unlike requests and wget, urllib is part of Python’s standard library, so there is nothing to install.

To download a file with it, follow these two steps:

  1. Import the request submodule of the urllib package into your project.
  2. Use the request.urlretrieve() function to download a file from a specific URL and save it on your machine.

As an example, let’s get the Instagram icon using urllib:

from urllib import request

URL = "https://instagram.com/favicon.ico"

# urlretrieve() returns a (local_filename, headers) tuple
filename, headers = request.urlretrieve(URL, "instagram.ico")

As a result of running the code, you can see an Instagram icon appear in the folder you ran your program from (the current working directory).
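Python’s documentation lists urlretrieve() under the “legacy interface” ported from Python 2 and notes that it might become deprecated. Below is a minimal sketch of the same download using urllib.request.urlopen() and shutil.copyfileobj(), which copies the response to disk in chunks instead of loading it all into memory. The helper name fetch_to_file and the 10-second timeout are assumptions for illustration.

```python
import shutil
import urllib.request


def fetch_to_file(url, dest_path, timeout=10):
    """Copy the resource at `url` into `dest_path` and return the path.

    `fetch_to_file` and the default timeout are illustrative choices.
    """
    # urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest_path, "wb") as f:
            # copyfileobj reads and writes in fixed-size chunks
            shutil.copyfileobj(response, f)
    return dest_path
```

Usage: fetch_to_file("https://instagram.com/favicon.ico", "instagram.ico"). Unlike requests, urlopen() also accepts file:// URLs out of the box, which makes it handy for quick local checks.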


Fetch a file from a local url with Python requests?

I am using Python’s requests library in one method of my application. The body of the method looks like this:

def handle_remote_file(url, **kwargs):
    response = requests.get(url, ...)
    buff = StringIO.StringIO()
    buff.write(response.content)
    ...
    return True

I’d like to write some unit tests for that method. However, I want to pass it a fake local URL such as:

class RemoteTest(TestCase):
    def setUp(self):
        self.url = 'file:///tmp/dummy.txt'

    def test_handle_remote_file(self):
        self.assertTrue(handle_remote_file(self.url))

When I call requests.get with a local URL, I get the KeyError exception below:

requests.get('file:///tmp/dummy.txt')

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.pyc in connection_from_host(self, host, port, scheme)
     76
     77         # Make a fresh ConnectionPool of the desired type
     78         pool_cls = pool_classes_by_scheme[scheme]
     79         pool = pool_cls(host, port, **self.connection_pool_kw)
     80

KeyError: 'file'

The question is: how can I pass a local URL to requests.get?

PS: I made up the above example. It possibly contains many errors.
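The traceback above comes from Python 2.7 and an older requests release that bundled urllib3 under requests/packages. With current releases the failure happens earlier. A new Session only mounts adapters for http:// and https://, so requests raises requests.exceptions.InvalidSchema ("No connection adapters were found for ...") instead. Below is a small sketch that reproduces this and builds the file URL with pathlib rather than by hand. The helper name try_requests_get is hypothetical.

```python
import os
from pathlib import Path

import requests


def try_requests_get(path):
    """Return (file_url, error_message) for a plain requests.get() on a local path."""
    # Path.as_uri() needs an absolute path and percent-encodes it as required
    url = Path(os.path.abspath(path)).as_uri()
    try:
        requests.get(url)
    except requests.exceptions.InvalidSchema as err:
        # Default sessions only have adapters for http:// and https://
        return url, str(err)
    return url, None
```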

As @WooParadog explained, the requests library doesn’t know how to handle local files. However, current versions allow you to define transport adapters.

Therefore, you can simply define your own adapter that is able to handle local files, e.g.:

import os

import requests
from requests_testadapter import Resp


class LocalFileAdapter(requests.adapters.HTTPAdapter):
    def build_response_from_file(self, request):
        # Strip the leading 'file://' (7 characters) to get the path
        file_path = request.url[7:]
        with open(file_path, 'rb') as f:
            buff = bytearray(os.path.getsize(file_path))
            f.readinto(buff)
        resp = Resp(buff)
        r = self.build_response(request, resp)
        return r

    def send(self, request, stream=False, timeout=None,
             verify=True, cert=None, proxies=None):
        return self.build_response_from_file(request)


requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
requests_session.get('file://')  # the local file path goes after 'file://'

I’m using the requests-testadapter module in the example above.
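The request.url[7:] slice in this adapter assumes the URL is exactly file:// followed by a plain path. A percent-encoded name such as my%20file.txt or a file://localhost/... URL would therefore map to the wrong path. The next answer converts the URL with url2pathname() instead. Below is a minimal standard-library sketch of a safer conversion. The function name file_url_to_path is hypothetical, and it deliberately accepts only an empty or localhost host.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(url):
    """Convert a file:// URL to a local filesystem path (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("expected a file:// URL, got %r" % url)
    # Remote hosts (file://server/share/...) are out of scope for this sketch
    if parsed.netloc not in ("", "localhost"):
        raise ValueError("unsupported host in file URL: %r" % parsed.netloc)
    # url2pathname decodes %XX escapes and converts to the OS path syntax
    return url2pathname(parsed.path)
```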

Here’s a transport adapter I wrote which is more featureful than b1r3k’s and has no additional dependencies beyond Requests itself. I haven’t tested it exhaustively yet, but what I have tried seems to be bug-free.

import os
import sys

import requests

if sys.version_info.major < 3:
    from urllib import url2pathname
else:
    from urllib.request import url2pathname


class LocalFileAdapter(requests.adapters.BaseAdapter):
    """Protocol Adapter to allow Requests to GET file:// URLs

    @todo: Properly handle non-empty hostname portions.
    """

    @staticmethod
    def _chkpath(method, path):
        """Return an HTTP status for the given filesystem path."""
        if method.lower() in ('put', 'delete'):
            return 501, "Not Implemented"  # TODO
        elif method.lower() not in ('get', 'head'):
            return 405, "Method Not Allowed"
        elif os.path.isdir(path):
            return 400, "Path Not A File"
        elif not os.path.isfile(path):
            return 404, "File Not Found"
        elif not os.access(path, os.R_OK):
            return 403, "Access Denied"
        else:
            return 200, "OK"

    def send(self, req, **kwargs):  # pylint: disable=unused-argument
        """Return the file specified by the given request

        @type req: C{requests.PreparedRequest}
        @todo: Should I bother filling `response.headers` and processing
               If-Modified-Since and friends using `os.stat`?
        """
        path = os.path.normcase(os.path.normpath(url2pathname(req.path_url)))
        response = requests.Response()

        response.status_code, response.reason = self._chkpath(req.method, path)
        if response.status_code == 200 and req.method.lower() != 'head':
            try:
                response.raw = open(path, 'rb')
            except (OSError, IOError) as err:
                response.status_code = 500
                response.reason = str(err)

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        response.request = req
        response.connection = self

        return response

    def close(self):
        pass

(Despite the name, it was completely written before I thought to check Google, so it has nothing to do with b1r3k’s.) As with the other answer, follow this with:

requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
r = requests_session.get('file:///path/to/your/file')
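One caveat applies to both adapter answers. requests.get() creates a fresh Session on every call, so an adapter mounted on requests_session is never used by code like the question’s handle_remote_file, which calls requests.get() directly. Below is a minimal sketch of one workaround, assuming you can change the function to accept an optional session. MinimalFileAdapter is a hypothetical, stripped-down adapter included only to keep the example self-contained. This handle_remote_file variant is also hypothetical and returns the bytes instead of True.

```python
import io
from urllib.parse import urlparse
from urllib.request import url2pathname

import requests
from requests.adapters import BaseAdapter


class MinimalFileAdapter(BaseAdapter):
    """Hypothetical bare-bones adapter that serves file:// URLs from disk."""

    def send(self, request, **kwargs):
        response = requests.Response()
        response.request = request
        response.url = request.url
        path = url2pathname(urlparse(request.url).path)
        try:
            # Read eagerly so the file handle is closed before returning
            with open(path, "rb") as f:
                response.raw = io.BytesIO(f.read())
            response.status_code = 200
        except FileNotFoundError:
            response.status_code = 404
        except OSError:
            response.status_code = 500
        return response

    def close(self):
        pass


def handle_remote_file(url, session=None):
    """Hypothetical variant of the question's method with an injectable session."""
    # Fall back to the module-level API when no session is supplied
    getter = session.get if session is not None else requests.get
    response = getter(url)
    response.raise_for_status()
    return response.content
```

In a test, create a requests.Session(), mount the adapter on 'file://' and pass the session in. Production code keeps calling handle_remote_file(url) unchanged.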

packages/urllib3/poolmanager.py pretty much explains it: Requests doesn’t support local URLs.

pool_classes_by_scheme = {
    'http': HTTPConnectionPool,
    'https': HTTPSConnectionPool,
}

The easiest way seems to be using requests-file.

“Requests-File is a transport adapter for use with the Requests Python library to allow local filesystem access via file:// URLs.”

This in combination with requests-html is pure magic 🙂

In a recent project, I’ve had the same issue. Since requests doesn’t support the “file” scheme, I’ll patch our code to load the content locally. First, I define a function to replace requests.get:

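import six


def local_get(url, *args, **kwargs):
    "Fetch a stream from local files."
    # Same call shape as requests.get(url, ...); there is no `self` because
    # this replaces a module-level function, not a method
    p_url = six.moves.urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")
    filename = six.moves.urllib.request.url2pathname(p_url.path)
    return open(filename, 'rb')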

Then, somewhere in test setup or decorating the test function, I use mock.patch to patch the get function on requests:

import mock  # the mock backport; on Python 3: from unittest import mock


@mock.patch('requests.get', local_get)
def test_handle_remote_file(self):
    ...

This technique is somewhat brittle: it doesn’t help if the underlying code calls requests.request or constructs a Session and calls that. There may be a way to patch requests at a lower level to support file: URLs, but in my initial investigation there didn’t seem to be an obvious hook point, so I went with this simpler approach.
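local_get returns an open file object, but the question’s handle_remote_file reads response.content, so under this patch that line would raise AttributeError. Below is a minimal sketch of a replacement that returns a response-like stand-in instead, using the standard library’s unittest.mock. fake_get and the SimpleNamespace stand-in, which carries only content and status_code, are assumptions. The function under test is a simplified, hypothetical handle_remote_file.

```python
import os
import tempfile
import unittest
from pathlib import Path
from types import SimpleNamespace
from unittest import mock
from urllib.parse import urlparse
from urllib.request import url2pathname

import requests


def handle_remote_file(url):
    """Simplified, hypothetical stand-in for the question's method."""
    response = requests.get(url)
    return response.content


def fake_get(url, *args, **kwargs):
    """Hypothetical requests.get replacement that serves file:// URLs from disk."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("Expected file scheme")
    with open(url2pathname(parsed.path), "rb") as f:
        # Provide only the attributes the code under test actually reads
        return SimpleNamespace(content=f.read(), status_code=200)


class RemoteTest(unittest.TestCase):
    def setUp(self):
        # A real temporary file stands in for the remote resource
        handle = tempfile.NamedTemporaryFile(delete=False)
        handle.write(b"dummy")
        handle.close()
        self.addCleanup(os.remove, handle.name)
        self.url = Path(handle.name).as_uri()

    # With an explicit replacement object, mock.patch passes no extra argument
    @mock.patch("requests.get", fake_get)
    def test_handle_remote_file(self):
        self.assertEqual(handle_remote_file(self.url), b"dummy")
```

This keeps the limitation described above: code that goes through requests.request() or a Session bypasses the patch.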

I think a simple solution is to create a temporary HTTP server with Python and use it:

  1. Put all your files in a temporary folder, e.g. tempFolder.
  2. Go to that directory and start a temporary HTTP server from the terminal (or cmd on Windows) with the command python -m http.server 8000 (8000 is the port number).
  3. The server is now reachable at http://127.0.0.1:8000/.
  4. Open the desired file in your browser and use its link as the URL in your code.
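Instead of starting the server by hand in a terminal, a test can run the same standard-library server in a background thread. Below is a minimal sketch, assuming Python 3.7+ for ThreadingHTTPServer and the handler’s directory argument. serve_directory is a hypothetical helper name. Binding to port 0 lets the OS pick a free port, so the fixed 8000 is not needed.

```python
import contextlib
import functools
import http.server
import threading


class _QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static-file handler that skips the per-request log line on stderr."""

    def log_message(self, format, *args):
        pass


@contextlib.contextmanager
def serve_directory(directory):
    """Serve `directory` over HTTP on 127.0.0.1 and yield its base URL."""
    handler = functools.partial(_QuietHandler, directory=directory)
    # Port 0 asks the OS for any free port
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        yield "http://%s:%d/" % (host, port)
    finally:
        # Stop serve_forever(), release the socket and wait for the thread
        server.shutdown()
        server.server_close()
        thread.join()
```

Inside a test, with serve_directory(tmp_dir) as base_url: gives a real http:// URL such as base_url + "dummy.txt". That URL can be passed to requests.get() or to the question’s handle_remote_file unchanged.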



Comments on the BaseAdapter-based LocalFileAdapter answer:

Thanks. Something was wrong with the line except (OSError, IOError), err:. My replacement was except (OSError, IOError) as err:

@LennartRolland At the time I wrote it, I was only using Requests on Python 2.x. I’ll fix my post as soon as I can spare a few minutes to test the changes.

Nice work. However, this doesn’t work for local URLs such as ../foo.bar. It was possible to change the send method so that it doesn’t use req.path_url and instead uses something that strips off file:// and keeps the rest.

@rocky Not supporting relative URLs is intentional. At this level of the stack, any URL that isn’t already absolute is invalid, because anything well designed that operates at this level has no context for knowing how relative URLs should be resolved. (Essentially, you should make them absolute before passing them to Requests, using something like urlparse.urljoin (Python 2) or urllib.parse.urljoin (Python 3).)
