Contents
- Python Get Webpage Html Examples
- 1. Python Get Webpage Html Use urllib Module Example.
- 1.1 Python urllib Module Introduction.
- 1.2 Python urllib Library request Module Introduction.
- 1.3 The http.client.HTTPResponse Class.
- 1.4 Use Python urllib.request To Crawl Web Page Examples.
- 2. Python Get Webpage Html Use urllib3 Module Example.
- 2.1 How To Install Python urllib3 Module.
- Get Web Page in Python
- Use the urllib Package to Get a Web Page in Python
- Use the requests Package to Get a Webpage in Python
Python Get Webpage Html Examples
Several Python modules can get a web page's HTML source code from a URL: the standard library's urllib package (urllib2 existed only in Python 2; in Python 3 its functionality was merged into urllib.request and urllib.error), and the third-party urllib3 and requests packages. This article shows how to use these modules to get a web page's HTML source code, with examples.
1. Python Get Webpage Html Use urllib Module Example.
1.1 Python urllib Module Introduction.
- Python's built-in urllib package is a collection of modules for working with URLs; its urllib.request module can fetch the HTML source code of web pages.
- urllib is part of Python's standard library, so it does not need to be installed separately.
1.2 Python urllib Library request Module Introduction.
- Before you can use the urllib.request module, you need to import it into your source code.
```python
# import the urllib.request module
import urllib.request

# or import the request module from the urllib package
from urllib import request
```
urllib.request.urlopen(url, timeout=...) sends the request and returns a response object. url is the web page URL to request. timeout is the response timeout in seconds; if no response is received within that time, a timeout exception is raised.
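A minimal sketch of calling urlopen() with a timeout and handling the usual failures. The helper name fetch_html, the 10-second default, and the UTF-8 fallback are illustrative assumptions, not part of urllib. HTTP error statuses raise urllib.error.HTTPError (a subclass of URLError, so it must be caught first), connection failures raise urllib.error.URLError, and a timeout after the connection is made can surface as socket.timeout.

```python
import socket
import urllib.error
import urllib.request


def fetch_html(url, timeout=10.0):
    """Return the page at `url` as text, or None if the request fails.

    fetch_html is a hypothetical helper for this sketch; only urlopen()
    and its timeout argument come from urllib.
    """
    try:
        # timeout is in seconds and applies to blocking socket operations
        # such as connecting and reading.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            # Use the charset from the Content-Type header if present,
            # otherwise assume UTF-8 (an assumption, not a guarantee).
            charset = response.headers.get_content_charset() or "utf-8"
            return body.decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        print("HTTP error:", err.code)
    except urllib.error.URLError as err:
        # DNS failure, refused connection, unknown scheme, or connect timeout.
        print("URL error:", err.reason)
    except socket.timeout:
        # The server stopped responding while we waited for or read the response.
        print("Timed out after", timeout, "seconds")
    return None
```

For example, fetch_html("https://www.bing.com", timeout=5) returns the page as a string, or prints the error and returns None.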
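urllib.request.Request(url, headers={...}) builds a request object without sending it. url is the web page URL to request, and headers is a dict of request headers. Pass the Request object to urlopen() to send it.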
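The sketch below builds a Request with custom headers and then sends it with urlopen(). The helper names build_request and fetch_with_headers and the User-Agent string are hypothetical; a custom User-Agent is shown because urllib's default one ("Python-urllib/3.x") is rejected by some sites.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (example)"):
    """Build a Request that sends a custom User-Agent header (illustrative)."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html",
    }
    # Request() only describes the request; nothing is sent yet.
    return urllib.request.Request(url, headers=headers)


def fetch_with_headers(url, timeout=10.0):
    """Send the request built above and return the response body as bytes."""
    # Passing the Request object to urlopen() actually sends it.
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Note that Request stores header names with str.capitalize(), so the header above is looked up as req.get_header("User-agent").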
1.3 The http.client.HTTPResponse Class.
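- For HTTP and HTTPS URLs, urllib.request.urlopen() returns an http.client.HTTPResponse object (urllib.request.Request() only builds a request, which you then pass to urlopen()). The methods below are used on this object and on the data it returns.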
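- read(): reads the response body and returns it as bytes.
- bytes.decode('utf-8'): a method of the bytes type that converts bytes data to a string.
- str.encode('utf-8'): a method of the str type that converts string data to bytes.
- geturl(): returns the URL of the response, after any redirects. Since Python 3.9 the docs recommend the url attribute instead.
- getcode(): returns the HTTP status code. Since Python 3.9 the docs recommend the status attribute instead.
- getheaders(): returns the response headers as a list of (name, value) tuples.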
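A small offline illustration of the decode()/encode() round trip on the kind of bytes that read() returns. The sample string is made up; the point is that decoding must use the same encoding the bytes were produced with.

```python
text = "<p>Café</p>"

# str.encode(): text -> bytes, the form data takes on the network.
data = text.encode("utf-8")
print(data)                    # b'<p>Caf\xc3\xa9</p>'

# bytes.decode(): bytes -> text, using the same encoding.
print(data.decode("utf-8"))    # <p>Café</p>

# Decoding with the wrong codec does not raise here, but garbles the text.
print(data.decode("latin-1"))  # <p>CafÃ©</p>
```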
1.4 Use Python urllib.request To Crawl Web Page Examples.
- This example uses the urllib.request module to request a web page by URL and print the response URL, status code, headers, and HTML content.
```python
import urllib.request
# or
# from urllib import request


# Request the URL page and print the details of the response object.
def urllib_request_web_page(url):
    # Send a request to the URL and get the response object;
    # the with block closes the connection when done.
    with urllib.request.urlopen(url) as response:
        # Print out the response object.
        print(response)
        # Get the response URL.
        resp_url = response.geturl()
        print('Response url : ', resp_url)
        # Get the response code.
        resp_code = response.getcode()
        print('Response code : ', resp_code)
        # Loop over the response headers; each one is a (name, value) tuple.
        for resp_header in response.getheaders():
            header_name, header_value = resp_header
            print(resp_header)
            print(header_name, ' = ', header_value)
        # Read the response content as a bytes object
        # (named body to avoid shadowing the built-in bytes type).
        body = response.read()
        print(body)
        # Convert the bytes object to a string.
        html_content = body.decode('utf-8')
        print(html_content)


if __name__ == '__main__':
    url = "https://www.bing.com"
    urllib_request_web_page(url)
```
Output (truncated):

```text
Response url : https://www.bing.com
Response code : 200
('Cache-Control', 'private')
Cache-Control = private
('Transfer-Encoding', 'chunked')
Transfer-Encoding = chunked
.
.
b'.
```
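If you want to reuse the example above rather than only print, one option is to return the values as plain data. This is a sketch: the function name response_summary and the dict layout are illustrative choices. It reads headers through response.headers, which behaves like an email.message.Message, so it works for HTTP responses as well as other URL types urllib supports.

```python
import urllib.request


def response_summary(url, timeout=10.0):
    """Return the final URL, status code and headers of `url` as a dict.

    A hypothetical variation on urllib_request_web_page() that returns
    values instead of printing them.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return {
            "url": response.geturl(),
            "status": response.getcode(),
            # items() yields (name, value) pairs, like getheaders() does.
            "headers": dict(response.headers.items()),
            # Header lookup with get() is case-insensitive.
            "content_type": response.headers.get("Content-Type"),
        }
```

Converting the headers to a dict keeps only one value per name, so a header that appears several times (for example Set-Cookie) should be read with response.headers.get_all(name) instead.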