How to Get an HTML Page from a URL in Python?

Be on the Right Side of Change

This tutorial shows you how to perform simple HTTP GET requests to fetch an HTML page from a given URL in Python!

Problem Formulation

Given a URL as a string, how do you extract the HTML from that URL and store the result in a Python string variable?

Example: Say, you want to accomplish the following:

    url = 'https://google.com'

    # ... Code to extract HTML page here ...

    print(result)
    # ... Google HTML file:
    '''
    ...
    '''

Let’s study the four most important methods to access a website in your Python script!

Method 1: requests.get(url)

The simplest solution is the following:

    import requests
    print(requests.get(url='https://google.com').text)

  • Import the requests library, which handles the details of sending the HTTP request and hands back the server's response in an easy-to-process format.
  • Call requests.get(...) and pass the URL 'https://google.com' as an argument so that the function knows which location to access.
  • Access the body of the response through its text attribute (the return value is a Response object that also contains useful meta information such as the status code and the content type).
  • Print the result to the shell.

The output is the HTML source of the Google homepage.

Note that you may have to install the requests library first by running the following command in your operating system terminal:

    pip install requests
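As a more defensive variant of Method 1, the steps can be wrapped in a small helper. This is only a minimal sketch: the name fetch_html and the 10-second timeout are assumptions chosen for illustration, not part of the original tutorial. It adds a timeout, so the script cannot hang forever, and raise_for_status(), so an error page is not silently treated as the result.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML at `url` as a str (hypothetical helper, not from the tutorial)."""
    # Send the GET request; give up if the server does not answer in time.
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into a requests.exceptions.HTTPError.
    response.raise_for_status()
    # .text is the response body decoded to a Python string.
    return response.text
```

Calling fetch_html('https://google.com') should then return the page as a string. Any failure (bad URL, timeout, HTTP error) surfaces as a subclass of requests.exceptions.RequestException that the caller can catch.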

Method 2: One-Liner with requests.get()

Sometimes you don’t want to open an interactive Python session to access the URL. No problem, you can make the previous solution a one-liner and run it from your operating system command line or terminal.

The semicolon joins the two statements of the previous method into a single line. You can pass that line to the interpreter's -c option to run it directly from your operating system terminal:

    python -c "import requests; print(requests.get(url = 'https://google.com').text)"

The output, again, is the HTML of the Google homepage.
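To see how the interpreter treats a semicolon-separated line passed on the command line, without touching the network, the following minimal sketch launches a second interpreter from Python. The math snippet is a stand-in chosen for illustration, and sys.executable is used on the assumption that the same interpreter should run the one-liner.

```python
import subprocess
import sys

# Two statements separated by a semicolon, just as they would appear
# inside the quotes on the command line. The snippet is a harmless stand-in.
one_liner = "import math; print(math.sqrt(16))"

# Equivalent to typing: python -c "import math; print(math.sqrt(16))"
completed = subprocess.run(
    [sys.executable, "-c", one_liner],
    capture_output=True,  # collect stdout instead of printing it directly
    text=True,            # decode the captured output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(completed.stdout.strip())  # → 4.0
```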

Method 3: urllib.request

A recommended way to fetch web resources is the urlopen() function of the urllib.request module, which is part of the Python 3 standard library and needs no installation. The following short script accesses the Google website as before:

    import urllib.request as r
    page = r.urlopen('https://google.com')
    print(page.read())

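Again, the call returns a response object rather than the page itself (for HTTP and HTTPS URLs, an http.client.HTTPResponse). Its read() method returns the body of the server’s response.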

Note that this reads the file as a byte string. If you want to read the HTML file as a string, you need to convert the result using Python’s decode() method:

    import urllib.request as r
    page = r.urlopen('https://google.com')
    print(page.read().decode('utf8'))

Here’s the output of this code snippet with most of the HTML content omitted for brevity.
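Hard-coding 'utf8' assumes the page really is UTF-8 encoded. A slightly more careful sketch asks the response headers which charset the server declared and only falls back to UTF-8 when none is given. The helper name read_html and the fallback value are assumptions for illustration. The demo uses a data: URL (which urlopen() also accepts) as a stand-in for a real address, so that it runs without network access.

```python
import urllib.request
from urllib.parse import quote


def read_html(url, default_charset="utf-8"):
    """Fetch `url` and decode the body with the charset the server declares."""
    # The with-statement closes the response automatically.
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes, exactly as sent by the server
        # Charset from the Content-Type header, or the fallback if absent.
        charset = page.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# Offline stand-in for a real web address: the content is embedded in the URL.
demo_url = "data:text/html;charset=utf-8," + quote("<h1>Hello</h1>")
html = read_html(demo_url)
print(html)  # → <h1>Hello</h1>
```

With a real site, read_html('https://google.com') would be called in the same way.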

Method 4: One-Liner with urllib.request

You can also cram everything into a single line so that you can run it from your OS’s terminal:

    python -c "import urllib.request as r; print(r.urlopen('https://google.com').read())"

While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.

Get HTML from URL in Python

Webpages are made using HTML. It is the markup language that defines the structure and contents of a webpage, and it is at the core of every website on the internet.

We can access and retrieve content from web pages using Python. Python allows us to access different types of data from URLs like JSON, HTML, XML, and more. We can use different libraries for working with HTML in Python.

We will now discuss how to get HTML from URL in Python.

Using the urllib library to get HTML from URL in Python

The urllib library in Python handles operations related to opening, fetching, and parsing URLs. We can use different functionalities from this package to get HTML from URL in Python.

First, we need to access the URL. For this, we use the urllib.request module. Its urlopen() function takes the desired URL as an argument, opens a connection to it, and returns a response object (an http.client.HTTPResponse for HTTP and HTTPS URLs).

Then, to get HTML from URL in Python, we call the read() method of this object. In Python 3, this returns a bytes object, so we need to convert it to a string by decoding it.

We will use the decode() method to retrieve the HTML as a string and display it. One should also close the response object with its close() method once the content has been read.

We will now use this in the code below.
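The steps described above (open, read, decode, close) can be sketched as follows. This is a minimal illustration, not the original article's listing: the function name get_html is hypothetical, UTF-8 is assumed as the page encoding, and a data: URL stands in for a real web address so the sketch runs offline.

```python
import urllib.request
from urllib.parse import quote


def get_html(url):
    """Open `url`, read the body, decode it, and always close the response."""
    response = urllib.request.urlopen(url)   # open the connection
    try:
        raw_bytes = response.read()          # bytes object in Python 3
        return raw_bytes.decode("utf-8")     # assumes a UTF-8 encoded page
    finally:
        response.close()                     # close even if decoding fails


# Hypothetical offline demo; replace demo_url with a real address to fetch a site.
demo_url = "data:text/html;charset=utf-8," + quote("<html><body>Hi</body></html>")
print(get_html(demo_url))  # → <html><body>Hi</body></html>
```

The try/finally block guarantees that close() runs; a with-statement around urlopen() would achieve the same effect more compactly.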
