Selenium получить html код страницы python

Selenium get HTML source in Python

Do you want to get the HTML source code of a webpage with Python selenium? In this article you will learn how to do that.
Selenium is a Python module for browser automation. You can use it to grab HTML code, what webpages are made of: HyperText Markup Language (HTML).

What is HTML source? This is the code that is used to construct a web page. It is a markup language.

To get it, first you need to have selenium and the web driver install. You can let Python fire the web browser, open the web page URL and grab the HTML source.

Related course:

Install Selenium

To start, install the selenium module for Python.

For windows users, do this instead:

It’s recommended that you do that in a virtual environment using virtualenv.
If you use the PyCharm IDE, you can install the module from inside the IDE.

Make sure you have the web driver installed, or it will not work.

Selenium get HTML

You can retrieve the HTML source of an URL with the code shown below.
It first starts the web browser (Firefox), loads the page and then outputs the HTML code.

The code below starts the Firefox web rbowser, opens a webpage with the get() method and finally stores the webpage html with browser.page_source.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#_*_coding: utf-8_*_

from selenium import webdriver
import time

# start web browser
browser=webdriver.Firefox()

# get source code
browser.get(«https://en.wikipedia.org»)
html = browser.page_source
time.sleep(2)
print(html)

# close web browser
browser.close()

selenium get html

This is done in a few steps first importing selenium and the time module.

from selenium import webdriver
import time

It starts the web browser with a single line of code. In this example we use Firefox, but any of the supported browsers. will do (Chrome, Edge, PhantomJS).

# start web browser
browser=webdriver.Firefox()

The URL you want to get is opened, this just opens the link in the browser.

# get source code
browser.get(«https://en.wikipedia.org»)

Then you can use the attribute .page_source to get the HTML code.

html = browser.page_source
time.sleep(2)
print(html)

You can then optionally output the HTML source (or do something else with it).

Don’t forget to close the web browser.

# close web browser
browser.close()

Источник

How To Get Page Source In Selenium Using Python?

Join

This article is a part of our Content Hub. For more in-depth resources, check out our content hub on Selenium Python Tutorial.

Retrieving the page source of a website under scrutiny is a day-to-day task for most test automation engineers. Analysis of the page source helps eliminate bugs identified during regular website UI testing, functional testing, or security testing drills. In an extensively complex application testing process, automation test scripts can be written in a way that if errors are detected in the program, then it automatically.

  • saves that particular page’s source code.
  • notifies the person responsible for the URL of the page.
  • extracts the HTML source of a specific element or code-block and delegates it to responsible authorities if the error has occurred in one particular independent HTML WebElement or code block.

This is an easy way to trace, fix logical and syntactical errors in the front-end code. In this article, we first understand the terminologies involved and then explore how to get the page source in Selenium WebDriver using Python.

TABLE OF CONTENT

Cta Python

What Is An HTML Page Source?

In non-technical terminology, it’s a set of instructions for browsers to display info on the screen in an aesthetic fashion. Browsers interpret these instructions in their own ways to create browser screens for the client-side. These are usually written using HyperText Markup Language (HTML), Cascading Style Sheets (CSS) & Javascript.

This entire set of HTML instructions that make a web page is called page source or HTML source, or simply source code. Website source code is a collection of source code from individual web pages.

Here’s an example of a Source Code for a basic page with a title, form, image & a submit button.

What Is An HTML Web Element?

The easiest way to describe an HTML web element would be, “any HTML tag that constitutes the HTML page source code is a web Element.” It could be an HTML code block, an independent HTML tag like
, a media object on the web page – image, audio, video, a JS function or even a JSON object wrapped within tags.

In the above example – is an HTML web element, so is and the children of body tags are HTML web elements too i.e., , etc.

How To Get Page Source In Selenium WebDriver Using Python?

Selenium WebDriver is a robust automation testing tool and provides automation test engineers with a diverse set of ready-to-use APIs. And to make Selenium WebDriver get page source, Selenium Python bindings provide us with a driver function called page_source to get the HTML source of the currently active URL in the browser.

Alternatively, we can also use the “GET” function of Python’s request library to load the page source. Another way is to execute javascript using the driver function execute_script and make Selenium WebDriver get page source in Python. A not-recommended way of getting page source is using XPath in tandem with “view-source:” URL. Let’s explore examples for these four ways of how to get page source in Selenium WebDriver using Python –

We’ll be using a sample small web page hosted on GitHub for all four examples. This page was created to demonstrate drag and drop testing in Selenium Python using LambdaTest.

Get HTML Page source Using driver.page_source

We’ll fetch “pynishant.github.io” in the ChromeDriver and save its content to a file named “page_source.html.” This filename could be anything of your choice. Next, we read the file’s content and print it on the terminal before closing the driver.

Источник

Читайте также:  Restart php cli ubuntu
Оцените статью