Vba excel парсинг html

Guide to Extracting Website Data by Using Excel VBA

Microsoft Excel is undoubtedly one of the most used software applications around the world and across various disciplines. It not only stores, organizes, and manipulates the data using different functions and formulas but also allows users to access web pages and extract data from them.

In this tutorial, we are going to focus on the last mentioned feature by demonstrating how to perform Excel web scraping using VBA. We will briefly go through the installation and preparation of the environment and then write a scraper using VBA macro to successfully fetch data from a web page into Excel. Let’s get started.

What is VBA web scraping?

VBA web scraping is a special scraping technique that allows for automatic data gathering from websites to Excel. The scraping itself becomes possible with the use of such external applications like Microsoft Edge browser.

What is VBA?

VBA stands for Visual Basic Application. It is a programming language of Microsoft Corporation. VBA extends the capabilities of Microsoft Office tools and allows users to develop advanced functions and complex automation. VBA can also be used to write macros to pull data from websites into Excel.

Читайте также:  Mongodb tutorial for java

Pros and cons of using VBA for scraping

Before we move on to the tutorial part it is essential to highlight some advantages and disadvantages of web scraping to Excel with VBA.

Pros

  • Ready to use – VBA is bundled with Microsoft Office which basically means that if you already have MS Office installed, you don’t have to worry about installing anything else. You can use VBA right away in all the Microsoft Office tools.
  • Reliable – Both Microsoft Excel & VBA are developed and maintained by Microsoft. Unlike other development environments, these tools can be upgraded together to the latest version without much hassle.
  • Out-of-the-box support for browser – VBA web scrapers can take advantage of Microsoft’s latest browser Microsoft Edge which makes scraping dynamic websites pretty convenient.
  • Complete automation – When running the VBA script, you don’t have to perform any additional tasks or interact with the browser. Everything will be taken care of by the VBA script including log-in, scrolling, button clicks, etc.

Cons

  • Only works in Windows – VBA scrapers are not cross-platform. They only work in a Windows environment. While MS Office does have support for Mac, it is way harder to write a working VBA scraper on it. The library supports are also limited, for example, you will be unable to use Microsoft Edge.
  • Tightly coupled with MS Office – VBA scrapers are highly dependent on MS Office tools. Third-party useful scraping tools are hard to integrate with it.
  • Steep learning curve — VBA programming language is less beginner-friendly and a bit harder than other modern programming languages, such as Python or Javascript.
Читайте также:  All side box shadow in css

Overall, if you are looking to develop a web scraper for the Windows operating system that automatically pulls data from a website, then VBA-based web scraping will be a good choice.

Tutorial

Before we begin, let us make sure we’ve installed all the prerequisites and set up our environment properly so that it will be easier to follow along.

Prerequisites

In this tutorial, we’ll be using Windows 10 and Microsoft Office 10. However, the steps will be the same or similar for other versions of Windows. You’ll only need a computer with Windows Operating System. In addition, it’s necessary to install Microsoft Office if you don’t have it already. Detailed installation instructions can be found in the Microsoft’s Official documentation.

Preparing the environment

Now, that you’ve installed MS Office, complete the steps below to set up the development environment:

Step 1 — Open Microsoft Excel

From the start menu or Cortana search, find Microsoft Excel and open the application. You will see a similar interface as below:

Step 2 — Go to Options to enable developer menu

By default, Excel doesn’t show the developer button in the top ribbon. To enable this, we’ll have to go to “Options” from the File menu.

Step 3 — Select Customize Ribbon

Once you click the “Options,” a dialog will pop up where, from the side menu, you’ll need to select “Customize Ribbon”. Click on the check box next to “developer.” Make sure it is ticked and then click on “OK.”

Step 4 — Open Visual Basic application dialog

Now, you’ll see a new developer button on the top ribbon, clicking on it will expand the developer menu. From the menu, select “Visual Basic.”

Step 5 — Insert a new Module

Once you click on “Visual Basic», it will open a new window as demonstrated below:

Click on “Insert” and select “Module” to insert a new module. It will open the module editor.

Step 6 — Add new references

From the top menu select Tools > References… which will open a new window like the one below. Make sure to scroll through the available list of references and find Microsoft HTML Client Library and Microsoft Internet Control. Click on the check box next to both of them to enable these references. Once done, click OK.

That’s it! Our development environment is all set. Let’s write our first Excel VBA scraper.

Step 7 — Automate Microsoft Edge to open a website

Now, it’s time to update our newly created module to open the following website: https://quotes.toscrape.com. In the module editor, insert the following code:

Sub scrape_quotes() Dim browser As InternetExplorer Dim page As HTMLDocument Set browser = New InternetExplorer browser.Visible = True browser.navigate ("https://quotes.toscrape.com") End Sub 

We are defining a subroutine named scrape_quotes(). This function will be executed when we run this script. Inside the subroutine, we are defining two objects “browser” and “page”.

The “browser” object will allow us to interact with Microsoft Edge. Next, we also set the browser as visible so that we can see it in action. The browser.navigate() function tells the VBA browser object to open the URL. The output will be similar to this:

Note: You might be wondering why we are writing “InternetExplorer” to interact with Microsoft Edge. VBA initially only supported Internet Explorer-based automation, but once Microsoft discontinued Internet Explorer, they deployed some updates so that VBA’s InternetExplorer module can run the Microsoft Edge browser in IEMode without any issues. The above code will also work in older Windows that have Internet Explorer still available instead of Edge.

Step 8 — Scrape data using VBA script and save it to Excel

The next step is to scrape the quotes and authors from the website. For simplicity, we’ll store it in the first sheet of the Excel spreadsheet and grab the top 5 quotes for now.

Let’s begin by defining two new objects – one for quotes and another for authors.

 Dim quotes As Object Dim authors As Object 

After navigating to the website, we’ll also add a little bit of pause so that the website loads properly by using Loop.

 Do While browser.Busy: Loop 

Next, grab the quotes and authors from the HTML document.

 Set page = browser.document Set quotes = page.getElementsByClassName("quote") Set authors = page.getElementsByClassName("author") 

Use a for loop to populate the excel rows with the extracted data by calling the Cells function and passing the row and column position:

 For num = 1 To 5 Cells(num, 1).Value = quotes.Item(num).innerText Cells(num, 2).Value = authors.Item(num).innerText Next num 

Finally, close the browser by calling the quit function. The below code will close the browser window.

Output

Now, if we run the script again, it’ll open Microsoft Edge, browse to the quotes.toscrape.com website, grab the top 5 quotes from the list, and save them to the current excel file’s first sheet.

Source Code

Below is an example of a full source code:

Sub scrape_quotes() Dim browser As InternetExplorer Dim page As HTMLDocument Dim quotes As Object Dim authors As Object Set browser = New InternetExplorer browser.Visible = True browser.navigate ("https://quotes.toscrape.com") Do While browser.Busy: Loop Set page = browser.document Set quotes = page.getElementsByClassName("quote") Set authors = page.getElementsByClassName("author") For num = 1 To 5 Cells(num, 1).Value = quotes.Item(num).innerText Cells(num, 2).Value = authors.Item(num).innerText Next num browser.Quit End Sub 

Conclusion

Excel web scraping with VBA is a great choice for Windows automation and web extraction. It’s reliable and ready to use which means you won’t have to worry about any unexpected issues or additional steps. For your convenience, you can also access this tutorial in our GitHub repository.

The biggest disadvantage of VBA web scraping that was highlighted in the article is the lack of cross-platform support. However, if you want to develop web scrapers that can be used on multiple operating systems such as Linux or Mac, Excel Web Query can also be an option. We also recommend exploring web scraping with Python – one of the most popular programming languages that is capable of developing complex network applications while maintaining its simplified syntax. Of course, if you’re searching for more advanced web scraping solutions, check our Web Scraper API, designed to collect real-time public data at scale from most websites. And if you already tried web scraping but faced IP blocks, our Residential Proxies can help you overcome this issue.

Yelyzaveta Nechytailo is a Senior Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.

Learn more about Yelyzaveta Nechytailo

All information on Oxylabs Blog is provided on an «as is» basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website’s terms of service or receive a scraping license.

Источник

Оцените статью