- Преобразование HTML в текст для диапазона ячеек
- How to Convert HTML to Text in Cells in Excel
- A Real Example of Converting HTML to Text in Cells in Excel
- How to Convert HTML to Text in Cells in Excel Using Find and Replace
- How to Convert HTML to Text in Excel Using VBA
- Get emails from us about Google Sheets.
- VBA Excel. Парсинг сайтов, html-страниц и файлов
- Парсинг сайтов (WinHttp.WinHttpRequest.5.1)
- Парсинг файлов (ADODB.Stream)
- Примеры записи текста в переменную
- Извлечение данных из html
- Парсинг содержимого тегов
- Парсинг содержимого Id
- 6 комментариев для “VBA Excel. Парсинг сайтов, html-страниц и файлов”
- Convert html to plain text in VBA
- Solution 3
- Solution 4
- Solution 5
Преобразование HTML в текст для диапазона ячеек
Особенность кода: теги перевода строки «br /» сохраняются.
Пример использования процедуры Convert_HTML_Range_To_Text:
(для этого примера, преобразования выполняются для диапазона ячеек в столбце С, начиная с 4 строки)
Sub Макрос1() ' получаем ссылку на диапазон ячеек Dim ra As Range: Set ra = Range(Range("c4"), Range("c" & Rows.Count).End(xlUp)) ' преобразуем HTML в текст Convert_HTML_Range_To_Text ra End Sub
Код процедуры Convert_HTML_Range_To_Text:
Sub Convert_HTML_Range_To_Text(ByRef ra As Range) On Error Resume Next Set doc = CreateObject("htmlFile") Dim cell As Range For Each cell In ra.Cells HTML$ = cell.Value ' теги
мы оставляем, от остальных - избавляемся HTML$ = Replace(HTML$, "
", "«br»") HTML$ = Replace(HTML$, "
", "«br»") HTML$ = Replace(HTML$, "
", "«br»") doc.Body.innerHTML = HTML$ txt$ = doc.Body.innerText ' восстанавливаем теги
txt$ = Replace(txt$, "«br»", "
") If txt$ <> HTML$ Then cell = txt$ Next cell Set doc = Nothing End Sub
How to Convert HTML to Text in Cells in Excel
In other words, there are two ways we can use to convert html to text in cells in Excel, which are using the Find and Replace feature and VBA.
Table of Contents
Hypertext markup language, or html, is a standardized formatting system that is used to create web pages. So it is a computer language that is often used for web development, internet navigation, and web documentation.
An html code always contains tags, < and >, which can make it difficult to read, especially when placed in a cell in Excel. So there are two ways to convert html to text in Excel.
One is using the Find and Replace feature. The Find and Replace feature in Excel finds the character you specify and replaces it with any text, character, or number you input.
Another way is using VBA in Excel. VBA stands for Visual Basic for Applications. It is a programming language in Excel and other Office apps. And VBA automates repetitive tasks, data processing, and generating graphs and reports.
VBA is useful for converting html to text in the entire worksheet, all while using a VBA code. If we have certain repetitive tasks in Excel, we can utilize VBA and record a macro to automate those repetitive tasks.
But VBA is not available for the web-based version of Excel. We recommend using the VBA method when working with the Microsoft Excel application or the Office 360 version. So only the free version of Excel does not support working with VBA.
Suppose you are a web designer and you want to share how you created a certain web page with your colleagues. But some of them find it difficult to read html. So you need to convert the html to text in the spreadsheet to make it easier to read and share with others.
Awesome! Let’s move on and check out how to convert html to text in Excel using the two methods.
A Real Example of Converting HTML to Text in Cells in Excel
First, let’s focus on an example of how to convert html to text in cells using the Find and Replace feature. For instance, you have an html code in a cell.
Since html codes always have tags, and >, we simply need to find those tags in the cells. Then, we can replace it with nothing or an empty string. So this method will simply remove the tags in the html, and we will be left with just the text.
And this is what it will look like after removing the tags. So we have converted html to text.
Then, let’s see an example of converting html to text using VBA. Essentially, it does the same thing as the Find and Replace feature. But it utilizes a programming code instead.
VBA allows us to convert html to text in the entire worksheet. This is what the VBA window in Excel looks like.
We will convert html to text by copying a macro code in the module window. After all the tags, < and >, are removed from the entire worksheet, we will only be left with the text.
You can make your own copy of the spreadsheet above using the link attached below.
How to Convert HTML to Text in Cells in Excel Using Find and Replace
This section will explain the step-by-step process of how to convert HTML to Text in cells in Excel using the Find and Replace feature.
1. First, select the cell containing the html you want to convert to text. In this case, we will select A2 . Then, press Ctrl + H to open the Find and Replace window.
2. Next, input ‘ ’ in the Find what . Also, the * is a wildcard character that basically tells Excel to look for anything that starts and ends with tags.
3. Lastly, we will leave the Replace with empty or blank.
4. Since we only selected one cell, simply click Replace. Otherwise, it will apply the function to the whole worksheet.
Furthermore, you can click the Replace All option if you have more than one cell selected.
5. And that’s it! You have successfully converted html to text using Find and Replace.
6. Additionally, we may end up with a weird format after doing this. For example, the cell may go all the way down in the worksheet. To fix it, simply select the cell. Then, go to Home and select Wrap Text .
How to Convert HTML to Text in Excel Using VBA
This section will focus on the steps in converting html to text in Excel using VBA.
1. First, we need to open the VBA window in Excel. To do this, press Alt + F11 .
2. In the VBA window, select Insert .
3. Third, click Module . Then, input this code:
For Each Cell In Selection
4. Next, select the cells containing the html code you want to convert to text.
5. Finally, click Run or press the F5 key to run the macro code.
6. And tada! You have converted the html to text in Cells in Excel using a macro code in VBA.
That’s it! You have successfully learned how to convert html to text in cells in Excel using two methods: the Find and Replace feature and VBA. Now you can convert html to text whether you are using a free version of Excel or the application.
Are you interested in learning more about what Excel can do? You can now use the various other Microsoft Excel formulas available to create great worksheets that work for you. Make sure to subscribe to our newsletter to be the first to know about the latest guides and tutorials from us.
Get emails from us about Google Sheets.
Our goal this year is to create lots of rich, bite-sized tutorials for Google Sheets users like you. If you liked this one, you’ll love what we are working on! Readers receive ✨ early access ✨ to new content.
VBA Excel. Парсинг сайтов, html-страниц и файлов
Пользовательская функция GetHTML1 (VBA Excel) для извлечения (парсинга) текстового содержимого из html-страницы сайта по ее URL-адресу с помощью объекта «msxml2.xmlhttp»:
Парсинг сайтов (WinHttp.WinHttpRequest.5.1)
Парсинг файлов (ADODB.Stream)
Примеры записи текста в переменную
Извлечение данных из html
В понятие «парсинг», кроме извлечения текстового содержимого сайтов, html-страниц или файлов, входит поиск и извлечение конкретных данных из всего полученного текстового содержимого. Пример извлечения email-адресов из текста, присвоенного переменной, смотрите в последнем параграфе статьи: Регулярные выражения (объекты, свойства, методы).
Парсинг содержимого тегов
Парсинг содержимого Id
Для реализации представленных здесь примеров могут понадобиться дополнительные библиотеки. В настоящее время у меня подключены следующие (к данной теме могут относиться последние шесть):
- Visual Basic For Applications
- Microsoft Excel 16.0 Object Library
- OLE Automation
- Microsoft Office 16.0 Object Library
- Microsoft Forms 2.0 Object Library
- Ref Edit Control
- Microsoft Scripting Runtime
- Microsoft Word 16.0 Object Library
- Microsoft Windows Common Controls 6.0 (SP6)
- Microsoft ActiveX Data Objects 6.1 Library
- Microsoft ActiveX Data Objects Recordset 6.0 Library
- Microsoft HTML Object Library
- Microsoft Internet Controls
- Microsoft Shell Controls And Automation
- Microsoft XML, v6.0
С этим набором библиотек все примеры работают. Тестирование проводилось в VBA Excel 2016.
6 комментариев для “VBA Excel. Парсинг сайтов, html-страниц и файлов”
Доброго дня, коллеги.
Задача следующая: в элементе ВэбБраузерКонтрол на странице поиска выполняю определенный запрос и визуально убедившись в получении нужного результата начинаю парсить содержимое. Вопрос как получить текстовое содержимое элемента ВэбБраузерКонтрол в момент когда там отображена нужная информация.
Заранее спасибо.
Доброго дня!
При попытке вычислить к примеру первое вхождение, ищет все равно следующее почему то. Sub Primer1()
Dim myHtml As String, myFile As Object, myTag As Object, myTxt As String
‘Извлекаем содержимое html-страницы в переменную myHtml с помощью функции GetHTML1
myHtml = GetHTML1(«https://bik-info.ru?040702802»)
‘Создаем объект HTMLFile
Set myFile = CreateObject(«HTMLFile»)
‘Записываем в myFile текст из myHtml
myFile.body.innerHTML = myHtml
‘Присваиваем переменной myTag коллекцию одноименных тегов, имя которого
‘указанно в качестве аргумента метода getElementsByTagName
Set myTag = myFile.getElementsByTagName(«strong»)
‘Выбираем, содержимое какого тега по порядку, начинающегося с 0, нужно извлечь
myTxt = myTag(0).innerText
Debug.Print myTxt
‘Большой текст может не уместиться в MsgBox, тогда для просмотра используйте окно Immediate
‘Debug.Print myTxt
End Sub Function GetHTML1(ByVal myURL As String) As String
On Error Resume Next
With CreateObject(«msxml2.xmlhttp»)
.Open «GET», myURL, False
.send
Do: DoEvents: Loop Until .readyState = 4
GetHTML1 = .responseText
End With
End Function
Здравствуйте, Сергей!
На HTML-странице, которую вы указали, тегом strong выделены только две одинаковые даты. Предположу, что отображается первая дата. Если strong заменить на b , отобразится знак $ , который идет первый с этим тегом.
Convert html to plain text in VBA
A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
Function StripTags(ByVal html As String) As String Dim text As String Dim accumulating As Boolean Dim n As Integer Dim c As String text = "" accumulating = True n = 1 Do While n " Then accumulating = True Else If accumulating Then text = text & c End If End If n = n + 1 Loop StripTags = text End Function
This can leave lots of extraneous whitespace, but it will help in removing the tags.
Solution 3
Tim’s solution was great, worked liked a charm.
I´d like to contribute: Use this code to add the «Microsoft HTML Object Library» in runtime:
It worked on Windows XP and Windows 7.
Solution 4
Tim’s answer is excellent. However, a minor adjustment can be added to avoid one foreseeable error response.
Function HtmlToText(sHTML) As String Dim oDoc As HTMLDocument If IsNull(sHTML) Then HtmlToText = "" Exit Function End-If Set oDoc = New HTMLDocument oDoc.body.innerHTML = sHTML HtmlToText = oDoc.body.innerText End Function
Solution 5
Yes! I managed to solve my problem as well. Thanks everybody/
In my case, I had this sort of input:
Lorem ipsum dolor sit amet.
Ut enim ad minim veniam.
Duis aute irure dolor in reprehenderit.
And I did not want the result to be all jammed together without breaklines.
So I first splitted my input for every
tag into an array ‘paragraphs’, then for each element I used Tim’s answer to get the text out of html (very sweet answer btw).
In addition I concatenated each cleaned ‘paragraph’ with this breakline character Crh(10) for VBA/Excel.
The final code is:
Public Function HtmlToText(ByVal sHTML As String) As String Dim oDoc As HTMLDocument Dim result As String Dim paragraphs() As String If IsNull(sHTML) Then HtmlToText = "" Exit Function End If result = "" paragraphs = Split(sHTML, "") For Each paragraph In paragraphs Set oDoc = New HTMLDocument oDoc.body.innerHTML = paragraph result = result & Chr(10) & Chr(10) & oDoc.body.innerText Next paragraph HtmlToText = result End Function