- JavaScript HTML DOM Document
- The HTML DOM Document Object
- Finding HTML Elements
- Changing HTML Elements
- Adding and Deleting Elements
- Adding Event Handlers
- Finding HTML Objects
- JavaScript | How to get all the text on an HTML page?
- Video walkthrough
- Solution
- Summary
- JavaScript – Getting All Text Nodes
- All text in document javascript
- Get the Text of an HTML Element in JavaScript
- Handling leading and trailing spaces when using textContent
- Using textContent vs innerText
JavaScript HTML DOM Document
The HTML DOM document object is the owner of all other objects in your web page.
The HTML DOM Document Object
The document object represents your web page.
If you want to access any element in an HTML page, you always start with accessing the document object.
Below are some examples of how you can use the document object to access and manipulate HTML.
Finding HTML Elements
Method | Description |
---|---|
document.getElementById(id) | Find an element by element id |
document.getElementsByTagName(name) | Find elements by tag name |
document.getElementsByClassName(name) | Find elements by class name |
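The three lookup methods in the table can be compared in a small sketch. The helper name `countMatches` and the id, tag, and class values are hypothetical; the document is passed in as a parameter so the function works with any object that provides these three methods.

```javascript
// Hypothetical helper: looks up one id, one tag name, and one class name.
function countMatches(doc, id, tagName, className) {
  const byId = doc.getElementById(id);                   // one element, or null if absent
  const byTag = doc.getElementsByTagName(tagName);       // collection of elements
  const byClass = doc.getElementsByClassName(className); // collection of elements
  return {
    hasId: byId !== null,
    tagCount: byTag.length,
    classCount: byClass.length,
  };
}

// In a browser (assumed values): countMatches(document, 'intro', 'p', 'note');
```

Note that `getElementById` returns a single element (or `null`), while the other two return collections, so they need `.length` or indexing.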
Changing HTML Elements
Property | Description |
---|---|
element.innerHTML = new html content | Change the inner HTML of an element |
element.attribute = new value | Change the attribute value of an HTML element |
element.style.property = new style | Change the style of an HTML element |
Method | Description |
---|---|
element.setAttribute(attribute, value) | Change the attribute value of an HTML element |
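As a rough illustration of the properties and the method listed above, the following sketch applies one change of each kind to an element. The helper name `highlightLink` and the values it writes are assumptions made for the example.

```javascript
// Hypothetical helper: applies one change of each kind to an element.
function highlightLink(element, url) {
  element.innerHTML = '<b>Updated link</b>'; // replace the element's inner HTML
  element.title = 'Updated';                 // set an attribute through its property
  element.style.color = 'red';               // set one inline style property
  element.setAttribute('href', url);         // set an attribute by name
  return element;
}

// In a browser (assumed id):
// highlightLink(document.getElementById('myLink'), 'https://example.com');
```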
Adding and Deleting Elements
Method | Description |
---|---|
document.createElement(element) | Create an HTML element |
document.removeChild(element) | Remove an HTML element |
document.appendChild(element) | Add an HTML element |
document.replaceChild(new, old) | Replace an HTML element |
document.write(text) | Write into the HTML output stream |
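The table lists these methods on `document`, but `appendChild`, `removeChild`, and `replaceChild` are available on every node, and in practice they are usually called on the parent element. A minimal sketch, with the hypothetical helper names `appendParagraph` and `detach`:

```javascript
// Hypothetical helper: creates a <p>, appends it to `parent`, and returns it.
function appendParagraph(doc, parent, text) {
  const p = doc.createElement('p');
  p.textContent = text;   // set the paragraph's text
  parent.appendChild(p);  // attach it as the last child of `parent`
  return p;
}

// Hypothetical helper: removes `child` from its parent if it is attached.
function detach(child) {
  if (child.parentNode) {
    child.parentNode.removeChild(child);
  }
}

// In a browser: const p = appendParagraph(document, document.body, 'Hello');
//               detach(p);
```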
Adding Event Handlers
Method | Description |
---|---|
document.getElementById(id).onclick = function(){code} | Adding event handler code to an onclick event |
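A small sketch of assigning a function to `onclick`. The helper name `attachClickCounter` and the message text are assumptions for illustration only.

```javascript
// Hypothetical helper: counts clicks on an element via its onclick property.
function attachClickCounter(element) {
  let clicks = 0;
  element.onclick = function () {
    clicks += 1;
    element.textContent = 'Clicked ' + clicks + ' time(s)';
  };
  // Return a getter so callers can read the current count.
  return () => clicks;
}

// In a browser (assumed id): attachClickCounter(document.getElementById('myBtn'));
```

Assigning to `onclick` replaces any handler previously stored in that property; `addEventListener('click', handler)` is the usual alternative when several handlers are needed on the same element.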
Finding HTML Objects
The first HTML DOM, Level 1 (1998), defined 11 HTML objects, object collections, and properties. These are still valid in HTML5.
Later, in HTML DOM Level 3, more objects, collections, and properties were added.
Property | Description | DOM |
---|---|---|
document.anchors | Returns all `<a>` elements that have a name attribute | 1 |
document.applets | Deprecated | 1 |
document.baseURI | Returns the absolute base URI of the document | 3 |
document.body | Returns the `<body>` element | 1 |
document.cookie | Returns the document’s cookie | 1 |
document.doctype | Returns the document’s doctype | 3 |
document.documentElement | Returns the `<html>` element | 3 |
document.documentMode | Returns the mode used by the browser | 3 |
document.documentURI | Returns the URI of the document | 3 |
document.domain | Returns the domain name of the document server | 1 |
document.domConfig | Obsolete. | 3 |
document.embeds | Returns all `<embed>` elements | 3 |
document.forms | Returns all `<form>` elements | 1 |
document.head | Returns the `<head>` element | 3 |
document.images | Returns all `<img>` elements | 1 |
document.implementation | Returns the DOM implementation | 3 |
document.inputEncoding | Returns the document’s encoding (character set) | 3 |
document.lastModified | Returns the date and time the document was updated | 3 |
document.links | Returns all `<area>` and `<a>` elements that have a href attribute | 1 |
document.readyState | Returns the (loading) status of the document | 3 |
document.referrer | Returns the URI of the referrer (the linking document) | 1 |
document.scripts | Returns all `<script>` elements | 3 |
document.strictErrorChecking | Returns if error checking is enforced | 3 |
document.title | Returns the `<title>` element | 1 |
document.URL | Returns the complete URL of the document | 1 |
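Several of these properties can be read together to get a quick overview of a page. The helper name `summarizeDocument` is hypothetical; it only reads properties listed in the table.

```javascript
// Hypothetical helper: reads a few document properties into a plain object.
function summarizeDocument(doc) {
  return {
    title: doc.title,
    url: doc.URL,
    readyState: doc.readyState,
    formCount: doc.forms.length,   // number of <form> elements
    imageCount: doc.images.length, // number of <img> elements
    linkCount: doc.links.length,   // number of <a>/<area> elements with href
  };
}

// In a browser console: console.log(summarizeDocument(document));
```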
JavaScript | How to get all the text on an HTML page?
Where do you enter this command? Open the HTML page whose text you want to extract. Open the browser's Developer Tools (CTRL + SHIFT + I). Find the "Console" tab. Click in the white input area to the right of the blue arrow. Paste the command. Press ENTER.
For anyone who did not follow the long line of code above, here is a version that is easier to understand. Step-by-step instructions and a video follow.
Video walkthrough
This video shows an example of getting all the text on an HTML page with JavaScript and the Document Object Model. The commands are entered in the Google Chrome browser console, and the result is visible immediately.
Solution
We will use the Document Object Model (DOM). Start with the document object.
In the console, we can see that the document object holds the entire markup.
Next, get the "document element" (document.documentElement). In effect this is the whole page markup (the html element and its contents), not as a string but as a tree of element objects.
This step leaves out only the DOCTYPE declaration. We do not need it anyway; it matters only to the browser.
Now all the markup is represented as JavaScript objects. Each object holds key/value pairs, some shared and some unique. Every HTML element object has an innerText property, so we can read the text content of any element. Note that innerText is rendering-aware: it returns text roughly as it is displayed and omits text hidden with CSS. To include hidden text as well (for example, text inside collapsed tabs), read textContent instead.
document.documentElement.innerText
Summary
We now have all the text from the page. It can be split into lines and stored in an array, and the lines can then be split into words. From there you can build a search index and a word ranking for the page.
Search phrase: "js all document innertext"
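The follow-up steps described in the summary (split into lines, split into words, rank the words) might look like the sketch below. The helper name `rankWords` and the tokenization rule (runs of letters and digits, lowercased) are assumptions, not part of the original post.

```javascript
// Hypothetical helper: splits text into lines, then words, and counts the words.
function rankWords(text) {
  // Split on line breaks, trim each line, and drop empty lines.
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const counts = new Map();
  for (const line of lines) {
    // \p{L} and \p{N} match letters and digits in any script (requires the `u` flag).
    const words = line.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
    for (const word of words) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }

  // Sort by descending count, then alphabetically for a predictable order.
  const ranking = [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
  return { lines, ranking };
}

// In a browser console: rankWords(document.documentElement.innerText).ranking.slice(0, 10);
```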
JavaScript – Getting All Text Nodes
I have been working on something I am calling JS-Proofs (or jPaq Proofs). While working on the menu I was faced with the annoying issue of either not having whitespaces between my elements in the markup or removing them some other way. Since my favorite language is JS, I decided to write a function that would remove all of the text nodes with whitespaces for me. This evolved into a more general function which retrieves an array of all the text nodes contained by a given element:
```javascript
/**
 * Gets an array of the matching text nodes contained by the specified element.
 * @param elem
 *     The DOM element which will be traversed.
 * @param opt_fnFilter
 *     Optional function that if a true-ish value is returned will cause the
 *     text node in question to be added to the array to be returned from
 *     getTextNodesIn(). The first argument passed will be the text node in
 *     question while the second will be the parent of the text node.
 * @return
 *     Array of the matching text nodes contained by the specified element.
 */
function getTextNodesIn(elem, opt_fnFilter) {
  var textNodes = [];
  if (elem) {
    for (var nodes = elem.childNodes, i = nodes.length; i--;) {
      var node = nodes[i], nodeType = node.nodeType;
      if (nodeType == 3) {
        // Text node: keep it if there is no filter or the filter accepts it.
        if (!opt_fnFilter || opt_fnFilter(node, elem)) {
          textNodes.push(node);
        }
      }
      else if (nodeType == 1 || nodeType == 9 || nodeType == 11) {
        // Element, document, or document fragment: recurse into it.
        textNodes = textNodes.concat(getTextNodesIn(node, opt_fnFilter));
      }
    }
  }
  return textNodes;
}
```
What is kind of cool about the above function is that it not only allows you to get all of the child text nodes, but all descendant text nodes. This function also provides the capability of specifying an optional filtering function to just return text nodes that match a criteria. The function will be passed the text node in question and the parent of that text node. In order to specify that the text node should be added to the array returned from getTextNodesIn you must return a true-ish value.
The following is an example of how to use this function in order to remove all whitespace text nodes from an element:
```javascript
var menu = document.getElementById('divMenu');
getTextNodesIn(menu, function(textNode, parent) {
  if (/^\s+$/.test(textNode.nodeValue)) {
    parent.removeChild(textNode);
  }
});
```
In the strictest sense I am actually kind of hacking the new function that I defined but it gets the job done. 😎
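The function above walks childNodes from last to first, so its result comes back in reverse document order. When the goal is to read the text rather than remove nodes, a forward traversal can be more convenient. The following is a sketch under that assumption, not the author's code; the names `textNodesOf` and `collectText` are hypothetical.

```javascript
// Sketch: yields the descendant text nodes of `root` in document order.
function* textNodesOf(root) {
  // Copy childNodes first so the loop is unaffected if nodes are removed meanwhile.
  for (const node of Array.from(root.childNodes || [])) {
    if (node.nodeType === 3) {
      yield node; // text node
    } else if (node.nodeType === 1 || node.nodeType === 9 || node.nodeType === 11) {
      yield* textNodesOf(node); // element, document, or fragment: recurse
    }
  }
}

// Hypothetical helper: joins the non-whitespace text found under `root`.
function collectText(root) {
  const parts = [];
  for (const node of textNodesOf(root)) {
    const value = node.nodeValue.trim();
    if (value) {
      parts.push(value);
    }
  }
  return parts.join(' ');
}

// In a browser: collectText(document.body);
```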
All text in document javascript
Last updated: Jan 12, 2023
# Get the Text of an HTML Element in JavaScript
Use the textContent property to get the text of an HTML element, e.g. const result = element.textContent.
The textContent property will return the text content of the element and its descendants. If the element is empty, an empty string is returned.
Here is the HTML for the example.
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>bobbyhadz.com</title>
    <meta charset="UTF-8" />
  </head>
  <body>
    <div id="container">
      One, <span style="background-color: salmon">Two</span>, Three
    </div>

    <script src="index.js"></script>
  </body>
</html>
```
And here is the related JavaScript code.
```javascript
const container = document.getElementById('container');

// 👇️ One, Two, Three
console.log(container.textContent);

// 👇️ One, Two, Three
console.log(container.innerText);
```
We used the textContent property to get the text content of the div and its descendants.
If the div element were empty, the property would return an empty string.
# Handling leading and trailing spaces when using textContent
You might get leading or trailing spaces when using textContent depending on the structure of your HTML.
If you need to remove any leading or trailing spaces, use the trim() method.
```javascript
const container = document.getElementById('container');

// 👇️ "One, Two, Three"
const result = container.textContent.trim();
```
The String.trim() method removes the leading and trailing whitespace from a string and returns a new string, without modifying the original string.
The trim() method removes all whitespace characters including spaces, tabs and newlines.
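trim() only affects the two ends of the string. If the markup is indented, textContent can also contain runs of spaces and newlines in the middle. A minimal sketch that collapses those as well, using the hypothetical helper name `normalizeWhitespace`:

```javascript
// Hypothetical helper: collapses every run of whitespace to one space, then trims.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// In a browser (assumed id):
// normalizeWhitespace(document.getElementById('container').textContent);
```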
# Using textContent vs innerText
The code snippet also showed that we can use the innerText property to get the text content of an element and its descendants.
```javascript
const container = document.getElementById('container');

// 👇️ One, Two, Three
const result = container.innerText;
```
However, there are some important differences between the textContent and innerText properties:
- textContent gets the content of all elements, including script and style elements, whereas innerText only gets the content of "human-readable" elements.
- innerText is aware of styling and does not return the text of hidden elements, whereas textContent does not take styles into consideration.
- Setting textContent inserts the string as plain text rather than parsing it as HTML (as innerHTML would), which helps prevent cross-site scripting attacks.
innerText takes CSS styles into account, so when the property is accessed, a reflow is triggered to ensure the styles are up-to-date.
Reflows can be expensive and should be avoided when possible.
When you set an element's text with textContent or innerText, its child nodes are removed and replaced with a single text node containing the provided string.
If you need to add text without removing the element's existing child nodes, use the insertAdjacentText method instead.
```javascript
const container = document.getElementById('container');

// ✅ Update the text content of the element
container.insertAdjacentText('beforeend', ', Four');

// ✅ Update the HTML content of the element
container.insertAdjacentHTML(
  'beforeend',
  ', Five',
);
```
The insertAdjacentText method doesn’t remove the child nodes of the element it was called on.
The insertAdjacentText method takes the following 2 parameters:
- position: the position relative to the element where the text should be inserted. Can be one of the following 4:
  - beforebegin: before the element itself.
  - afterbegin: just inside the element, before its first child.
  - beforeend: just inside the element, after its last child.
  - afterend: after the element itself.
- data: the string from which to create a new text node to insert at the given position.
In the example, we added a string inside of the element, after its last child. However, you can pass a different first argument to the method depending on your use case.
The example also shows how to use the insertAdjacentHTML method to insert HTML into the div element.
The insertAdjacentHTML method takes the same first parameter as insertAdjacentText.
```javascript
const container = document.getElementById('container');

// ✅ Update HTML content of element
container.insertAdjacentHTML(
  'beforeend',
  ', Five',
);
```
However, note that you shouldn’t use user-generated input without escaping it, because that leads to a cross-site scripting vulnerability.
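One common way to escape user input before passing it to insertAdjacentHTML is to replace the characters that are special in HTML. This is a minimal sketch with a hypothetical helper name, `escapeHtml`; it covers text and quoted attribute values, and for plain text insertAdjacentText or textContent remains the simpler choice.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// In a browser (userInput is an assumed variable):
// container.insertAdjacentHTML('beforeend', '<b>' + escapeHtml(userInput) + '</b>');
```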