Simple html dom children

Contents

simple_html_dom
Public Properties
Protected Properties
Parsing documents
DOM methods & properties
Element methods & properties
DOM traversing
Camel naming conventions
Simple HTML DOM PHP examples
Simple HTML DOM examples
Parsing from A to Z

simple_html_dom

Represents the DOM in memory. Provides functions to parse documents and access individual elements (see simple_html_dom_node).

Public Properties

Property	Description
root	Root node of the document.
nodes	List of top-level nodes in the document.
callback	Callback function that is called for each element in the DOM when generating outertext.
lowercase	If enabled, all tag names are converted to lowercase when parsing documents.
original_size	Original document size in bytes.
size	Current document size in bytes.
_charset	Charset of the original document.
_target_charset	Target charset for the current document.
default_span_text	Text to return for <span> elements.

Protected Properties

Property	Description
pos	Current parsing position within doc.
doc	The original document.
char	Character at position pos in doc.
cursor	Current element cursor in the document.
parent	Parent element node.
noise	Noise from the original document (e.g. scripts, comments, etc.).
token_blank	Tokens that are considered whitespace in HTML.
token_equal	Tokens to identify the equal sign for attributes, stopping either at the closing slash («/») of a self-closing tag or at the end of an opening tag («>»).
token_slash	Tokens to identify the end of a tag name. A tag name ends either at the ending slash («/») or at whitespace («\s\r\n\t»).
token_attr	Tokens to identify the end of an attribute.
default_br_text	Text to return for <br> elements.
self_closing_tags	A list of tag names where the closing tag is omitted.
block_tags	A list of tag names where remaining unclosed tags are forcibly closed.
optional_closing_tags	A list of tag names where the closing tag can be omitted.

Source

Parsing documents

The parser accepts documents in the form of URLs, files, and strings. The document must be readable and cannot exceed MAX_FILE_SIZE.

Name	Description
str_get_html( string $content ) : object	Creates a DOM object from string.
file_get_html( string $filename ) : object	Creates a DOM object from file or URL.

DOM methods & properties

Name	Description
__construct( [string $filename] ) : void	Constructor. Setting the filename parameter automatically loads the contents, either text or a file/URL.
plaintext : string	Returns the contents extracted from HTML.
clear() : void	Cleans up memory.
load( string $content ) : void	Loads contents from a string.
save( [string $filename] ) : string	Dumps the internal DOM tree back into a string. If $filename is set, the resulting string is also saved to that file.
load_file( string $filename ) : void	Loads contents from a file or a URL.
set_callback( string $function_name ) : void	Sets a callback function.
find( string $selector [, int $index] ) : mixed	Finds elements by CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
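The "Nth element or array" return contract of find() can be sketched without the library. The block below uses PHP's built-in DOM extension (assumed to be enabled) instead of simple_html_dom, and find_like() is a hypothetical helper that takes an XPath expression rather than a CSS selector; it only illustrates the shape of the mixed return value.

```php
<?php
// find_like() is a hypothetical helper, not part of simple_html_dom or PHP.
// It mimics the find($selector [, $index]) return contract using the
// built-in DOM extension, but takes an XPath expression, not a CSS selector.
function find_like(DOMXPath $xpath, string $query, ?int $index = null)
{
    $matches = [];
    foreach ($xpath->query($query) as $node) {
        $matches[] = $node;
    }
    if ($index === null) {
        return $matches;             // no index: array of all matches
    }
    return $matches[$index] ?? null; // index given: Nth match, or null
}

libxml_use_internal_errors(true);    // collect parser warnings silently
$doc = new DOMDocument();
$doc->loadHTML('<ul><li>one</li><li>two</li><li>three</li></ul>');
$xpath = new DOMXPath($doc);

$all = find_like($xpath, '//li');     // array with 3 DOMElement objects
$second = find_like($xpath, '//li', 1);
echo count($all), ' ', $second->textContent, "\n"; // prints "3 two"
```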

Element methods & properties

Name	Description
[attribute] : string	Read or write the element's attribute value.
tag : string	Read or write the element's tag name.
outertext : string	Read or write the element's outer HTML text.
innertext : string	Read or write the element's inner HTML text.
plaintext : string	Read or write the element's plain text.
find( string $selector [, int $index] ) : mixed	Finds children by CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.

DOM traversing

Name	Description
$e->children( [int $index] ) : mixed	Returns the Nth child object if index is set, otherwise returns an array of children.
$e->parent() : element	Returns the element's parent.
$e->first_child() : element	Returns the element's first child, or null if not found.
$e->last_child() : element	Returns the element's last child, or null if not found.
$e->next_sibling() : element	Returns the element's next sibling, or null if not found.
$e->prev_sibling() : element	Returns the element's previous sibling, or null if not found.
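For readers who also use PHP's built-in DOM extension, the block below sketches rough counterparts of the traversal methods in the table. It is not simple_html_dom code; it assumes PHP 8.0 or later, where DOMElement has the firstElementChild and lastElementChild properties.

```php
<?php
// Rough native-DOM counterparts of the simple_html_dom traversal methods.
// Assumes PHP 8.0+ for firstElementChild / lastElementChild.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li> <li>b</li> <li>c</li></ul>');
$ul = $doc->getElementsByTagName('ul')->item(0);

// children(): childNodes can also hold text nodes (such as the spaces
// between the <li> tags), so keep only element nodes.
$children = [];
foreach ($ul->childNodes as $node) {
    if ($node instanceof DOMElement) {
        $children[] = $node;
    }
}

$first = $ul->firstElementChild; // like first_child()
$last = $ul->lastElementChild;   // like last_child()
$parent = $first->parentNode;    // like parent()

echo count($children), ' ', $first->textContent, ' ',
     $last->textContent, ' ', $parent->nodeName, "\n"; // prints "3 a c ul"
```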

Camel naming conventions

Method	Mapping
$e->getAllAttributes()	$e->attr
$e->getAttribute($name)	$e->attribute
$e->setAttribute($name, $value)	$e->attribute = $value
$e->hasAttribute($name)	isset($e->attribute)
$e->removeAttribute($name)	$e->attribute = null
$e->getElementById($id)	$e->find("#$id", 0)
$e->getElementsById($id [, $index])	$e->find("#$id" [, int $index])
$e->getElementByTagName($name)	$e->find($name, 0)
$e->getElementsByTagName($name [, $index])	$e->find($name [, int $index])
$e->parentNode()	$e->parent()
$e->childNodes([$index])	$e->children([int $index])
$e->firstChild()	$e->first_child()
$e->lastChild()	$e->last_child()
$e->nextSibling()	$e->next_sibling()
$e->previousSibling()	$e->prev_sibling()
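Several of the DOM-style names in the left column are also real methods of PHP's built-in DOMDocument and DOMElement classes. The following sketch uses the native DOM extension, not simple_html_dom; the comments show the matching simple_html_dom form from the table, and the markup is a made-up example.

```php
<?php
// A few DOM-style methods as they exist on PHP's built-in DOM classes.
// Comments show the matching simple_html_dom form from the table.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div id="box" class="note"><p>one</p><p>two</p></div>');

$box = $doc->getElementById('box');            // $e->find("#box", 0)
$hadClass = $box->hasAttribute('class');       // isset($e->class)
$box->setAttribute('title', 'hello');          // $e->title = 'hello'
$title = $box->getAttribute('title');          // $e->title
$box->removeAttribute('class');                // $e->class = null
$paragraphs = $box->getElementsByTagName('p'); // $e->find('p')

echo $title, ' ', $paragraphs->length, "\n";   // prints "hello 2"
```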


Simple HTML DOM PHP examples

PHP Simple HTML DOM is a PHP library for writing parsers of HTML pages. After it loads a page, the library builds an object containing all of the page's elements, which you can then access quickly with its built-in functions.

Simple HTML DOM examples

Initializing the object (loading an HTML page)

require 'simple_html_dom.php'; // include the library
$html = file_get_html('http://www.example.com/'); // fetch the page

// When running from the Windows command line (without an HTTP server),
// it is better to use the following construction instead:
$load = file_get_contents($link); // $link holds the address of the page
$html = str_get_html($load);
// from here on, work with the $html variable

Finding the element you need

$element = $html->find( '.myclass' );

The find() function returns an array of all elements with the class myclass. To process all of them, loop over that array (for example, with foreach).

Alternatively, select the element you need right away by passing an index (as with an array, counting starts at zero):

$element = $html->find( '.myclass', 0 );
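For comparison, here is a native-DOM sketch of the same two steps: selecting every element with the class myclass, then taking only the first one. It uses PHP's built-in DOM extension and DOMXPath, not simple_html_dom; the XPath class-matching expression is a common workaround, since XPath 1.0 has no class selector, and the markup is a made-up example.

```php
<?php
// Native-DOM sketch of find('.myclass') and find('.myclass', 0).
// XPath 1.0 has no class selector, so this common idiom matches "myclass"
// as a whole word inside the whitespace-normalized class attribute.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div><p class="myclass">A</p><p class="other">B</p>'
    . '<p class="big myclass">C</p></div>');
$xpath = new DOMXPath($doc);
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' myclass ')]";

// Every match: loop over the node list, as you would over find()'s array.
$texts = [];
foreach ($xpath->query($query) as $el) {
    $texts[] = $el->textContent;
}

// Only the first match (index 0), like find('.myclass', 0).
$first = $xpath->query($query)->item(0);

echo implode(',', $texts), ' ', $first->textContent, "\n"; // prints "A,C A"
```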

Getting a child element:

You can walk down the nesting tree, which is useful when you need to process the page step by step:

$element->children(0)->class;           // class of the first child element
$element->children(0)->children(1)->id; // id of the second child of the first child
$element->children(0)->outertext;       // HTML code of the element itself
$element->children(0)->innertext;       // HTML code inside the element
$element->children(0)->plaintext;       // text inside, stripped of HTML
$element->children(0)->tag;             // HTML tag name
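The difference between outertext, innertext and plaintext can be reproduced with PHP's built-in DOM extension, which may help clarify what each one returns. This is a rough analogue for comparison, not the simple_html_dom implementation; the markup is a made-up example.

```php
<?php
// Rough native-DOM analogues of outertext, innertext and plaintext.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div><p class="x">Hello <b>world</b></p></div>');
$p = $doc->getElementsByTagName('p')->item(0);

// outertext: the element's own tag plus everything inside it
$outer = $doc->saveHTML($p);

// innertext: only what is inside the element, child by child
$inner = '';
foreach ($p->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

// plaintext: the text content with all tags stripped
$plain = $p->textContent;

echo $outer, "\n", $inner, "\n", $plain, "\n";
```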

To process many nested tags one after another, you can use constructions like these:

$element->children($i)->tag;
$element->children(0)->children($i)->tag; // and so on

You can also add your own processing functions:

// the callback receives each element as the "$element" parameter
function my_callback($element) {
    // hide all <b> tags
    if ($element->tag == 'b') {
        $element->outertext = '';
    }
}

// register the callback under our function's name
$html->set_callback('my_callback');

// the callback is called while the output is generated
echo $html;
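PHP's built-in DOM extension has no output callback comparable to set_callback(). A similar result can be sketched by removing the <b> elements from the tree before serializing it; unlike the callback, this actually deletes the nodes instead of only hiding them at output time. This is native DOM code for comparison, not simple_html_dom.

```php
<?php
// No set_callback() equivalent here; instead, remove the <b> elements
// from the tree before serializing (this deletes them, text included).
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<p>keep <b>drop</b> keep <b>drop</b></p>');

// getElementsByTagName() returns a live list: removing nodes while
// iterating it can skip elements, so copy it into an array first.
$bolds = iterator_to_array($doc->getElementsByTagName('b'));
foreach ($bolds as $b) {
    $b->parentNode->removeChild($b);
}

$out = $doc->saveHTML($doc->getElementsByTagName('p')->item(0));
echo $out, "\n";
```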

More examples can be found in the official documentation.

Parsing from A to Z

Today's article is once again about PHP Simple Html DOM Parser, even though some readers may be rather tired of the topic by now. 🙂 I simply want to collect enough material on the blog that I can refer people who ask questions by email to it.

So: navigating the DOM tree. Right here. Right now. With examples. (In theory, it is already described in the library's manual.)

If you are reading this article, you already know what a DOM structure is, as well as a tree representation of data, tree nodes, parents, children, and so on. You can view the structure of an HTML document as a tree in Firebug.

Firebug also has a DOM tab, and I recommend that beginners explore its contents. The structure it expands shows clearly what you get when you access child elements, individual nodes, properties, attributes, and so on.

But back to PHP Simple Html DOM Parser.

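As an example, let's take a simple piece of HTML: a table with three columns and three rows.

<table>
<tr><td>1.1</td><td>1.2</td><td>1.3</td></tr>
<tr><td>2.1</td><td>2.2</td><td>2.3</td></tr>
<tr><td>3.1</td><td>3.2</td><td>3.3</td></tr>
</table>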

Here are the traversal functions described once more:

mixed $e->children ( [int $index] ) - returns the Nth child if $index is set, or an array of all children if no index is given.
element $e->parent () - returns the element's parent.
element $e->first_child () - returns the element's first child, or null if it has no children.
element $e->last_child () - returns the element's last child, or null if it has no children.
element $e->next_sibling () - returns the element's next sibling (the next child of the same parent), or null if there is none.
element $e->prev_sibling () - returns the element's previous sibling, or null if there is none.

children(N) is equivalent to childNodes(N).

The following code walks through all rows and columns of the table:

$html = str_get_html($html_str);
foreach ($html->find("table", 0)->children() as $tr) {
    foreach ($tr->children() as $td) {
        echo $td->innertext . '; ';
    }
    echo '<br>'; // line break after each row
}

If $html_str contains the table code above, the result will be:

1.1; 1.2; 1.3;
2.1; 2.2; 2.3;
3.1; 3.2; 3.3;

This should all be clear, and no problems should come up: we simply iterate over arrays.
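The same row-and-column walk can be sketched with PHP's built-in DOM extension for comparison. Unlike the children() calls in the simple_html_dom code, this version collects rows and cells with getElementsByTagName(), which returns element nodes only; the markup is assumed to match the 3x3 table used in this article.

```php
<?php
// Native-DOM sketch of the row/column walk over the 3x3 table.
libxml_use_internal_errors(true);
$html_str = '<table>'
    . '<tr><td>1.1</td><td>1.2</td><td>1.3</td></tr>'
    . '<tr><td>2.1</td><td>2.2</td><td>2.3</td></tr>'
    . '<tr><td>3.1</td><td>3.2</td><td>3.3</td></tr>'
    . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html_str);
$table = $doc->getElementsByTagName('table')->item(0);

// getElementsByTagName() returns element nodes only, so whitespace text
// nodes never appear; it also finds rows inside a <tbody>, if any.
$lines = [];
foreach ($table->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = $td->textContent;
    }
    $lines[] = implode('; ', $cells) . ';';
}
echo implode("\n", $lines), "\n";
```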
The next way to navigate is with next_sibling. next_sibling and prev_sibling are used to move between elements on the same level (that is, elements that share a parent). As an example, let's walk through all cells of the first table row. The code for this simple operation looks like this:

$element = $html->find('table tr td', 0);
while ($element) {
    echo $element->innertext . '; ';
    $element = $element->next_sibling();
}
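A native-DOM version of the sibling loop is sketched below for comparison, assuming PHP 8.0 or later. In PHP's DOM extension, nextSibling can also return text nodes such as the whitespace between tags, so the sketch uses nextElementSibling, which skips them. This is not simple_html_dom code.

```php
<?php
// Native-DOM sketch of the next_sibling() loop over the first row's cells.
// Assumes PHP 8.0+ for nextElementSibling, which skips text nodes.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<table><tr> <td>1.1</td> <td>1.2</td> <td>1.3</td> </tr>'
    . '<tr><td>2.1</td><td>2.2</td><td>2.3</td></tr></table>');

$cell = $doc->getElementsByTagName('td')->item(0); // first cell of row 1
$texts = [];
while ($cell !== null) {
    $texts[] = $cell->textContent;
    $cell = $cell->nextElementSibling; // null after the last cell
}
echo implode('; ', $texts), ";\n"; // prints "1.1; 1.2; 1.3;"
```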

That's basically it. I tried to keep it short and clear, with no filler.
Happy developing!