PHP read by tag

W-Shadow.com

How To Extract HTML Tags And Their Attributes With PHP

There are several ways to extract specific tags from an HTML document. The one that most people will think of first is probably regular expressions. However, this is not always – or, as some would insist, ever – the best approach. Regular expressions can be handy for small hacks, but using a real HTML parser will usually lead to simpler and more robust code. Complex queries, like “find all rows with the class .foo of the second table of this document and return all links contained in those rows”, are also much easier to write with a decent parser.

There are a few edge cases (very few, admittedly) where regular expressions might work better, so I will discuss both approaches in this post.

Extracting Tags With DOM

PHP 5 comes with a usable built-in DOM API that you can use to parse and manipulate (X)HTML documents. For example, here’s how you could use it to extract all link URLs from an HTML file:

//Load the HTML page
$html = file_get_contents('page.htm');

//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);

//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

//Iterate over the extracted links and display their URLs
foreach ($links as $link){
    //Extract and show the "href" attribute.
    echo $link->getAttribute('href'), '<br>';
}
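
The @ operator silences every warning on that line, not only parser complaints. As a minimal sketch of an alternative, assuming you would rather have libxml collect parse errors internally, the same extraction could look like this. The function name extract_link_urls() is made up for this illustration and is not part of the original post.

```php
<?php
// Hypothetical helper (not from the original article): returns the href
// values of all <a> tags found in an HTML string.
function extract_link_urls(string $html): array
{
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Skip anchors that have no href attribute at all.
        if ($link->hasAttribute('href')) {
            $urls[] = $link->getAttribute('href');
        }
    }
    return $urls;
}

print_r(extract_link_urls('<p><a href="/one">1</a> <a name="x">no href</a> <a href="/two">2</a></p>'));
```

Returning an array rather than echoing keeps the helper reusable; the caller decides how to display the URLs.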

In addition to getElementsByTagName() you can also use $dom->getElementById() to find the tag with a specific id. For more complex tasks, like extracting deeply nested tags, XPath is probably the way to go. For example, to find all list items with the class “foo” containing links with the class “bar” and display the link URLs:

//Load the HTML page
$html = file_get_contents('page.htm');

//Parse it. loadHTML is called on an instance here, because calling it
//statically raises an error in PHP 8.
$dom = new DOMDocument;
@$dom->loadHTML($html);

//Init the XPath object
$xpath = new DOMXPath($dom);

//Query the DOM
$links = $xpath->query( '//li[contains(@class, "foo")]//a[@class = "bar"]' );

//Display the results as in the previous example
foreach($links as $link){
    echo $link->getAttribute('href'), '<br>';
}
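
The query from the introduction (“all rows with the class .foo of the second table, and all links in those rows”) can be sketched in the same way. This is an illustration under a few assumptions, not a definitive implementation: the function name is hypothetical, and the class test uses the common concat/normalize-space idiom so that “foo” only matches as a whole class name.

```php
<?php
// Illustrative sketch: collect link URLs from rows with class "foo"
// in the second table of a document. The function name is hypothetical.
function links_in_foo_rows_of_second_table(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // (//table)[2] selects the second table in document order.
    // Padding the class attribute with spaces makes " foo " match only
    // a complete class name, so class="foobar" is not matched.
    $query = '(//table)[2]//tr[contains(concat(" ", normalize-space(@class), " "), " foo ")]//a[@href]';

    $urls = [];
    foreach ($xpath->query($query) as $link) {
        $urls[] = $link->getAttribute('href');
    }
    return $urls;
}
```

Note that plain contains(@class, "foo"), as used in the example above, would also match class names such as “foobar”; which behaviour you want depends on your markup.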



Honourable mention: Simple HTML DOM Parser is a popular alternative HTML parser for PHP 5 that lets you manipulate HTML pages with jQuery-like ease. However, I personally wouldn’t recommend using it if you care about your script’s performance, as in my tests Simple HTML DOM was about 30 times slower than DOMDocument.

Extracting Tags & Attributes With Regular Expressions

There are only two advantages to processing HTML with regular expressions – availability and edge-case performance. While most parsers require PHP 5 or later, regular expressions are available pretty much anywhere. Also, they are a little bit faster than real parsers when you need to extract something from a very large document (on the order of 400 KB or more). Still, in most cases you’re better off using the PHP DOM extension or even Simple HTML DOM, not messing with convoluted regexps.
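
To make the trade-off concrete, here is a minimal sketch of the regex approach at its simplest: one pattern that pulls href values out of link tags. The function name regex_link_urls() is an assumption made for this illustration, and the pattern is deliberately naive: it only handles quoted attribute values and knows nothing about comments, scripts or malformed markup.

```php
<?php
// Hypothetical helper: pull href values out of <a> tags with one regex.
// Deliberately naive: only quoted values are handled, and markup inside
// comments or scripts would be matched as well.
function regex_link_urls(string $html): array
{
    // (["\']) captures the opening quote, \1 requires the same closing quote.
    $pattern = '@<a\s[^>]*?\bhref\s*=\s*(["\'])(.*?)\1@si';
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }
    // $matches[2] holds the second capture group: the attribute value.
    return $matches[2];
}
```

For anything beyond a quick one-off like this, the advice above stands: prefer a parser.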

That said, here’s a PHP function that can extract any HTML tags and their attributes from a given string:

/**
 * extract_tags()
 * Extract specific HTML tags and their attributes from a string.
 *
 * You can either specify one tag, an array of tag names, or a regular expression that matches the tag name(s).
 * If multiple tags are specified you must also set the $selfclosing parameter and it must be the same for
 * all specified tags (so you can't extract both normal and self-closing tags in one go).
 *
 * The function returns a numerically indexed array of extracted tags. Each entry is an associative array
 * with these keys :
 *  tag_name   - the name of the extracted tag, e.g. "a" or "img".
 *  offset     - the numeric offset of the first character of the tag within the HTML source.
 *  contents   - the inner HTML of the tag. This is always empty for self-closing tags.
 *  attributes - a name -> value array of the tag's attributes, or an empty array if the tag has none.
 *  full_tag   - the entire matched tag, e.g. '<a href="http://example.com">example.com</a>'. This key
 *               will only be present if you set $return_the_entire_tag to true.
 *
 * @param string $html The HTML code to search for tags.
 * @param string|array $tag The tag(s) to extract.
 * @param bool $selfclosing Whether the tag is self-closing or not. Setting it to null will force the script to try and make an educated guess.
 * @param bool $return_the_entire_tag Return the entire matched tag in 'full_tag' key of the results array.
 * @param string $charset The character set of the HTML code. Defaults to ISO-8859-1.
 *
 * @return array An array of extracted tags, or an empty array if no matching tags were found.
 */
function extract_tags( $html, $tag, $selfclosing = null, $return_the_entire_tag = false, $charset = 'ISO-8859-1' ){

    if ( is_array($tag) ){
        $tag = implode('|', $tag);
    }

    //If the user didn't specify if $tag is a self-closing tag we try to auto-detect it
    //by checking against a list of known self-closing tags.
    $selfclosing_tags = array( 'area', 'base', 'basefont', 'br', 'hr', 'input', 'img', 'link', 'meta', 'col', 'param' );
    if ( is_null($selfclosing) ){
        $selfclosing = in_array( $tag, $selfclosing_tags );
    }

    //The regexp is different for normal and self-closing tags because I can't figure out
    //how to make a sufficiently robust unified one.
    if ( $selfclosing ){
        $tag_pattern =
            '@<(?P<tag>'.$tag.')           # <tag
            (?P<attributes>\s[^>]+)?       # attributes, if any
            \s*/?>                         # /> or just >, being lenient here
            @xsi';
    } else {
        $tag_pattern =
            '@<(?P<tag>'.$tag.')           # <tag
            (?P<attributes>\s[^>]+)?       # attributes, if any
            \s*>                           # >
            (?P<contents>.*?)              # tag contents
            </(?P=tag)>                    # the closing </tag>
            @xsi';
    }

    $attribute_pattern =
        '@
        (?P<name>\w+)                                         # attribute name
        \s*=\s*
        (
            (?P<quote>[\"\'])(?P<value_quoted>.*?)(?P=quote)  # a quoted value
            |                                                 # or
            (?P<value_unquoted>[^\s"\']+?)(?:\s+|$)           # an unquoted value (terminated by whitespace or EOF)
        )
        @xsi';

    //Find all tags
    if ( !preg_match_all($tag_pattern, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE ) ){
        //Return an empty array if we didn't find anything
        return array();
    }

    $tags = array();
    foreach ($matches as $match){

        //Parse tag attributes, if any
        $attributes = array();
        if ( !empty($match['attributes'][0]) ){

            if ( preg_match_all( $attribute_pattern, $match['attributes'][0], $attribute_data, PREG_SET_ORDER ) ){
                //Turn the attribute data into a name->value array
                foreach($attribute_data as $attr){
                    if( !empty($attr['value_quoted']) ){
                        $value = $attr['value_quoted'];
                    } else if( !empty($attr['value_unquoted']) ){
                        $value = $attr['value_unquoted'];
                    } else {
                        $value = '';
                    }

                    //Passing the value through html_entity_decode is handy when you want
                    //to extract link URLs or something like that. You might want to remove
                    //or modify this call if it doesn't fit your situation.
                    $value = html_entity_decode( $value, ENT_QUOTES, $charset );

                    $attributes[$attr['name']] = $value;
                }
            }
        }

        $tag = array(
            'tag_name' => $match['tag'][0],
            'offset' => $match[0][1],
            'contents' => !empty($match['contents'])?$match['contents'][0]:'', //empty for self-closing tags
            'attributes' => $attributes,
        );
        if ( $return_the_entire_tag ){
            $tag['full_tag'] = $match[0][0];
        }

        $tags[] = $tag;
    }

    return $tags;
}

Usage examples

Extract all links and output their URLs:

$html = file_get_contents( 'example.html' );
$nodes = extract_tags( $html, 'a' );
foreach($nodes as $link){
    echo $link['attributes']['href'] , '<br>';
}

Extract all heading tags and output their text:

$nodes = extract_tags( $html, 'h\d+', false );
foreach($nodes as $node){
    echo strip_tags($node['contents']) , '<br>';
}

Extract meta tags:

$nodes = extract_tags( $html, 'meta' );

Extract bold and italicised text fragments:

$nodes = extract_tags( $html, array('b', 'strong', 'em', 'i') );
foreach($nodes as $node){
    echo strip_tags( $node['contents'] ), '<br>';
}

The function is pretty well documented, so check the source if anything is unclear. Of course, you can also leave a comment if you have any further questions or feedback.


This entry was posted on Tuesday, October 20th, 2009 at 17:35 and is filed under Web Development.


An MP3 ID3 tag reader in native PHP

shubhamjain/PHP-ID3


PHP-ID3 uses native PHP to read ID3 tags and the thumbnail from an MP3 file. The ID3 specification has gone through many revisions; this program follows v3.2 of the spec (most likely ID3v2.3, since no ID3 version 3.2 exists).

To read binary data more effectively, I have created a class, BinaryFileReader, which reads data in named chunks.
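
To illustrate what reading in named chunks means, here is a rough sketch that parses the 10-byte ID3v2 tag header. It does not use BinaryFileReader, whose API is not shown here; it relies on PHP's built-in unpack(), and the function name parse_id3v2_header() is hypothetical.

```php
<?php
// Sketch only: this is NOT the BinaryFileReader API, just PHP's built-in
// unpack() used to show the idea of reading binary data in named chunks.
function parse_id3v2_header(string $bytes): ?array
{
    // An ID3v2 header is exactly 10 bytes long.
    if (strlen($bytes) < 10) {
        return null;
    }

    // a3 = 3-byte string, C = unsigned byte; every chunk gets a name.
    // "C4size" yields the keys size1, size2, size3 and size4.
    $h = unpack('a3id/Cmajor/Crevision/Cflags/C4size', $bytes);
    if ($h['id'] !== 'ID3') {
        return null;
    }

    // The tag size is stored as four "synchsafe" bytes: 7 useful bits each.
    $size = ($h['size1'] << 21) | ($h['size2'] << 14) | ($h['size3'] << 7) | $h['size4'];

    return [
        'version' => 'ID3v2.' . $h['major'] . '.' . $h['revision'],
        'flags'   => $h['flags'],
        'size'    => $size,
    ];
}
```

In practice the input would be the first 10 bytes of the file, for example from fread() on a handle opened in "rb" mode.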

{
    "require": {
        "shubhamjain/php-id3": "dev-master"
    }
}

You will first need to include the autoload.php generated by Composer; then you can use the classes in the PhpId3 namespace.

require 'vendor/autoload.php';

//...
use PhpId3\Id3TagsReader;

//...
$id3 = new Id3TagsReader(fopen("Exodus - 06 - Piranha.mp3", "rb"));

$id3->readAllTags(); //Calling this is necessary before others

foreach($id3->getId3Array() as $key => $value) {
    if( $key !== "APIC" ) { //Skip Image data
        echo $value["FullTagName"] . ": " . $value["Body"] . "<br />";
    }
}

list($mimeType, $image) = $id3->getImage();

file_put_contents("thumb.jpeg", $image ); //Note the image type depends upon MimeType
//...

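Since the image type depends on the MIME type, a hard-coded "thumb.jpeg" may not match the data. One possible way to choose the file name is sketched below. The helper thumbnail_filename() is hypothetical and not part of PHP-ID3, and it assumes the MIME type arrives as a string such as "image/png".

```php
<?php
// Hypothetical helper (not part of PHP-ID3): map an image MIME type to a
// file extension so the thumbnail is saved under a matching name.
function thumbnail_filename(string $mimeType, string $basename = 'thumb'): string
{
    $extensions = [
        'image/jpeg' => 'jpeg',
        'image/jpg'  => 'jpeg', // non-standard spelling, mapped defensively
        'image/png'  => 'png',
        'image/gif'  => 'gif',
    ];

    // Normalise case and surrounding whitespace before the lookup.
    $key = strtolower(trim($mimeType));

    // Fall back to a generic extension for unknown types.
    $ext = $extensions[$key] ?? 'bin';

    return $basename . '.' . $ext;
}
```

With this in place, the last step of the example could become file_put_contents(thumbnail_filename($mimeType), $image).
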
See LICENSE for more information.

If you have used this project, liked it, or have any questions about the source, send your thoughts to shubham.jain.1@gmail.com.


How to Get HTML Tag Value in PHP?

This post shows how to get the value of an HTML tag in PHP.

It gives two simple examples: the first reads the value of a single element by its ID, and the second reads the values of all elements with a given tag name. The code and output for both are shown below:

PHP Get HTML Element Value

$htmlEle = "<p id='paragraphID'>This is ItSolutionStuff.com Example</p>";

$domdoc = new DOMDocument();
$domdoc->loadHTML($htmlEle);

$pTagValue = $domdoc->getElementById('paragraphID')->nodeValue;

echo $pTagValue;

Output:

This is ItSolutionStuff.com Example
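
One detail worth guarding against: getElementById() returns null when no element has the requested id, so chaining ->nodeValue directly fails in that case. A minimal sketch of a null-safe variant follows; the function name element_text_by_id() is invented for this illustration.

```php
<?php
// Hypothetical helper: return the text of the element with the given id,
// or null when no such element exists.
function element_text_by_id(string $html, string $id): ?string
{
    // Keep libxml parse warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementById($id);

    // Only read the text when the lookup actually found an element.
    return $element === null ? null : trim($element->textContent);
}
```

The caller can then test the result against null before printing it.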

PHP Get HTML Element Values By Tag Name

$htmlEle = "<p>This is ItSolutionStuff.com Example 1</p><p>This is ItSolutionStuff.com Example 2</p>";

$domdoc = new DOMDocument();
$domdoc->loadHTML($htmlEle);

$pTags = $domdoc->getElementsByTagName('p');

foreach ($pTags as $p) {
    echo $p->nodeValue, PHP_EOL;
}

Output:

This is ItSolutionStuff.com Example 1

This is ItSolutionStuff.com Example 2

Hardik Savani

