Contents

How to Get Title and Metadata (meta tags) from URL using PHP
Output
Fastest way to retrieve a <title> in PHP
Get title of website via link
10 Answers
Get Webpage Title and Meta Description from URL in PHP
PHP Code to Get Webpage Title from URL:
PHP Code to Get Webpage Meta Description from URL:

How to Get Title and Metadata (meta tags) from URL using PHP

In this article, I am going to show you how to get the title and meta tags of a web page from an external URL using PHP. The full source code is further down the page.

Most websites specify three common pieces of metadata for a page: the title, the description, and the keywords. All of these can be fetched with PHP. The description and keywords are stored in <meta> tags, and the title is stored in the <title> tag.

    <?php
    // The opening lines of this listing were lost in extraction. The URL and the
    // file_get_contents() fetch below are assumptions, not the original code.
    $url = 'https://codeat21.com/';
    $data = file_get_contents($url);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    // Parse DOM to get Title data
    $nodes = $dom->getElementsByTagName('title');
    $title = $nodes->item(0)->nodeValue;

    // Parse DOM to get meta data
    $metas = $dom->getElementsByTagName('meta');
    $description = '';
    $keywords = '';
    $site_name = '';
    $image = '';

    for ($i = 0; $i < $metas->length; $i++) {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description') {
            $description = $meta->getAttribute('content');
        }
        if ($meta->getAttribute('name') == 'keywords') {
            $keywords = $meta->getAttribute('content');
        }
        if ($meta->getAttribute('property') == 'og:site_name') {
            $site_name = $meta->getAttribute('content');
        }
        if ($meta->getAttribute('property') == 'og:image') {
            $image = $meta->getAttribute('content');
        }
    }

    echo "Title: $title" . '<br><br>';
    echo "Description: $description" . '<br><br>';
    echo "Keywords: $keywords" . '<br><br>';
    echo "site_name: $site_name" . '<br><br>';
    echo "image: $image";
    ?>

Output

Title: Home — codeat21.com — Web Designing and Development Tutorials Website.
Description: codeat21.com is a web designing and development tutorials website. Learn Web Development, HTML, CSS, PHP, MySQL, JavaScript, Node JS, React JS, jQuery, etc..
Keywords:
site_name: codeat21.com — Web Designing and Development Tutorials Website.
image:

Fastest way to retrieve a <title> in PHP

Hopefully general enough for your usage. If you need something more powerful, it might not hurt to invest a bit of time into researching HTML parsers.

EDIT: Added a bit of error checking. Kind of rushed the first version out, sorry.

I’m relatively sure that will produce an error if the pattern isn’t found. Initialise $title first, assign preg_match() to a boolean and check for that before attempting to access the first element of the $title_matches array.

Oh. Too right. If preg_match doesn’t get a result, the reference to $title_matches will barf. Will tidy up a bit.
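The answer's original code did not survive on this page, so here is a minimal sketch of the fix the two comments above describe: initialise the result first and check the return value of preg_match() before touching the matches array. The function name extract_title_regex is made up for the example, and it takes an HTML string rather than a URL so it runs without network access.

```php
<?php
// Hypothetical helper: returns the <title> text of an HTML string,
// or null when no title is found.
function extract_title_regex(string $html): ?string
{
    $title = null;   // initialise first, so a failed match cannot leave it undefined
    $matches = [];

    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    $found = preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches);

    if ($found === 1) {
        $title = trim($matches[1]);
    }

    return $title;
}

echo extract_title_regex('<html><head><title>Example</title></head></html>'), "\n";
var_dump(extract_title_regex('<html><body>No title here</body></html>'));
```

Only reading $matches[1] after a successful match avoids the undefined-index notice mentioned in the comments.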

You can get it without reg expressions:

    $title = '';
    $dom = new DOMDocument();

    if ($dom->loadHTMLFile($urlpage)) {
        $list = $dom->getElementsByTagName("title");
        if ($list->length > 0) {
            $title = $list->item(0)->textContent;
        }
    }

You may want to call libxml_use_internal_errors(true); before using DOMDocument. Unfortunately, the underlying library DOMDocument uses to parse the HTML (libxml) as of today still doesn’t support HTML5 (it’s an XML library after all) and will produce warnings for HTML5 semantic tags. There doesn’t seem to be an alternative to error suppression here, unfortunately. See also stackoverflow.com/a/6090728/2459834
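As a rough sketch of that advice: the helper below turns on libxml's internal error handling while parsing and restores the previous setting afterwards. The name extract_title_dom and the sample markup are assumptions made for the example.

```php
<?php
// Hypothetical helper: parse an HTML string with DOMDocument and return
// its <title>, or null if there is none.
function extract_title_dom(string $html): ?string
{
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();                   // discard the collected warnings
    libxml_use_internal_errors($previous);   // restore the earlier setting

    if (!$loaded) {
        return null;
    }

    $list = $dom->getElementsByTagName('title');
    return $list->length > 0 ? trim($list->item(0)->textContent) : null;
}

// Sample markup containing HTML5 elements.
$html = '<!DOCTYPE html><html><head><title>HTML5 page</title></head>'
      . '<body><main><article>Text</article></main></body></html>';

echo extract_title_dom($html), "\n";
```

Restoring the previous setting keeps the change local to this function, which matters if other code in the same request relies on libxml warnings.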

or making this simple function slightly more bullet proof:

    function page_title($url)
    {
        $page = file_get_contents($url);
        if (!$page) return null;

        $matches = array();

        if (preg_match('/<title>(.*?)<\/title>/', $page, $matches)) {
            return $matches[1];
        } else {
            return null;
        }
    }

    echo page_title('http://google.com');

Weird, read 2017 for some reasons! Either way, it’s never too late to get answers corrected since beginners might access it in the future.

I’m also doing a bookmarking system and found that since PHP 5 you can use stream_get_line to load the remote page only until the closing title tag (instead of loading the whole file), then get rid of what’s before the opening title tag with explode (instead of a regex).

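    function page_title($url)
    {
        $title = false;
        if ($handle = fopen($url, "r")) {
            // Read only up to the closing title tag
            $string = stream_get_line($handle, 0, "</title>");
            fclose($handle);
            // Drop everything before the opening title tag
            $string = (explode("<title", $string))[1];
            if (!empty($string)) {
                // Drop any attributes on the title tag
                $title = trim((explode(">", $string))[1]);
            }
        }
        return $title;
    }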

Last explode thanks to PlugTrade’s answer who reminded me that title tags can have attributes.

Use cURL to get the $htmlSource variable’s contents.

    preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);
    print_r($titleMatches);

see what you have in that array.
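The answer does not show the cURL part, so here is one possible sketch of filling $htmlSource. The helper name fetch_html, the timeout values, and the example URL are assumptions; it needs PHP's curl extension.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the response body,
// or null if the request fails.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds; arbitrary values for this sketch
        CURLOPT_TIMEOUT        => 10,
    ]);

    $body = curl_exec($ch);              // false on failure

    return is_string($body) ? $body : null;
}

$htmlSource = fetch_html('https://example.com/');

if ($htmlSource !== null
    && preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches)) {
    print_r($titleMatches);
}
```

Checking for null before running the regex keeps a failed request from being treated as an empty page.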

Most people say for HTML traversing though you should use a parser as regexs can be unreliable.

The other answers provide more detail 🙂

In this case I think it can be safely assumed that a parser would be overkill. /agree on the non-greedy matching

I was looking for a better way to do that, but it looks like most people use your proposed solution as a fast method to retrieve the title. Please consider using the ‘s’ modifier; I’ve seen weird situations where a newline breaks the regex.
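To illustrate the point about the ‘s’ modifier, here is a small sketch with a made-up title that contains a line break:

```php
<?php
// Sample HTML whose title spans two lines.
$html = "<html><head><title>First line\nsecond line</title></head></html>";

// Without "s", "." does not match a newline, so no title is found.
$without = preg_match('/<title>(.*?)<\/title>/i', $html, $m1);

// With "s", "." also matches newlines.
$with = preg_match('/<title>(.*?)<\/title>/is', $html, $m2);

var_dump($without);  // int(0)
var_dump($with);     // int(1)

// Collapse the whitespace before displaying the title.
echo preg_replace('/\s+/', ' ', $m2[1]), "\n";  // First line second line
```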

I like using SimpleXml with regex’s, this is from a solution I use to grab multiple link headers from a page in an OpenID library I’ve created. I’ve adapted it to work with the title (even though there is usually only one).

    function getTitle($sFile)
    {
        $sData = file_get_contents($sFile);

        // Isolate the <head> element first
        if (preg_match('/<head[^>]*>.*<\/head>/is', $sData, $aHead)) {
            // Lowercase every tag so that <TITLE> and <title> are treated alike
            $sDataHtml = preg_replace_callback(
                '/<(.[^>]*)>/',
                function ($aTag) { return strtolower($aTag[0]); },
                $aHead[0]
            );

            $oDom = new DOMDocument();
            @$oDom->loadHTML($sDataHtml);
            $xTitle = simplexml_import_dom($oDom);

            return (string) $xTitle->head->title;
        }

        return null;
    }

    echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');

Ironically, this page has a "title tag" in the title tag, which is what sometimes causes problems with the pure regex solutions.

This solution is not perfect, as it lowercases the tags, which could cause a problem for the nested tag if formatting or case was important (such as in XML), but there are somewhat more involved ways around that problem.


Get title of website via link

I’m trying to imitate that. For example, upon submitting the URL http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/ I want to return "The Washington Times". How is this possible with PHP?

Google news probably manages a look up table for known domains, and perhaps analyzes the HTML for unknown ones. A lookup table should be trivial to implement, so I’ve submitted an answer that does the latter.
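For the lookup-table half of that idea, a minimal sketch could key the table on the hostname. The table entries and the function name site_name_for_url are illustrative assumptions, not anything Google is known to use.

```php
<?php
// Hypothetical lookup table; the entries are illustrative only.
const SITE_NAMES = [
    'washingtontimes.com' => 'The Washington Times',
    'nytimes.com'         => 'The New York Times',
];

// Hypothetical helper: map a URL to a known site name, or null if unknown.
function site_name_for_url(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;                      // malformed URL or no host part
    }

    $host = strtolower($host);
    if (strncmp($host, 'www.', 4) === 0) {
        $host = substr($host, 4);         // treat www.example.com like example.com
    }

    return SITE_NAMES[$host] ?? null;
}

echo site_name_for_url('http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/'), "\n";
```

Unknown hosts return null, at which point the code could fall back to parsing the page title.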

10 Answers

My answer is expanding on @AI W’s answer of using the title of the page. Below is the code to accomplish what he said.

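    <?php
    function get_title($url)
    {
        $str = file_get_contents($url);
        if (strlen($str) > 0) {
            $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
            preg_match("/\<title\>(.*)\<\/title\>/i", $str, $title); // ignore case
            return $title[1];
        }
    }

    // Example:
    echo get_title("http://www.washingtontimes.com/");
    ?>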

Washington Times — Politics, Breaking News, US and World News

As you can see, it is not exactly what Google is using, so this leads me to believe that they get a URL’s hostname and match it to their own list.

Thanks, the code works but how would you get the same main title if say the link was washingtontimes.com/news/2010/dec/3/… ? I think that’s what AI W suggested

The regex matching ought to be: preg_match("/\<title\>(.*)\<\/title\>/i", $str, $title); Some sites have the <title> in all caps, so the check should ignore case.

Make sure to make the regex non-greedy, since some websites use more than one <title> tag.
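A quick sketch of why greediness matters, using made-up markup with two title elements (one in the head, one inside an inline SVG):

```php
<?php
// Sample HTML with two <title> elements.
$html = '<head><title>Page title</title></head>'
      . '<body><svg><title>Icon label</title></svg></body>';

// Greedy: (.*) runs from the first <title> to the LAST </title>.
preg_match('/<title>(.*)<\/title>/i', $html, $greedy);

// Non-greedy: (.*?) stops at the FIRST </title>.
preg_match('/<title>(.*?)<\/title>/i', $html, $lazy);

echo $greedy[1], "\n";  // includes markup between the two titles
echo $lazy[1], "\n";    // Page title
```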

    $doc = new DOMDocument();
    @$doc->loadHTMLFile('http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/');
    $xpath = new DOMXPath($doc);
    echo $xpath->query('//title')->item(0)->nodeValue."\n";

Debt commission falls short on test vote — Washington Times

Obviously you should also implement basic error handling.
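One way the basic error handling might look, as a sketch: check that the document loaded and that the XPath query found a node before reading it. The helper name title_from_location is an assumption, and the usage below loads a temporary local file so the example runs offline.

```php
<?php
// Hypothetical helper: return the <title> of the HTML at a URL or file path,
// or null if it cannot be loaded or has no title.
function title_from_location(string $location): ?string
{
    $previous = libxml_use_internal_errors(true);   // keep parser warnings quiet
    $doc = new DOMDocument();
    $loaded = $doc->loadHTMLFile($location);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;                                // fetch or parse failed
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//title');
    if ($nodes === false || $nodes->length === 0) {
        return null;                                // no <title> element
    }

    return trim($nodes->item(0)->nodeValue);
}

// Usage with a temporary local file standing in for a remote page.
$path = tempnam(sys_get_temp_dir(), 'title');
file_put_contents($path, '<html><head><title>Sample page</title></head><body></body></html>');
echo title_from_location($path), "\n";
unlink($path);
```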

@Matthew When I changed the URL to facebook.com it is showing "Update Your Browser | Facebook". Is there any solution for this?

@Enve, without looking at it, I would assume it’s because they are using a lot of JavaScript to generate the page. The "Update Your Browser" is probably the default title. So you’re probably out of luck in terms of any simple solution.

Thanks! The accepted answer didn’t work for me. It just returned localhost. This answer worked for me 🙂

Using get_meta_tags() from the domain home page, for NYT brings back something which might need truncating but could be useful.

    $b = "http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/";
    $url = parse_url($b);
    $tags = get_meta_tags($url['scheme'] . '://' . $url['host']);
    var_dump($tags);

includes the description ‘The Washington Times delivers breaking news and commentary on the issues that affect the future of our nation.’

You could fetch the contents of the URL and do a regular expression search for the content of the title element.

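    <?php
    // The first line of this listing was lost in extraction; the URL is an
    // assumption based on the "Example Web Page" output.
    $urlContents = file_get_contents("http://example.com/");
    preg_match("/<title>(.*)<\/title>/i", $urlContents, $matches);
    print($matches[1] . "\n"); // "Example Web Page"
    ?>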

Or, if you don’t want to use a regular expression (to match something very near the top of the document), you could use a DOMDocument object:

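    <?php
    // The first lines of this listing were lost in extraction; the URL is an
    // assumption based on the "Example Web Page" output.
    $urlContents = file_get_contents("http://example.com/");
    $dom = new DOMDocument();
    @$dom->loadHTML($urlContents);
    $title = $dom->getElementsByTagName('title');
    print($title->item(0)->nodeValue . "\n"); // "Example Web Page"
    ?>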

I leave it up to you to decide which method you like best.


Get Webpage Title and Meta Description from URL in PHP

Hi! Today we’ll see how to get a webpage’s title and description from a URL in PHP. At times you may want to scrape a URL and retrieve the page contents. You might expect to need a third-party DOM parser for this sort of task, but PHP provides some fast native solutions. Retrieving the page title and meta description from a URL is a lot easier than you think. Here we’ll see how to do it.

A web page’s title is found inside the <title> tag, and its description inside a <meta> tag. The page title is self-explanatory; meta tags store various useful information about a webpage such as title, description, keywords, and author.

PHP Code to Get Webpage Title from URL:

To retrieve the web page title from a URL, use file_get_contents() together with a regular expression. Here is the PHP function that retrieves the contents found inside the <title> tag.

    <?php
    // function to get webpage title
    function getTitle($url)
    {
        $page = file_get_contents($url);
        $title = preg_match('/<title[^>]*>(.*?)<\/title>/ims', $page, $match) ? $match[1] : null;
        return $title;
    }

    // get web page title
    echo 'Title: ' . getTitle('http://www.w3schools.com/php/');

    // Output:
    // Title: PHP 5 Tutorial
    ?>

PHP Code to Get Webpage Meta Description from URL:

Retrieving the meta description from a webpage is even easier with PHP’s native get_meta_tags() function. get_meta_tags() extracts the content attributes of all meta tags from a file or URL and returns them as an array.

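Here is the PHP function to get the meta description from a URL.

    <?php
    // function to get meta description
    // (the function body was lost in extraction; this is a reconstruction
    // based on the surrounding description, not the original code)
    function getDescription($url)
    {
        $tags = get_meta_tags($url);
        if ($tags === false) {
            return null;
        }
        return array_key_exists('description', $tags) ? $tags['description'] : null;
    }

    // get web page meta description
    echo 'Meta Description: ' . getDescription('http://www.w3schools.com/php/');

    // Output:
    // Meta Description: Well organized and easy to understand Web bulding tutorials with lots of examples of how to use HTML, CSS, JavaScript, SQL, PHP, and XML.
    ?>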

Some web pages lack a meta description, in which case the above function returns null.

You can retrieve the rest of the meta tags in the same way.
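As a sketch of reading several meta tags at once: get_meta_tags() reads from a file or URL, so this example writes made-up sample HTML to a temporary file instead of hitting a real site. The tag values are invented for illustration.

```php
<?php
// Sample page written to a temporary file so the example runs offline.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $path,
    '<html><head>'
    . '<meta name="description" content="Sample description">'
    . '<meta name="keywords" content="php, meta, tags">'
    . '<meta name="author" content="Jane Doe">'
    . '</head><body></body></html>'
);

// Returns an array keyed by the lowercased name attribute.
$tags = get_meta_tags($path);
unlink($path);

// "robots" is not in the sample, to show the missing-tag case.
foreach (['description', 'keywords', 'author', 'robots'] as $name) {
    $value = $tags[$name] ?? null;
    echo $name, ': ', $value ?? '(not present)', "\n";
}
```

Note that get_meta_tags() keys its result on the name attribute, so tags that only carry a property attribute (such as Open Graph tags) need a DOM-based approach instead.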

We have seen how to get the webpage title and meta description from a URL in PHP. I hope you enjoyed this tutorial. If you have any questions, please let me know in the comments.
