http-equiv="Content-Type" content="text/html" in PHP

Contents

Meta tags explained
Page description
Title
Description
Keywords
Site encoding
Application-Name
Generator
Author
Copyright
Reply-To
Content-Language
Help
Indexing control
Robots
Last-Modified
Document-State
Revisit-After
Cache control
Cache-Control
Pragma
Expires
Canonical: the preferred canonical URL
Prev
Next
DOMDocument::loadHTML
Parameters
Return values
Errors
Examples
See also
User Contributed Notes 19 notes
Adding Content-Type Header from PHP
Examples of Mime-Types
Character encoding in meta is not necessary

Meta tags explained

When you look at the HTML source of various websites, a question comes up: why are there so many meta tags, and what are they for? This article covers the most commonly encountered tags with examples and explanations.

Page description

Title

The page title. The optimal length is 50-60 characters.

Description

A short description of the page, 160-180 characters long, used when displaying search results.

Keywords

Contains the keywords that occur on the page. Use no more than 20 words and no more than 3 repetitions.
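The three tags above are normally emitted together in the page's head. A minimal PHP sketch, assuming two hypothetical helpers, render_page_meta() and check_page_meta() (not part of any library): the first escapes the values with htmlspecialchars() so quotes or "<" cannot break the markup, the second reports lengths outside the ranges suggested above.

```php
<?php
// Hypothetical helper: renders <title>, description and keywords tags with escaped values.
function render_page_meta(string $title, string $description, array $keywords): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return '<title>' . $esc($title) . "</title>\n"
        . '<meta name="description" content="' . $esc($description) . "\">\n"
        . '<meta name="keywords" content="' . $esc(implode(', ', $keywords)) . "\">\n";
}

// Counts characters rather than bytes, so Cyrillic or accented text is measured correctly.
function char_length(string $s): int
{
    return preg_match_all('/./us', $s);
}

// Hypothetical helper: returns warnings when values fall outside the suggested ranges
// (title 50-60 characters, description 160-180 characters, at most 20 keywords).
function check_page_meta(string $title, string $description, array $keywords): array
{
    $warnings = [];
    $t = char_length($title);
    if ($t < 50 || $t > 60) {
        $warnings[] = "title is $t characters (suggested 50-60)";
    }
    $d = char_length($description);
    if ($d < 160 || $d > 180) {
        $warnings[] = "description is $d characters (suggested 160-180)";
    }
    if (count($keywords) > 20) {
        $warnings[] = count($keywords) . ' keywords (suggested at most 20)';
    }
    return $warnings;
}

echo render_page_meta('Cats & dogs', 'All about pets', ['cats', 'dogs']);
```

The length limits are editorial guidelines from the text above, not hard rules enforced by browsers or search engines, so the sketch only warns instead of truncating.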

Site encoding

Application-Name

The name of the web application. Several names can be specified for different language locales.

On Android it is used when the site is added to the home screen.
On Windows 8 and 10 it is used when the site is pinned to the taskbar and as a tile in the Start menu.

Generator

Indicates which program was used to generate the page's code.

Author

Information about the author of the site.

Copyright

Information about the owner of the site.

Reply-To

Specifies how to contact the author of the page.

Content-Language

Specifies the language of the page. Used by search engines when indexing.

Help

Provides a link to a help page or an e-mail address.

Indexing control

Robots

Sets indexing rules for search engine robots.

Common values:

index, follow or all	Indexing the page text and following its links is allowed
noindex	Do not index the page text
nofollow	Do not follow the links on the page
noindex, nofollow or none	Do not index the text and do not follow the links
noarchive	Do not show a link to the cached copy in search results
noyaca	Do not use the description from Yandex.Catalog for the snippet in search results
nosnippet	Do not show a video or text snippet in search results
noimageindex	Do not show your page as the referring source for its images
noodp	Do not use the description from the DMOZ directory
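A minimal PHP sketch of emitting these directives, assuming a hypothetical helper robots_meta() that accepts only the values listed above. For responses without an HTML head (PDFs, images), search engines such as Google also read the same kind of directives from an X-Robots-Tag HTTP header; send_robots_header() is likewise hypothetical.

```php
<?php
// Hypothetical helper: validates the directives against the list above
// and returns the matching robots <meta> tag.
function robots_meta(array $directives): string
{
    $allowed = ['all', 'index', 'follow', 'noindex', 'nofollow', 'none',
                'noarchive', 'noyaca', 'nosnippet', 'noimageindex', 'noodp'];
    $unknown = array_diff($directives, $allowed);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown robots directive: ' . implode(', ', $unknown));
    }
    $content = implode(', ', $directives);
    return '<meta name="robots" content="' . htmlspecialchars($content, ENT_QUOTES, 'UTF-8') . '">';
}

// Hypothetical helper: the header form for non-HTML responses.
// header() only works before any output has been sent.
function send_robots_header(array $directives): void
{
    if (!headers_sent()) {
        header('X-Robots-Tag: ' . implode(', ', $directives));
    }
}

echo robots_meta(['noindex', 'nofollow']), PHP_EOL;
```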

Last-Modified

An alternative to the Last-Modified HTTP header: sets the date of the last modification for static pages.
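When the page is generated by PHP, the HTTP header itself can be sent instead of the meta tag. A minimal sketch, assuming the page's freshness can be taken from a file modification time; http_date() and not_modified_since() are hypothetical helper names. It formats the date in the HTTP format and answers 304 Not Modified when the browser's If-Modified-Since date is still current.

```php
<?php
// Formats a Unix timestamp as an HTTP date, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Returns true when the client's cached copy (If-Modified-Since) is still current.
function not_modified_since(int $lastModified, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null) {
        return false;
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $lastModified <= $since;
}

// In a web request (not on the command line), send the header before any output.
if (PHP_SAPI !== 'cli') {
    $lastModified = filemtime(__FILE__); // modification time of this script
    header('Last-Modified: ' . http_date($lastModified));
    if (not_modified_since($lastModified, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)) {
        http_response_code(304); // Not Modified: the browser reuses its copy
        exit;
    }
}
```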

Document-State

Determines how often search robots should index the page.

Revisit-After

Indicates how often the information on the site is updated and how often the search engine should revisit it.

Cache control

Cache-Control

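Tells the browser how to cache the page.

Allowed values:

public	Everything may be cached
private	Cached by the browser, but not by proxy servers
no-cache	A cached copy must be revalidated with the server before each use (no-store forbids caching entirely)
max-age=N	How many seconds to keep the page in the cache (max-age=0 makes it stale immediately)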

Pragma

Expires

The date when the browser's cached copy expires. If a past date or 0 is given, the document will not be cached.
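The caching meta tags above have HTTP header equivalents, which proxies also honor. A minimal PHP sketch, assuming a hypothetical helper cache_headers() that returns the header lines for a given lifetime in seconds; a lifetime of 0 produces the "do not cache" combination, including Pragma for old HTTP/1.0 caches and an Expires date in the past.

```php
<?php
// Hypothetical helper: header lines matching the caching rules above.
// $maxAge is in seconds; 0 or less disables caching. $now is injectable for testing.
function cache_headers(int $maxAge, bool $shared = true, ?int $now = null): array
{
    $now ??= time();
    if ($maxAge <= 0) {
        return [
            'Cache-Control: no-store, no-cache, must-revalidate, max-age=0',
            'Pragma: no-cache', // understood by HTTP/1.0 caches
            'Expires: ' . gmdate('D, d M Y H:i:s', 0) . ' GMT', // a date in the past
        ];
    }
    return [
        'Cache-Control: ' . ($shared ? 'public' : 'private') . ", max-age=$maxAge",
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    ];
}

// header() must be called before any output is sent.
if (PHP_SAPI !== 'cli') {
    foreach (cache_headers(3600, false) as $line) {
        header($line);
    }
}
```

When both are present, Cache-Control max-age takes precedence over Expires in HTTP/1.1 caches; sending both keeps older clients covered.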

Canonical: the preferred canonical URL

If a page has duplicates, that is, the same content available at different URLs, rel="canonical" specifies the address that will be treated as the primary one and used in search results.
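A minimal PHP sketch of generating the canonical link, assuming a hypothetical helper canonical_link() and assuming that the query string never changes the page content (for example, tracking parameters such as ?utm_source=...). Sites where query parameters select different content need a smarter rule.

```php
<?php
// Hypothetical helper: keeps scheme, host, port and path of the requested URL
// and drops the query string and fragment.
function canonical_link(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }
    $canonical = strtolower($parts['scheme']) . '://' . strtolower($parts['host'])
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/');
    return '<link rel="canonical" href="' . htmlspecialchars($canonical, ENT_QUOTES, 'UTF-8') . '">';
}

echo canonical_link('https://Example.com/catalog/item?utm_source=mail'), PHP_EOL;
```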

Prev

Specifies the URL of the previous page in a paginated series.

Next

Specifies the URL of the next page in a paginated series.
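A minimal PHP sketch of the prev/next pair, assuming a hypothetical helper pagination_links() and a hypothetical URL scheme where page 1 lives at the base URL and later pages at ?page=N. The first page gets no prev link and the last page no next link.

```php
<?php
// Hypothetical helper: rel="prev"/rel="next" links for page $page of $totalPages.
function pagination_links(string $baseUrl, int $page, int $totalPages): string
{
    $url = fn(int $n): string => $n === 1 ? $baseUrl : $baseUrl . '?page=' . $n;
    $links = [];
    if ($page > 1) {
        $links[] = '<link rel="prev" href="' . htmlspecialchars($url($page - 1), ENT_QUOTES, 'UTF-8') . '">';
    }
    if ($page < $totalPages) {
        $links[] = '<link rel="next" href="' . htmlspecialchars($url($page + 1), ENT_QUOTES, 'UTF-8') . '">';
    }
    return implode("\n", $links);
}

echo pagination_links('https://example.com/news', 2, 5), PHP_EOL;
```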


DOMDocument::loadHTML

The function parses the HTML contained in the string source. Unlike loading XML, the HTML does not have to be a well-formed document. Before PHP 8.0.0 this method could also be called statically to load a document and create a DOMDocument object; a static call was an option when no DOMDocument properties needed to be set before loading.

Parameters

source

The HTML string.

options

Since Libxml 2.6.0, the options parameter can also be used to pass additional Libxml parameters (a bitwise OR of LIBXML_* constants).
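A short example of the options parameter: LIBXML_HTML_NOIMPLIED (libxml 2.7.7 or later) stops the parser from adding the implied html/body wrappers, and LIBXML_HTML_NODEFDTD (libxml 2.7.8 or later) stops it from adding a default DOCTYPE. This is a sketch for a fragment with a single root element; fragments with several top-level elements do not round-trip cleanly with NOIMPLIED.

```php
<?php
// Parse a fragment without implied <html>/<body> wrappers and without a default DOCTYPE.
$doc = new DOMDocument();
$doc->loadHTML('<div><p>Hello</p></div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// The <div> is now the document element itself.
echo $doc->documentElement->nodeName, PHP_EOL;
echo $doc->saveHTML();
```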

Return values

Returns true on success or false on failure. When called statically, returns a DOMDocument object or false on failure.

Errors

If an empty string is passed as source, a warning is generated. This warning is not generated by libxml, so it cannot be handled with libxml's error handling functions.

Before PHP 8.0.0 this method could be called statically, but doing so raised an E_DEPRECATED error. As of PHP 8.0.0, calling this method statically throws an Error exception.

Although malformed HTML usually loads successfully, this function may generate E_WARNING errors when it encounters bad markup. libxml's error handling functions can be used to handle these errors.

Examples

Example #1 Creating a document
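A minimal example in the spirit of the manual's Example #1: parse an HTML string into a DOMDocument and serialize it back. The exact serialized output, such as whether a DOCTYPE is added, depends on the libxml version.

```php
<?php
// Parse an HTML string into a DOM tree...
$doc = new DOMDocument();
$doc->loadHTML('<html><body>Test<br></body></html>');

// ...and serialize it back to HTML.
echo $doc->saveHTML();
```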

See also

DOMDocument::loadHTMLFile(): loads HTML from a file
DOMDocument::saveHTML(): dumps the internal document into a string using HTML formatting
DOMDocument::saveHTMLFile(): dumps the internal document into a file using HTML formatting

User Contributed Notes 19 notes

You can also load HTML as UTF-8 using this simple hack:

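$doc = new DOMDocument();
// Prepend an XML encoding declaration so libxml reads the input as UTF-8
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);

// dirty fix: drop the processing instruction added by the hack
foreach ($doc->childNodes as $item) {
    if ($item->nodeType == XML_PI_NODE) {
        $doc->removeChild($item); // remove hack
    }
}
$doc->encoding = 'UTF-8'; // insert proper encoding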

DOMDocument is very good at dealing with imperfect markup, but it throws warnings all over the place when it does.

This isn't well documented here. The solution is to set up a separate apparatus for dealing with just these errors.

Set libxml_use_internal_errors(true) before calling loadHTML. This will prevent errors from bubbling up to your default error handler. And you can then get at them (if you desire) using other libxml error functions.
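A minimal sketch of that approach using only the standard libxml error functions: switch to internal errors, parse, read the collected LibXMLError objects, clear the buffer, and restore the previous setting. Which warnings appear for a given input depends on the libxml version.

```php
<?php
// Route libxml parser warnings into an internal buffer instead of the
// default PHP error handler; the previous setting is returned.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<section><p>Unclosed paragraph</section>');

// Each entry is a LibXMLError object (level, code, line, column, message).
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
// Empty the buffer so these errors do not show up after the next parse.
libxml_clear_errors();

// Restore the previous setting so unrelated code is not affected.
libxml_use_internal_errors($previous);

foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```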

When using loadHTML() to process UTF-8 pages, you may find that the output of DOM functions does not match the input. For example, if you expect "Cạnh tranh", you will receive "Cáº¡nh tranh". I suggest using mb_convert_encoding before loading the UTF-8 page:
$pageDom = new DOMDocument();
// Note: the 'HTML-ENTITIES' target encoding is deprecated as of PHP 8.2
$searchPage = mb_convert_encoding($htmlUTF8Page, 'HTML-ENTITIES', 'UTF-8');
@$pageDom->loadHTML($searchPage);

Pay attention when loading HTML that uses a charset other than ISO-8859-1. Since this method does not actively try to detect the encoding of the HTML you are loading (as most browsers do), you have to specify it in the HTML head. If, for instance, your HTML is in UTF-8, make sure there is a meta tag in the head section:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

If you do not specify the charset like this, all high-ASCII bytes will be HTML-encoded. It is not enough to set the encoding of the DOMDocument you load the HTML into to UTF-8.
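A minimal sketch of the note's advice for fragments that have no head of their own, assuming a hypothetical helper load_utf8_html() that prepends the HTML4-style charset meta tag before parsing, so libxml reads the bytes as UTF-8 instead of ISO-8859-1.

```php
<?php
// Hypothetical helper: parse a UTF-8 fragment by declaring its charset up front.
function load_utf8_html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $doc->loadHTML($meta . $html);
    return $doc;
}

$doc = load_utf8_html('<p>Cạnh tranh</p>');
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```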

Warning: This does not function well with HTML5 elements such as SVG. Most of the advice on the Web is to turn off errors in order to have it work with HTML5.

If we load HTML5 tags such as <section>, the following error is raised:

DOMDocument::loadHTML(): Tag section invalid in Entity

We can disable standard libxml errors (and enable user error handling) using libxml_use_internal_errors(true); before loadHTML();

This is quite useful in PHPUnit custom assertions, as in the following example (inside a PHPUnit test case):

// Create a DOMDocument
$dom = new DOMDocument();

// fix html5/svg errors
libxml_use_internal_errors(true);

// Load html ($html is a placeholder for the markup under test)
$dom->loadHTML($html);
$htmlNodes = $dom->getElementsByTagName('section');

// Fail unless at least one <section> element was found
$this->assertGreaterThan(0, $htmlNodes->length);

Remember: if you use an HTML5 doctype and a meta element like so

<!DOCTYPE html>
<meta charset="utf-8">

your HTML code will be interpreted as ISO-8859-something and non-ASCII characters will be converted into HTML entities. However, the HTML4-like version works (as was pointed out 10 years ago by "bigtree at 29a"):

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

It should be noted that when any text is provided within the body tag
outside of a containing element, DOMDocument will wrap that
text in a paragraph tag (<p>).

For those of you who want to get elements of an external URL by class, I have 2 useful functions. In this example we get the search result headers (elements with the class "r") back from Google search:

1. Check the URL (whether it is reachable and exists)
<?php
# URL check
function url_check($url) {
    $headers = @get_headers($url);
    // Accept any 2xx status line, e.g. "HTTP/1.1 200 OK" or "HTTP/2 200"
    return is_array($headers) ? (bool) preg_match('/^HTTP\/\d+(?:\.\d+)?\s+2\d\d\b/', $headers[0]) : false;
}
?>

2. Clean the element you want to get (remove all tags, tabs, new lines etc.)
<?php
# Function to clean a string
function clean($text) {
    // strip tags, collapse whitespace, replace ";" with "-", trim, decode entities
    $clean = html_entity_decode(trim(str_replace(';', '-', preg_replace('/\s+/S', ' ', strip_tags($text)))));
    return $clean;
}
?>

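After doing that, we can output the search result headers as follows:
<?php
$searchstring = 'djceejay';
// The query must be in the query string: a "#q=" fragment is never sent to the server
$url = 'https://www.google.de/search?q=' . urlencode($searchstring);
if (url_check($url)) {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world markup triggers many warnings
    $doc->loadHTML(file_get_contents($url));
    // DOMDocument has no getElementByClass(); select elements by class with DOMXPath
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " r ")]');
    foreach ($nodes as $node) {
        echo clean($node->textContent) . '<br>';
    }
} else {
    echo 'URL not reachable!'; // message when the URL cannot be called
}
?>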

Be aware that this function doesn’t actually understand HTML — it fixes tag-soup input using the general rules of SGML, so it creates well-formed markup, but has no idea which element contexts are allowed.

For example, with input like this where the first element isn’t closed:

loadHTML will change it to this, which is well-formed but invalid:


Adding Content-Type Header from PHP

How to correctly output different content-types using the PHP header function.

Adding the right Content-Type header is important for PHP applications to function properly. Yet, surprisingly many still do not add the correct character encoding for the type of content they wish to deliver.

In some cases, people even choose to replace certain characters with HTML encoded alternatives, rather than learn how to properly pick and implement an appropriate character encoding.

In today's globalized world, supporting multiple languages from the beginning of a new project is generally good practice. It can be difficult to tell when you might need to support special characters from other languages, so supporting UTF-8 from the start is a good idea.

For example, to support Danish letters (Æ, Ø, and Å), you can declare UTF-8 in the Content-Type header field.

header('Content-Type: text/html; charset=utf-8');

To add a content-type, we can use PHP’s header function.

Examples of Mime-Types

The mime-type should be placed before the character encoding. In the above example, we simply use text/html – but there are many others! I included some commonly used ones below:

text/html	For .html pages (note: extensions are optional on the web).
text/plain	Plain text files. If HTML pages are delivered with this, the HTML-source will be shown, without syntax-highlighting.
image/jpeg	JPEG images. It is possible to output images with PHP as well.
image/png	PNG images. You can also output PNG images in PHP.
image/webp	WebP images. Compression is superior to PNG and JPG.
image/avif	AVIF images (based on the AV1 codec). Compression is superior to most other formats, including WebP.
video/mp4	Video format. Useful for streaming.
text/javascript	JavaScript files. Used for client-side scripting.
text/css	CSS files. Used to control the styling of web pages.
application/pdf	Used to deliver .pdf files. Yes! You may also create .pdf’s in PHP!
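A minimal PHP sketch that turns the table above into a lookup, assuming a hypothetical constant MIME_TYPES and helper content_type_for(). Text types get a charset parameter, binary types do not, and unknown extensions fall back to application/octet-stream (an assumption; pick whatever default suits your application).

```php
<?php
// Hypothetical lookup built from the table above.
const MIME_TYPES = [
    'html' => 'text/html',
    'txt'  => 'text/plain',
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'png'  => 'image/png',
    'webp' => 'image/webp',
    'avif' => 'image/avif',
    'mp4'  => 'video/mp4',
    'js'   => 'text/javascript',
    'css'  => 'text/css',
    'pdf'  => 'application/pdf',
];

// Returns the Content-Type value for a file name; text types carry a charset.
function content_type_for(string $filename, string $charset = 'utf-8'): string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    $type = MIME_TYPES[$ext] ?? 'application/octet-stream';
    return str_starts_with($type, 'text/') ? "$type; charset=$charset" : $type;
}

// In a web request, send the header before any output:
if (PHP_SAPI !== 'cli') {
    header('Content-Type: ' . content_type_for('report.pdf'));
}
```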

It is important you choose the correct mime-type in order for a browser to know how to display the content. PHP does not just deliver HTML and text pages, it can also show images and video!

Character encoding in meta is not necessary

If you expect users to save your web pages locally, then you should be aware that some systems might not save the file in the correct character encoding. In these cases, you can include a meta element in your HTML, declaring the character encoding of the file:

<meta http-equiv="Content-type" content="text/html; charset=utf-8">

If there is a mismatch between the Content-Type declared in your meta element and the one in your HTTP headers, an HTML validator will show an error similar to the one below:

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the <meta> element (utf-8).

The solution to this problem is to always make sure both your meta, and HTTP header Content-Type match.
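One way to keep the two from drifting apart is to derive both from a single value. A minimal PHP sketch, assuming a hypothetical constant PAGE_CHARSET and helper content_type_value():

```php
<?php
// Single source of truth for the page encoding (hypothetical constant).
const PAGE_CHARSET = 'utf-8';

function content_type_value(): string
{
    return 'text/html; charset=' . PAGE_CHARSET;
}

// The HTTP header (only meaningful in a web request, before any output)...
if (PHP_SAPI !== 'cli') {
    header('Content-Type: ' . content_type_value());
}

// ...and the meta element are built from the same value, so they always match.
$meta = '<meta http-equiv="Content-Type" content="' . content_type_value() . '">';
echo $meta, PHP_EOL;
```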

Your CMS should automatically do this for you, but sometimes it may be bugged, or the server might not be configured correctly. Shared hosting solutions can be very bad. It is probably best to host a server on your own, either using a cloud service provider, or on a physical server you own yourself.
