Php html code as text

html_entity_decode

html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the string to their corresponding characters.

More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type — i.e., for XML, this function does not decode named entities that might be defined in some DTD — and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.

Parameters

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 .

Available flags constants
Constant Name Description
ENT_COMPAT Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES Will convert both double and single quotes.
ENT_NOQUOTES Will leave both double and single quotes unconverted.
ENT_SUBSTITUTE Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or � (otherwise) instead of returning an empty string.
ENT_HTML401 Handle code as HTML 4.01.
ENT_XML1 Handle code as XML 1.
ENT_XHTML Handle code as XHTML.
ENT_HTML5 Handle code as HTML 5.

An optional argument defining the encoding used when converting characters.

Читайте также:  Mutable type in python

If omitted, encoding defaults to the value of the default_charset configuration option.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.

The following character sets are supported:

Supported charsets
Charset Aliases Description
ISO-8859-1 ISO8859-1 Western European, Latin-1.
ISO-8859-5 ISO8859-5 Little used cyrillic charset (Latin/Cyrillic).
ISO-8859-15 ISO8859-15 Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8 ASCII compatible multi-byte 8-bit Unicode.
cp866 ibm866, 866 DOS-specific Cyrillic charset.
cp1251 Windows-1251, win-1251, 1251 Windows-specific Cyrillic charset.
cp1252 Windows-1252, 1252 Windows specific charset for Western European.
KOI8-R koi8-ru, koi8r Russian.
BIG5 950 Traditional Chinese, mainly used in Taiwan.
GB2312 936 Simplified Chinese, national standard character set.
BIG5-HKSCS Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS SJIS, SJIS-win, cp932, 932 Japanese
EUC-JP EUCJP, eucJP-win Japanese
MacRoman Charset that was used by Mac OS.
» An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale() ), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

Return Values

Returns the decoded string.

Источник

htmlspecialchars_decode

This function is the opposite of htmlspecialchars() . It converts special HTML entities back to characters.

The converted entities are: & , " (when ENT_NOQUOTES is not set), ' (when ENT_QUOTES is set), < and > .

Parameters

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 .

Available flags constants
Constant Name Description
ENT_COMPAT Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES Will convert both double and single quotes.
ENT_NOQUOTES Will leave both double and single quotes unconverted.
ENT_SUBSTITUTE Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or � (otherwise) instead of returning an empty string.
ENT_HTML401 Handle code as HTML 4.01.
ENT_XML1 Handle code as XML 1.
ENT_XHTML Handle code as XHTML.
ENT_HTML5 Handle code as HTML 5.

Return Values

Returns the decoded string.

Changelog

Version Description
8.1.0 flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 .

Источник

Saved searches

Use saved searches to filter your results more quickly

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

A PHP component to convert HTML into a plain text format

License

soundasleep/html2text

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.md

Total Downloads

html2text is a very simple script that uses DOM methods to convert HTML into a format similar to what would be rendered by a browser — perfect for places where you need a quick text representation. For example:

html> title>Ignored Titletitle> body> h1>Hello, World!h1> p>This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly. p>Even mismatched tags.p> div>A divdiv> div>Another divdiv> div>A divdiv>within a divdiv>div> a href pl-s">http://foo.com">A linka> body> html>
Hello, World! This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly. Even mismatched tags. A div Another div A div within a div [A link](http://foo.com) 

You can use Composer to add the package to your project:

< "require": < "soundasleep/html2text": "~1.1" > >

And then use it quite simply:

$text = \Soundasleep\Html2Text::convert($html);

You can also include the supplied html2text.php and use $text = convert_html_to_text($html); instead.

Option Default Description
ignore_errors false Set to true to ignore any XML parsing errors.
drop_links false Set to true to not render links as [http://foo.com](My Link) , but rather just My Link .
char_set ‘auto’ Specify a specific character set. Pass multiple character sets (comma separated) to detect encoding, default is ASCII,UTF-8

Pass along options as a second argument to convert , for example:

$options = array( 'ignore_errors' => true, // other options go here ); $text = \Soundasleep\Html2Text::convert($html, $options);

Some very basic tests are provided in the tests/ directory. Run them with composer install && vendor/bin/phpunit .

Class ‘DOMDocument’ not found

You need to install the PHP XML extension for your PHP version. e.g. apt-get install php7.4-xml

html2text is licensed under MIT, making it suitable for both Eclipse and GPL projects.

Also see html2text_ruby, a Ruby implementation.

About

A PHP component to convert HTML into a plain text format

Источник

Оцените статью