METANIT.COM

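htmlentities

Convert all applicable characters to HTML entities.

htmlentities(string $string, int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, ?string $encoding = null, bool $double_encode = true): string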

This function is identical to htmlspecialchars() in all ways, except that with htmlentities(), all characters that have HTML character entity equivalents are translated into these entities. The get_html_translation_table() function can be used to return the translation table used, depending on the provided flags constants.

If you want to decode instead (the reverse), you can use html_entity_decode().
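A minimal sketch of the difference between the two functions and of the round trip through html_entity_decode(). It assumes PHP 7 or later (for the \u{...} string escape) and passes the flags and the UTF-8 encoding explicitly rather than relying on the defaults; the variable names are illustrative only.

```php
<?php
// "caf\u{00E9}" is "café" written as UTF-8 bytes (requires PHP 7+).
$input = "caf\u{00E9} <b>";

// htmlspecialchars() only converts &, ", ', < and >; the "é" is left as is.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// htmlentities() additionally converts every character that has a named entity.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// html_entity_decode() reverses htmlentities().
$decoded = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // café &lt;b&gt;
echo $entities, "\n";  // caf&eacute; &lt;b&gt;
echo $decoded, "\n";   // café <b>
```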

Parameters

string

The input string.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the used document type. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

Available flags constants
Constant Name Description
ENT_COMPAT Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES Will convert both double and single quotes.
ENT_NOQUOTES Will leave both double and single quotes unconverted.
ENT_IGNORE Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged, as it may have security implications.
ENT_SUBSTITUTE Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_DISALLOWED Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content.
ENT_HTML401 Handle code as HTML 4.01.
ENT_XML1 Handle code as XML 1.
ENT_XHTML Handle code as XHTML.
ENT_HTML5 Handle code as HTML 5.
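A short sketch of how the quote-handling flags and one document-type flag change the output for the same input. ENT_HTML401 has the value 0, so ENT_QUOTES on its own implies HTML 4.01; the sample string is hypothetical.

```php
<?php
$s = "O'Reilly & \"Co\"";

// Only double quotes are converted.
$compat   = htmlentities($s, ENT_COMPAT, 'UTF-8');             // O'Reilly &amp; &quot;Co&quot;
// Both quote types are converted; HTML 4.01 uses &#039; for the single quote.
$quotes   = htmlentities($s, ENT_QUOTES, 'UTF-8');             // O&#039;Reilly &amp; &quot;Co&quot;
// Neither quote type is converted.
$noquotes = htmlentities($s, ENT_NOQUOTES, 'UTF-8');           // O'Reilly &amp; "Co"
// The HTML 5 document type uses the named entity &apos; instead.
$html5    = htmlentities($s, ENT_QUOTES | ENT_HTML5, 'UTF-8'); // O&apos;Reilly &amp; &quot;Co&quot;

echo $compat, "\n", $quotes, "\n", $noquotes, "\n", $html5, "\n";
```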

encoding

An optional argument defining the encoding used when converting characters.

If omitted, encoding defaults to the value of the default_charset configuration option.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.
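To illustrate why the encoding argument should match the actual bytes, here is a sketch with the same word stored in two encodings (hypothetical sample strings, written with byte escapes so the file encoding does not matter). With a matching encoding argument both produce the same entity.

```php
<?php
// The same word "café" stored in two different encodings.
$latin1 = "caf\xE9";       // ISO-8859-1: "é" is the single byte E9
$utf8   = "caf\xC3\xA9";   // UTF-8: "é" is the two bytes C3 A9

// Each call declares the encoding that matches its input bytes.
$fromLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'); // caf&eacute;
$fromUtf8   = htmlentities($utf8, ENT_QUOTES, 'UTF-8');        // caf&eacute;

echo $fromLatin1, "\n", $fromUtf8, "\n";
```

If "caf\xE9" were passed with 'UTF-8' instead, the lone E9 byte would be an invalid code unit sequence, which is handled as described under Return Values.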

The following character sets are supported:

Supported charsets
Charset Aliases Description
ISO-8859-1 ISO8859-1 Western European, Latin-1.
ISO-8859-5 ISO8859-5 Little-used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15 ISO8859-15 Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8 ASCII compatible multi-byte 8-bit Unicode.
cp866 ibm866, 866 DOS-specific Cyrillic charset.
cp1251 Windows-1251, win-1251, 1251 Windows-specific Cyrillic charset.
cp1252 Windows-1252, 1252 Windows specific charset for Western European.
KOI8-R koi8-ru, koi8r Russian.
BIG5 950 Traditional Chinese, mainly used in Taiwan.
GB2312 936 Simplified Chinese, national standard character set.
BIG5-HKSCS Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS SJIS, SJIS-win, cp932, 932 Japanese
EUC-JP EUCJP, eucJP-win Japanese
MacRoman Charset that was used by Mac OS.
'' (empty string) An empty string activates detection from script encoding (Zend multibyte), default_charset and the current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

double_encode

When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.
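A minimal sketch of the effect of double_encode on input that already contains an entity (the sample string is hypothetical):

```php
<?php
// The input already contains one encoded entity (&amp;) and one raw "&".
$s = "Fish &amp; Chips & Peas";

// Default (double_encode = true): the existing entity is encoded again.
$double = htmlentities($s, ENT_QUOTES, 'UTF-8');        // Fish &amp;amp; Chips &amp; Peas

// double_encode = false: existing entities are kept, only the raw "&" is converted.
$single = htmlentities($s, ENT_QUOTES, 'UTF-8', false); // Fish &amp; Chips &amp; Peas

echo $double, "\n", $single, "\n";
```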

Return Values

Returns the encoded string.

If the input string contains an invalid code unit sequence within the given encoding, an empty string will be returned unless either the ENT_IGNORE or ENT_SUBSTITUTE flag is set.
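A sketch of the three possible outcomes for invalid UTF-8 input. The flags are passed explicitly because the default changed in PHP 8.1.0 (it now includes ENT_SUBSTITUTE); the sample string is hypothetical and the \u{...} escape needs PHP 7 or later.

```php
<?php
// "\x80" is a lone continuation byte, so this string is not valid UTF-8.
$bad = "A\x80B";

$empty    = htmlentities($bad, ENT_QUOTES, 'UTF-8');                  // "" (empty string)
$replaced = htmlentities($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'); // "A\u{FFFD}B"
$dropped  = htmlentities($bad, ENT_QUOTES | ENT_IGNORE, 'UTF-8');     // "AB" (ENT_IGNORE is discouraged)

var_dump($empty, $replaced, $dropped);
```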

Changelog

Version Description
8.1.0 The default of flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
8.0.0 encoding is nullable now.

Examples

Example #1 A htmlentities() example

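<?php
$str = "A 'quote' is <b>bold</b>";

// Outputs: A 'quote' is &lt;b&gt;bold&lt;/b&gt;
// (ENT_COMPAT was the default before PHP 8.1.0)
echo htmlentities($str, ENT_COMPAT);

// Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);
?>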

Example #2 Usage of ENT_IGNORE

<?php
$str = "\x8F!!!";

// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");

// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>

See Also

  • html_entity_decode() — Convert HTML entities to their corresponding characters
  • get_html_translation_table() — Returns the translation table used by htmlspecialchars and htmlentities
  • htmlspecialchars() — Convert special characters to HTML entities
  • nl2br() — Inserts HTML line breaks before all newlines in a string
  • urlencode() — URL-encodes string

User Contributed Notes

An important note below about using this function to secure your application against Cross Site Scripting (XSS) vulnerabilities.

When printing user input in an attribute of an HTML tag, the default configuration of htmlentities() (ENT_COMPAT before PHP 8.1.0; since 8.1.0 the default includes ENT_QUOTES) doesn't protect you against XSS when single quotes delimit the attribute value. XSS is then possible by injecting a single quote:

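<?php
$_GET['a'] = "#000' onload='alert(document.cookie)";
?>

XSS possible (insecure):

<?php
$href = htmlentities($_GET['a']); // pre-8.1 default flags (ENT_COMPAT) leave ' unconverted
print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000' onload='alert(document.cookie)'>
?>

Use the 'ENT_QUOTES' quote style option to ensure no XSS is possible and your application is secure:

<?php
$href = htmlentities($_GET['a'], ENT_QUOTES);
print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>
?>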

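The 'ENT_QUOTES' option doesn't protect you against JavaScript evaluation in certain tags' attributes, like the 'href' attribute of the 'a' tag. A value such as javascript:alert(document.cookie) contains no characters that htmlentities() converts, so it is printed unchanged, and the JavaScript is executed when the link is clicked.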
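A minimal sketch combining both points of this note, using two hypothetical helpers, attr() and safe_href() (not PHP built-ins). attr() passes ENT_QUOTES explicitly so the result does not depend on the PHP version's default flags; safe_href() only lets http and https URLs through. This strict allowlist also rejects relative URLs, which may be too strict for some pages. Requires PHP 7+ for the type declarations.

```php
<?php
// Hypothetical helper: escape a value for use inside a quoted HTML attribute.
function attr(string $value): string {
    return htmlentities($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
}

// Hypothetical helper: only allow http(s) URLs in href; anything else becomes "#".
function safe_href(string $url): string {
    // parse_url() returns null (no scheme) or false (malformed); both become "".
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return '#';
    }
    return attr($url);
}

$payload = "#000' onload='alert(document.cookie)";
echo "<body bgcolor='" . attr($payload) . "'>\n";
// <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>

echo "<a href='" . safe_href('javascript:alert(document.cookie)') . "'>link</a>\n";
// <a href='#'>link</a>
```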

I've seen lots of functions to convert all the entities, but I needed to do a full-text search in a DB field that had named entities instead of numeric entities (edited by TinyMCE), so I searched the TinyMCE source and found a string with the value->entity mapping. So I wrote the following function to encode the user's query with named entities.

The string I used is different from the original, because I didn't want to convert ' or ". The string is too long, so I had to cut it. To get the original, check the TinyMCE source and search for nbsp or another entity 😉

<?php
// Flat "code,name,code,name,..." list from the TinyMCE source (cut here).
$entities_unmatched = explode(',', '160,nbsp,161,iexcl,162,cent,[...]');
$entities_table = array();
$even = 1;
foreach ($entities_unmatched as $c) {
    if ($even) {
        $ord = $c;                      // even position: character code
    } else {
        $entities_table[$ord] = $c;     // odd position: entity name
    }
    $even = 1 - $even;
}

// Works byte by byte with ord(), so it assumes single-byte (e.g. ISO-8859-1) input.
function encode_named_entities($str) {
    global $entities_table;

    $encoded_str = '';
    for ($i = 0; $i < strlen($str); $i++) {
        $ent = $entities_table[ord($str[$i])] ?? null;
        if ($ent) {
            $encoded_str .= "&$ent;";
        } else {
            $encoded_str .= $str[$i];
        }
    }
    return $encoded_str;
}
?>
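As an alternative sketch, PHP's own named-entity table can be obtained with get_html_translation_table() instead of copying it from TinyMCE. Passing ENT_NOQUOTES leaves ' and " unconverted, as the author of the note wanted, and the UTF-8 table also works on multi-byte input, unlike the byte-wise function above. The query string is hypothetical; the \u{...} escapes need PHP 7+.

```php
<?php
// PHP's HTML 4.01 named-entity table without quotes, keyed by UTF-8 characters.
$table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $table["\u{00A0}"], "\n";       // &nbsp;

// strtr() replaces each character that has an entry with its named entity.
$query = "Ni\u{00F1}o \"caf\u{00E9}\"";
$encoded = strtr($query, $table);    // Ni&ntilde;o "caf&eacute;"
echo $encoded, "\n";

// htmlentities() with the same flags gives the same result.
var_dump($encoded === htmlentities($query, ENT_NOQUOTES, 'UTF-8'));
```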

If you are building a LoadVars page for Flash and have problems with special characters such as "&" and "'", you should escape them for Flash:

Try trace(escape("&")); in Flash ActionScript to see the escape code for &.

<?php
function flashentities($string) {
    return str_replace(array("&", "'"), array("%26", "%27"), $string);
}
?>

Those are the two that concerned me. YMMV.
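If the whole value needs escaping rather than just these two characters, a sketch using PHP's rawurlencode() is shown below: it percent-encodes every character except ASCII letters, digits and -_.~, producing the same %26 and %27 codes. Whether the receiving Flash code decodes the additional codes as expected is an assumption to verify; the sample value is hypothetical.

```php
<?php
$value = "Tom & Jerry's";

// rawurlencode() escapes "&" as %26, "'" as %27 and the space as %20.
$escaped = rawurlencode($value);
echo $escaped, "\n"; // Tom%20%26%20Jerry%27s
```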

The ENT_HTML5 flag also makes htmlentities() strip newline characters such as \n, while htmlspecialchars() is not affected by it.

If you want to use nl2br() on that string afterwards, you might end up hunting for the problem like I did. This does not apply to other flags such as ENT_XHTML, which confused me.

Tested with PHP 5.4 / 5.5 / 5.6-dev with the same results, so it seems to be an intended "feature".
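One way around this, sketched below under the assumption that only the markup-significant characters need escaping, is to use htmlspecialchars(), which leaves "\n" alone, and call nl2br() afterwards. Calling nl2br() first would not help, because the <br /> tags it inserts would then be escaped. The sample text is hypothetical.

```php
<?php
$text = "a & b\nc";

// htmlspecialchars() keeps the newline, so nl2br() can still find it.
$html = nl2br(htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

// Prints "a &amp; b<br />" followed by a real newline and "c".
echo $html;
```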

For Spanish speakers (and others) who want their national letters back after htmlentities() 🙂

<?php
// The note shows a class method; the wrapper class name is hypothetical.
// create_function() was removed in PHP 8.0, so a closure is used instead.
class AccentDecoder {
    protected function _decodeAccented($encodedValue, $options = array()) {
        $options += array(
            'quote'    => ENT_NOQUOTES,
            'encoding' => 'UTF-8',
        );
        // Decode only entities such as &aacute;, &uuml; and &ntilde;
        return preg_replace_callback(
            '/&\w(acute|uml|tilde);/',
            function ($m) use ($options) {
                return html_entity_decode($m[0], $options['quote'], $options['encoding']);
            },
            $encodedValue
        );
    }
}
?>

The following will make a string completely safe for XML:

function philsXMLClean($strin) {
    $strout = null;
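The body of philsXMLClean() is cut off here. As a separate sketch (not a reconstruction of that function), escaping text for XML 1.0 can be done with htmlspecialchars() and the XML-related flags; xml_escape() is a hypothetical helper name and the code assumes PHP 7+.

```php
<?php
// Hypothetical helper; not the truncated philsXMLClean() above.
function xml_escape(string $text): string {
    // ENT_XML1:       XML 1.0 rules (' becomes &apos;)
    // ENT_QUOTES:     escape both quote types, safe inside either attribute style
    // ENT_SUBSTITUTE: replace invalid UTF-8 instead of returning ""
    // ENT_DISALLOWED: replace code points XML 1.0 forbids (e.g. U+0001) with U+FFFD
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED, 'UTF-8');
}

echo xml_escape("a<b & 'c'"), "\n"; // a&lt;b &amp; &apos;c&apos;
```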


PHP: escaping HTML tags

Data security is very important in PHP. Let's look at a few simple mechanisms that can make our website more secure.

But first, let's take the form from the previous topic:

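<?php
$name = "not specified";
$age = "not specified";
if (isset($_POST["name"])) { $name = $_POST["name"]; }
if (isset($_POST["age"])) { $age = $_POST["age"]; }
echo "Name: $name<br>Age: $age";
?>
<h3>Data input form</h3>
<form method="POST">
    <p>Name: <input type="text" name="name" /></p>
    <p>Age: <input type="number" name="age" /></p>
    <input type="submit" value="Submit">
</form>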

Now let's try entering some data into it. For example, enter a <script> element with JavaScript code into the name field.

After the data is submitted, that JavaScript code is embedded in the HTML markup and displays a message box.

This is a relatively simple and harmless script, but injected code can be far more malicious. To avoid such security problems, it is recommended to use the htmlentities() function. It takes the value to be escaped as its parameter:

$name = "not specified"; $age = "not specified";
if (isset($_POST["name"])) { $name = htmlentities($_POST["name"]); }
if (isset($_POST["age"])) { $age = htmlentities($_POST["age"]); }
echo "Name: $name<br>Age: $age";
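A small sketch of what htmlentities() does to an injected script. The payload string is a hypothetical example; ENT_QUOTES and UTF-8 are passed explicitly so the result does not depend on the PHP version's default flags or on default_charset.

```php
<?php
$payload = '<script>alert("hi")</script>';

// Every markup-significant character is turned into an entity,
// so the browser displays the text instead of running it.
$safe = htmlentities($payload, ENT_QUOTES, 'UTF-8');
echo $safe; // &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```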

Now, even if HTML or JavaScript code is entered, all tags are escaped, so the browser displays them as plain text instead of running them.

Another special function, htmlspecialchars(), works similarly to htmlentities():

$name = "not specified"; $age = "not specified";
if (isset($_POST["name"])) { $name = htmlspecialchars($_POST["name"]); }
if (isset($_POST["age"])) { $age = htmlspecialchars($_POST["age"]); }
echo "Name: $name<br>Age: $age";

Another function, strip_tags(), removes HTML tags completely:

$name = "not specified"; $age = "not specified";
if (isset($_POST["name"])) { $name = strip_tags($_POST["name"]); }
if (isset($_POST["age"])) { $age = strip_tags($_POST["age"]); }
echo "Name: $name<br>Age: $age";
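A sketch of strip_tags() with and without its optional allowed-tags argument. The input and the steal() call are hypothetical. As the PHP manual warns, strip_tags() does not modify attributes on the tags you allow, so an allowed tag can still carry an event handler.

```php
<?php
$input = '<b onclick="steal()">Bob</b><script>alert(1)</script>';

// All tags are removed; the text between them is kept.
$plain = strip_tags($input);           // Bobalert(1)

// <b> is allowed, and it keeps its onclick attribute.
$withBold = strip_tags($input, '<b>'); // <b onclick="steal()">Bob</b>alert(1)

echo $plain, "\n", $withBold, "\n";
```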

With the same input, the tags are removed and only the text between them is printed.


PHP htmlspecialchars() Function

Convert the predefined characters "<" (less than) and ">" (greater than) to HTML entities.


Definition and Usage

The htmlspecialchars() function converts some predefined characters to HTML entities.

The predefined characters are:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &#039; (when ENT_QUOTES is set)
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;

Tip: To convert special HTML entities back to characters, use the htmlspecialchars_decode() function.
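A minimal round-trip sketch covering the five characters in the list above and the htmlspecialchars_decode() tip. The sample markup is hypothetical; ENT_QUOTES is passed to both calls so the single quote is handled in each direction.

```php
<?php
$original = '<a href="x">T&C\'s</a>';

// All five special characters are converted (ENT_QUOTES covers the single quote).
$escaped = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &lt;a href=&quot;x&quot;&gt;T&amp;C&#039;s&lt;/a&gt;

// htmlspecialchars_decode() with the same quote flag restores the original.
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);
var_dump($restored === $original); // bool(true)
```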

Syntax

htmlspecialchars(string, flags, character-set, double_encode)

Parameter Values

Parameter Description
string Required. Specifies the string to convert
flags Optional. Specifies how to handle quotes, invalid encoding and the used document type.

The available quote styles are:

  • ENT_COMPAT - Encodes only double quotes (the default before PHP 8.1.0)
  • ENT_QUOTES - Encodes double and single quotes (part of the default since PHP 8.1.0)
  • ENT_NOQUOTES - Does not encode any quotes
  • ENT_IGNORE - Ignores invalid encoding instead of having the function return an empty string. Should be avoided, as it may have security implications.
  • ENT_SUBSTITUTE - Replaces invalid encoding for a specified character set with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; instead of returning an empty string (part of the default since PHP 8.1.0)
  • ENT_DISALLOWED - Replaces code points that are invalid in the specified doctype with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD;

Additional flags for specifying the used doctype:

  • ENT_HTML401 - Default. Handle code as HTML 4.01
  • ENT_HTML5 - Handle code as HTML 5
  • ENT_XML1 - Handle code as XML 1
  • ENT_XHTML - Handle code as XHTML

character-set Optional. A string that specifies which character set to use. Allowed values are:

  • UTF-8 - Default. ASCII compatible multi-byte 8-bit Unicode
  • ISO-8859-1 - Western European
  • ISO-8859-15 - Western European (adds the Euro sign + French and Finnish letters missing in ISO-8859-1)
  • cp866 - DOS-specific Cyrillic charset
  • cp1251 - Windows-specific Cyrillic charset
  • cp1252 - Windows-specific charset for Western European
  • KOI8-R - Russian
  • BIG5 - Traditional Chinese, mainly used in Taiwan
  • GB2312 - Simplified Chinese, national standard character set
  • BIG5-HKSCS - Big5 with Hong Kong extensions
  • Shift_JIS - Japanese
  • EUC-JP - Japanese
  • MacRoman - Character set that was used by Mac OS

Note: Unrecognized character sets will be ignored and replaced by ISO-8859-1 in versions prior to PHP 5.4. As of PHP 5.4, they will be ignored and replaced by UTF-8.

double_encode Optional. A boolean value that specifies whether to encode existing HTML entities. TRUE (default) converts everything; FALSE does not encode existing HTML entities.

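Technical Details

Return Value: Returns the converted string. If the string contains an invalid encoding, an empty string is returned unless either the ENT_IGNORE or ENT_SUBSTITUTE flag is set.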

More Examples

Example

Convert some predefined characters to HTML entities:

<?php
$str = "Jane & 'Tarzan'";
echo htmlspecialchars($str, ENT_COMPAT); // Will only convert double quotes: Jane &amp; 'Tarzan'
echo "<br>";
echo htmlspecialchars($str, ENT_QUOTES); // Converts double and single quotes: Jane &amp; &#039;Tarzan&#039;
echo "<br>";
echo htmlspecialchars($str, ENT_NOQUOTES); // Does not convert any quotes: Jane &amp; 'Tarzan'
?>


Example

Convert double quotes to HTML entities:

<?php
$str = 'I love "PHP".';
echo htmlspecialchars($str, ENT_QUOTES); // Converts double and single quotes: I love &quot;PHP&quot;.
?>

