PHP: encoding HTML and converting strings to UTF-8

utf8_encode

This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

This function converts the string passed in the string parameter from the ISO-8859-1 encoding to UTF-8.

Note:

This function does not attempt to guess the current encoding of the provided string; it assumes the string is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but it will not produce a useful string if a different encoding was intended.

Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), in place of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
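Because the function never guesses, a caller who is unsure of the input has to check it first. The following is a minimal sketch, not part of the manual: the helper name ensure_utf8() and the choice of Windows-1252 as the assumed source encoding are illustrative assumptions. It relies on the mbstring extension.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
// Assumption: input that is not valid UTF-8 is Windows-1252 unless the caller says otherwise.
function ensure_utf8(string $input, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($input, 'UTF-8', $assumedSource);
}

echo bin2hex(ensure_utf8("Zo\xC3\xAB")), "\n"; // already UTF-8: 5a6fc3ab
echo bin2hex(ensure_utf8("Zo\xEB")), "\n";     // single-byte input: 5a6fc3ab
echo bin2hex(ensure_utf8("\x80")), "\n";       // Windows-1252 Euro sign: e282ac
```

This is a heuristic only: a few single-byte strings happen to be valid UTF-8 as well, so knowing the real source encoding is always preferable to checking.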

Parameters

string: An ISO-8859-1 string.

Return Values

Returns the UTF-8 translation of string.

Changelog

Version Description
8.2.0 This function has been deprecated.
7.2.0 This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.

Examples

Example #1 Basic example

<?php
// Convert the string 'Zoë' from ISO 8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab

Notes

Note: Deprecation and alternatives

This function is deprecated as of PHP 8.2.0, and will be removed in a future version. Existing uses should be checked and replaced with appropriate alternatives.

Similar functionality can be achieved with mb_convert_encoding(), which supports ISO-8859-1 and many other character encodings.

<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same byte in ISO-8859-7 represents 'λ' (Greek lower-case lambda)
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (Euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

c3ab
cebb
e282ac

Other options, which may be available depending on the extensions installed, are UConverter::transcode() and iconv().

The following all give the same result:

<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab

See Also

  • utf8_decode() — Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
  • mb_convert_encoding() — Convert a string from one character encoding to another
  • UConverter::transcode() — Convert a string from one character encoding to another
  • iconv() — Convert a string from one character encoding to another

User Contributed Notes 24 notes

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.

Here’s some code that addresses the issue that Steven describes in the previous comment:

<?php
/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
the UTF-8 encoding of the non-control characters that Windows-1252 places
at the equivalent code points. */

$cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
    "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
    "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
    "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
    "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
    "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
    "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
    "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
    "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
    "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
    "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
    "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
    "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
    "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
    "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
    "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
    "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
    "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
    "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
    "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

    "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
    "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
    "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
    "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION */
    "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
    "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
    "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
);

function cp1252_to_utf8($str) {
    global $cp1252_map;
    // utf8_encode() treats the input as ISO-8859-1; strtr() then swaps the
    // encoded control characters for their Windows-1252 counterparts.
    return strtr(utf8_encode($str), $cp1252_map);
}
?>

For reference, it may be insightful to point out that:
utf8_encode($s)
is actually identical to:
recode_string('latin1..utf8', $s)
and:
iconv('iso-8859-1', 'utf-8', $s)
That is, utf8_encode is a specialized case of character set conversion.

If the string you want to convert to UTF-8 is in an encoding other than ISO-8859-1 (such as ISO-8859-2, used for Polish and Croatian), use recode_string() or iconv() rather than trying to devise complex str_replace statements.
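As a small illustration of the point above, the same byte decodes to different characters depending on the source encoding named in the call. This sketch assumes the iconv and mbstring extensions are available and that the local iconv implementation knows ISO-8859-2; the variable names are illustrative.

```php
<?php
// The byte 0xB3 is 'ł' (l with stroke) in ISO-8859-2 but '³' (superscript three) in ISO-8859-1.
$byte = "\xB3";

// Naming the correct source encoding yields U+0142, encoded in UTF-8 as c5 82.
$fromLatin2 = iconv('ISO-8859-2', 'UTF-8', $byte);
echo bin2hex($fromLatin2), "\n"; // c582

// Treating the same byte as ISO-8859-1 yields U+00B3, encoded in UTF-8 as c2 b3.
$fromLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
echo bin2hex($fromLatin1), "\n"; // c2b3
```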

If you haven’t guessed already: when converting in the other direction (for example with utf8_decode()), a UTF-8 character that has no representation in the ISO-8859-1 code page is returned as ?. You might want to wrap the conversion in a function to make sure you aren’t saving a bunch of ? characters into your database.
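One possible shape for such a wrapper is a round-trip check: convert to ISO-8859-1 and back, and compare with the original. This is a sketch under assumptions, not the note author's code; the function name is_latin1_representable() is hypothetical, the input is assumed to be valid UTF-8, and mbstring is used instead of the deprecated utf8_decode().

```php
<?php
// Hypothetical helper: true when every character of a valid UTF-8 string
// also exists in ISO-8859-1, so converting it down loses nothing.
function is_latin1_representable(string $utf8): bool
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    // Unrepresentable characters are replaced by a substitute during the first
    // conversion, so the round trip no longer matches the input.
    return $roundTrip === $utf8;
}

var_dump(is_latin1_representable("Zo\xC3\xAB"));   // 'Zoë': bool(true)
var_dump(is_latin1_representable("\xE2\x82\xAC")); // '€':   bool(false)
```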

If you need a function which converts a string array into a utf8 encoded string array then this function might be useful for you:
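A minimal sketch of such a helper follows. It is an illustration rather than the note author's function: the name convert_array_to_utf8() and the default source encoding are assumptions, nested arrays are handled recursively, and non-string values are passed through unchanged. It requires the mbstring extension.

```php
<?php
// Hypothetical helper: convert every string in a (possibly nested) array to UTF-8.
// Assumption: all strings share one source encoding, ISO-8859-1 by default.
function convert_array_to_utf8(array $values, string $from = 'ISO-8859-1'): array
{
    return array_map(
        static function ($value) use ($from) {
            if (is_array($value)) {
                return convert_array_to_utf8($value, $from); // recurse into nested arrays
            }
            return is_string($value)
                ? mb_convert_encoding($value, 'UTF-8', $from)
                : $value; // leave integers, booleans, null, etc. as they are
        },
        $values
    );
}

$row = ['name' => "Zo\xEB", 'tags' => ["caf\xE9"], 'visits' => 3];
$converted = convert_array_to_utf8($row);
echo bin2hex($converted['name']), "\n"; // 5a6fc3ab
```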


Encode HTML in PHP

  1. Encode With htmlspecialchars()
  2. Encode With htmlentities()
  3. Encode With htmlentities() and HTML5 Encoding
  4. Encode With A Custom Method

HTML encoding helps prevent cross-site scripting (XSS) in PHP web applications that process user-supplied data. This tutorial will teach you how to encode data with htmlspecialchars(), htmlentities(), and a custom method.

Encode With htmlspecialchars()

PHP htmlspecialchars() is a built-in function that can convert special characters to HTML entities. The syntax is as follows:

htmlspecialchars( $string, $flags, $encoding, $double_encode ) 
  • $string : The input string
  • $flags : The flags that dictate how the function should handle quotes in the string
  • $encoding : Specifies the encoding used by the function. This parameter is optional
  • $double_encode : A Boolean attribute that dictates if PHP will encode existing entities. If you set it to false, PHP will not encode existing entities

htmlspecialchars() returns the converted string. If the input contains a byte sequence that is invalid in the given encoding, the function returns an empty string unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set.
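The effect of the $double_encode parameter and of invalid input can be seen in a short sketch. The sample strings and variable names are illustrative; all flags are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
$input = 'Tom &amp; "Jerry" <b>';

// Default double_encode (true): the existing &amp; entity is encoded again.
$encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // Tom &amp;amp; &quot;Jerry&quot; &lt;b&gt;

// double_encode = false: the existing entity is left alone.
$noDouble = htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false);
echo $noDouble, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;b&gt;

// "caf\xE9" is ISO-8859-1, not valid UTF-8.
$invalid = "caf\xE9";
$empty = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');
var_dump($empty === ''); // bool(true): invalid input yields an empty string

// ENT_SUBSTITUTE replaces the invalid byte with U+FFFD instead.
$substituted = htmlspecialchars($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo bin2hex($substituted), "\n"; // 636166efbfbd
```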

The next example shows how to convert a string with htmlspecialchars() . You’ll observe that the function is not used with any flags.

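<?php
  $stringToEncode = "A <b>bold text</b> a'nd á <script>alert();</script> tag";
  $encodedString = htmlspecialchars($stringToEncode);
  echo $encodedString;
?>

Output in the browser:

A <b>bold text</b> a'nd á <script>alert();</script> tag

When you view the source of the web page, you’ll observe that the markup characters are encoded, while the apostrophe and the á character are not. (This is the behavior of PHP versions before 8.1; since PHP 8.1 the default flags include ENT_QUOTES, so the apostrophe is encoded as &#039; even without an explicit flag.)

A &lt;b&gt;bold text&lt;/b&gt; a'nd á &lt;script&gt;alert();&lt;/script&gt; tag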

Now, if you supply the ENT_QUOTES flag and an encoding to htmlspecialchars(), the apostrophe gets encoded, but á is not.

<?php
  $stringToEncode = "A <b>bold text</b> a'nd á <script>alert();</script> tag";
  $encodedString = htmlspecialchars($stringToEncode, ENT_QUOTES, 'UTF-8');
  echo $encodedString;
?>

Output in the browser:

A <b>bold text</b> a'nd á <script>alert();</script> tag

Viewing the page source shows that the function encodes the apostrophe as &#039;:

A &lt;b&gt;bold text&lt;/b&gt; a&#039;nd á &lt;script&gt;alert();&lt;/script&gt; tag

Encode With htmlentities()

htmlentities() is also a built-in PHP function. With htmlentities(), all characters that have HTML entity equivalents are converted to those entities. Its syntax is as follows:

htmlentities( $string, $flags, $encoding, $double_encode ) 
  • $string : The input string
  • $flags : The flags that dictate how the function should handle quotes in the string
  • $encoding : Specifies the encoding used by the function. This parameter is optional
  • $double_encode : A Boolean attribute that dictates if PHP will encode existing entities. If you set it to false, PHP will not encode existing entities

The return value for this function is the encoded string.

The following is an example of converting a string with htmlentities() . Here htmlentities() is not used with any flag.

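<?php
  $stringToEncode = "A <b>bold text</b> ánd a <script>alert();</script> tag's";
  $encodedString = htmlentities($stringToEncode);
  echo $encodedString;
?>

Output in the browser:

A <b>bold text</b> ánd a <script>alert();</script> tag's

The page source shows that the function encodes the á character without any flag, but the apostrophe is not encoded. (As with htmlspecialchars(), PHP 8.1 and later include ENT_QUOTES in the default flags, so there the apostrophe is encoded as well.)

A &lt;b&gt;bold text&lt;/b&gt; &aacute;nd a &lt;script&gt;alert();&lt;/script&gt; tag's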

Passing the ENT_QUOTES flag explicitly makes the function encode the apostrophe on every PHP version.

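<?php
  $stringToEncode = "A <b>bold text</b> ánd a <script>alert();</script> tag's";
  $encodedString = htmlentities($stringToEncode, ENT_QUOTES, 'UTF-8');
  echo $encodedString;
?>

Output in the browser:

A <b>bold text</b> ánd a <script>alert();</script> tag's

Page source:

A &lt;b&gt;bold text&lt;/b&gt; &aacute;nd a &lt;script&gt;alert();&lt;/script&gt; tag&#039;s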
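The difference between the two functions can be summarized in a side-by-side sketch. The sample string is illustrative, and the é is written as explicit UTF-8 bytes so the result does not depend on how the source file is saved.

```php
<?php
// "café <b>menu</b>" with é written as its UTF-8 bytes.
$input = "caf\xC3\xA9 <b>menu</b>";

// htmlspecialchars() only touches &, <, >, and quotes; é is left as is.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $special, "\n";

// htmlentities() additionally converts é to its named entity.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $entities, "\n"; // caf&eacute; &lt;b&gt;menu&lt;/b&gt;
```

For pages served as UTF-8, htmlspecialchars() is usually sufficient, since characters such as é can be sent to the browser unchanged.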

Encode With htmlentities() and HTML5 Encoding

When your string contains non-English characters or symbols, you can combine the ENT_HTML5 flag with the UTF-8 encoding.

The ENT_HTML5 flag instructs the function to use the HTML5 entity table, and the 'UTF-8' encoding argument tells the function how to interpret the bytes of the input, so any Unicode character can be recognized.

The following is an example of how to use htmlentities() with an HTML5 flag and UTF-8 encoding:

<?php
  $stringToEncode = "àéò ©€ ♣♦ ↠ ↔↛ āžšķūņ ↙ ℜ℞ ∀∂∋ rūķīš ○";
  $encodedString = htmlentities($stringToEncode, ENT_HTML5, 'UTF-8');
  echo $encodedString;
?>
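To make the effect of the document-type flag concrete, the following sketch compares ENT_HTML401 and ENT_HTML5 on an apostrophe and round-trips a Euro sign. The sample values are illustrative, and the Euro sign is written as explicit UTF-8 bytes.

```php
<?php
// The apostrophe has a named entity in HTML5 but only a numeric one in HTML 4.01.
$html401 = htmlspecialchars("it's", ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5   = htmlspecialchars("it's", ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $html401, "\n"; // it&#039;s
echo $html5, "\n";   // it&apos;s

// The Euro sign (UTF-8 bytes e2 82 ac) becomes a named entity and decodes back.
$euro = "\xE2\x82\xAC";
$encodedEuro = htmlentities($euro, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $encodedEuro, "\n"; // &euro;
$decodedEuro = html_entity_decode($encodedEuro, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decodedEuro === $euro); // bool(true)
```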

Encode With A Custom Method

If you want to roll your own encoding scheme, a custom method can come in handy. This method takes your input string and applies some string manipulation to it, producing an encoded string.

The following HTML has a text area and a single submit button. The form action points to a file that will encode the string passed into the form input.

<main>
  <h1>Enter HTML code and click the submit button</h1>
  <form action='encodedoutput.php' method='post'>
    <div class="form-row">
      <textarea rows='15' cols='50' name='texttoencode' required></textarea>
    </div>
    <div class="form-row">
      <input type='submit'>
    </div>
  </form>
</main>

The next code block is the PHP code that will perform the encoding. Save it as encodedoutput.php .

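<?php
  // Assumption: the guard condition is reconstructed as a check that the
  // form field was submitted and is not empty.
  if (isset($_POST['texttoencode']) && $_POST['texttoencode'] !== '') {
    // Hex-encode the input, then put a "%" after every two hex digits.
    $inputHTML = bin2hex($_POST['texttoencode']);
    $splitHTML = chunk_split($inputHTML, 2, "%");
    // Drop the trailing "%" that chunk_split() appends.
    $HTMLStringLength = strlen($splitHTML);
    $HTMLSubLength = $HTMLStringLength - 1;
    $HTMLSubString = substr($splitHTML, 0, $HTMLSubLength);
    // Assumption: the output is the percent-encoded string itself,
    // with the leading "%" restored.
    $encodedOutput = "%" . $HTMLSubString;
    echo $encodedOutput;
  } else {
    echo "Not allowed";
    die();
  }
?>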
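The same percent-and-hex idea can be packaged as a reusable function, which also makes it easy to check that the encoding is reversible. This is a sketch under assumptions: the function name percent_hex_encode() is hypothetical, and decoding relies on PHP's built-in rawurldecode().

```php
<?php
// Hypothetical helper: encode every byte of the input as "%" followed by two hex digits.
function percent_hex_encode(string $input): string
{
    if ($input === '') {
        return ''; // nothing to encode
    }
    // bin2hex() gives two hex digits per byte; split them into pairs and prefix each with "%".
    return '%' . implode('%', str_split(bin2hex($input), 2));
}

$encoded = percent_hex_encode('<b>Hi</b>');
echo $encoded, "\n";               // %3c%62%3e%48%69%3c%2f%62%3e
echo rawurldecode($encoded), "\n"; // <b>Hi</b>
```

Because the encoding is trivially reversible, it obscures the markup rather than neutralizing it; output that ends up in a page should still pass through htmlspecialchars().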

Sample output for alert("Hello world");:

 
