Remove all HTML tags from a PHP string
Obviously no good. I just want to strip out all HTML code, so I need to remove everything between < and > from the database entry and THEN display the first 100 characters. Any ideas?
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
// Output: Test paragraph. Other text
Why doesn't it work? 🙁 I'm using $data = htmlentities($description2, ENT_QUOTES, 'UTF-8'); and then strip_tags($data), and it doesn't work.
@delive Why in the world would you run htmlentities() and then strip_tags()? That totally defeats the purpose.
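The reason is that htmlentities() converts < and > into the entities &lt; and &gt;, so by the time strip_tags() runs there are no tags left for it to find. A minimal sketch (the sample string is made up for illustration) comparing both orders:

```php
<?php
$description = '<p>Hello <b>world</b></p>';

// Wrong order: the tags are already encoded as &lt;...&gt;, so strip_tags() finds nothing to strip
$wrong = strip_tags(htmlentities($description, ENT_QUOTES, 'UTF-8'));

// Right order: strip the tags first, then escape the remaining text for HTML output
$right = htmlentities(strip_tags($description), ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // &lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;
echo $right, "\n"; // Hello world
```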
For example:
$businessDesc = strip_tags($row_get_Business['business_description']);
$businessDesc = substr($businessDesc, 0, 110);
print($businessDesc);
This takes the first 100 characters first and only then removes the HTML tags. But I think the OP wants to remove the HTML tags first and then take the first 100 characters.
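The order matters: if you cut the string first, the cut can land inside a tag, and strip_tags() then treats the unterminated tag as running to the end of the string. A minimal sketch that strips first and truncates afterwards. The helper name excerpt() and the sample HTML are hypothetical, and it assumes the mbstring extension is available so that mb_substr() counts characters rather than bytes:

```php
<?php
// Hypothetical helper: strip the tags first, then cut the plain text to $length characters.
function excerpt(string $html, int $length = 100): string
{
    $text = strip_tags($html);
    // mb_substr() counts characters, not bytes, so multibyte UTF-8 text is never split mid-character
    return mb_substr($text, 0, $length, 'UTF-8');
}

$html = '<p><a href="/shop">Shop</a> opening hours: 9 to 5</p>';

// Wrong order: the cut lands inside the <a> tag, and strip_tags() drops the unterminated tag
var_dump(strip_tags(substr($html, 0, 10))); // string(0) ""

// Right order: strip first, then truncate
var_dump(excerpt($html, 10)); // string(10) "Shop openi"
```

Plain substr() would work the same way for ASCII text, but it can cut a UTF-8 character in half.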
Remove all HTML tags, together with their content, from a PHP string!
Let's say your string contains an anchor tag and you want to remove the tag together with its content. Then this method will help.
function strip_tags_content($text) {
    // Remove every <tag ...>...</tag> pair together with the text inside it
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
}
$string = '<a>Some Text</a> Lorem Ipsum is simply dummy text of the printing and typesetting industry.';
echo strip_tags_content($string);
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
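Almost. Section 12.1.2.2.4 of the WHATWG spec says that an end tag may have whitespace before its closing > (for example </a >), and that is a valid end tag, but it is not handled by the regex. The end-tag part should be </\1\s*> or some such.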
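A sketch of the suggested change, which allows optional whitespace before the > of the end tag. It still shares the other limitations of a regex-based approach. The function name strip_tags_content_ws() and the sample input are made up for this example:

```php
<?php
// Hypothetical variant of strip_tags_content(): the end tag may contain whitespace before '>'
function strip_tags_content_ws(string $text): string
{
    return preg_replace('@<(\w+)\b.*?>.*?</\1\s*>@si', '', $text);
}

echo strip_tags_content_ws('<a href="#">Some Text</a >Lorem Ipsum'), "\n"; // Lorem Ipsum
```

With the original pattern, the same input is returned unchanged, because </a > never matches </a>.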
// PHP's preg functions have no /g modifier; preg_replace() already replaces every match
$val = preg_replace('/<[^<]+?>/', ' ', $row_get_Business['business_description']);
$businessDesc = substr($val, 0, 110);
From your example, what should remain is: Ref no: 30001
Not completely sure, but I think it won't catch self-closing tags that contain whitespace. I also think that this doesn't take hacks like >> into account.
This is a better solution than PHP's strip_tags(). strip_tags() will remove both the opening and the closing HTML script element. However, if your user enters only the opening script element, strip_tags() will not remove it, and your web page will then very likely display completely wrong. Tested with PHP version 5.6.19. This little regex fixes the partial HTML tags that strip_tags() misses and that can cause problems. Bravo!
The problem is that users will sometimes write invalid HTML (for example, a broken or unclosed tag), and using strip_tags() will remove everything. Sometimes we want a more conservative approach, so I would go with a regex. As the manual puts it: "Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected."
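A small sketch of the difference on input with a stray < (the sample string is made up). strip_tags() treats everything from the < onward as an unterminated tag and drops it. The regex /<[^<]+?>/ from the answer above only removes complete <...> pairs:

```php
<?php
$input = 'Price <100 dollars';

// strip_tags() treats "<100 dollars" as a tag that never ends and removes it
$viaStripTags = strip_tags($input);

// The regex only removes complete <...> pairs, so the stray "<" survives
$viaRegex = preg_replace('/<[^<]+?>/', ' ', $input);

var_dump($viaStripTags); // string(6) "Price "
var_dump($viaRegex);     // string(18) "Price <100 dollars"
```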
strip_tags
This function tries to return a string with all NULL bytes, HTML and PHP tags stripped from the given string ( string ). It uses the same tag-stripping mechanism as the fgetss() function.
Parameters
The optional second parameter can be used to specify tags that should not be stripped. They are given either as a string ( string ) or, as of PHP 7.4.0, as an array ( array ). See the example below regarding the format of this parameter.
Note:
HTML comments and PHP tags are also stripped. This is hardcoded and cannot be changed with allowed_tags .
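A quick sketch (sample strings made up for illustration) showing that comments and PHP tags disappear even when other tags are allowed:

```php
<?php
// The HTML comment is removed even though <p> is allowed
echo strip_tags('<p>a<!-- note -->b</p>', '<p>'), "\n"; // <p>ab</p>

// PHP tags are removed as well
echo strip_tags('x<?php echo 1; ?>y'), "\n"; // xy
```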
Note:
Self-closing XHTML tags (such as <br/>) are ignored, and only non-self-closing tags should be used in allowed_tags . For example, to allow both <br> and <br/>, you should use:
strip_tags($input, '<br>');
Return Values
Returns the stripped string.
Changelog
Version | Description |
---|---|
8.0.0 | allowed_tags is now nullable. |
7.4.0 | allowed_tags now alternatively accepts an array ( array ). |
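Examples
Example #1 strip_tags() example
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
// As of PHP 7.4.0, the line above can be written as:
// echo strip_tags($text, ['p', 'a']);
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>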
Notes
This function should not be used to try to prevent XSS attacks. Use more appropriate functions for that task, such as htmlspecialchars(), or other means depending on the context of the output.
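A minimal sketch of escaping on output instead (the sample string is made up). htmlspecialchars() keeps the user's text but turns the markup characters into entities, so the browser displays the text instead of executing it:

```php
<?php
$userInput = '<script>alert("x")</script>';

// Escape for an HTML context; ENT_QUOTES also converts single quotes, which matters inside attributes
$safe = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

echo $safe, "\n"; // &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```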
Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
This function does not modify any attributes on the tags you allow using allowed_tags , including the style and onmouseover attributes, which a mischievous user may abuse when posting text that will also be shown to other users.
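A small sketch of that warning (the sample comment is made up). Once <p> is allowed, the event handler and the style attribute pass through untouched:

```php
<?php
$comment = '<p onmouseover="alert(1)" style="position:fixed">Hi</p>';

// <p> is allowed, so the whole opening tag survives together with its attributes
$result = strip_tags($comment, '<p>');

var_dump($result === $comment); // bool(true)
```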
Note:
HTML tag names longer than 1023 bytes are treated as invalid, regardless of the allowed_tags parameter.
See Also
User Contributed Notes
Hi. I made a function that removes the HTML tags along with their contents:
Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
    // Collect the tag names listed in $tags (e.g. '<b><i>' becomes ['b', 'i'])
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
    $tags = array_unique($tags[1]);
    if (is_array($tags) AND count($tags) > 0) {
        if ($invert == FALSE) {
            // Remove every element with its content, except the listed tags
            return preg_replace('@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
            // Remove only the listed tags with their content
            return preg_replace('@<(' . implode('|', $tags) . ')\b.*?>.*?</\1>@si', '', $text);
        }
    }
    elseif ($invert == FALSE) {
        // No tags given: remove every element with its content
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}
?>
Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
Result for strip_tags($text):
sample text with tags
Result for strip_tags_content($text):
text with
Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with
Result for strip_tags_content($text, '<b>', TRUE):
text with <div>tags</div>
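One limitation worth knowing: because the pattern uses a lazy .*?, nested elements with the same tag name are only partly removed. A small sketch using the same pattern as the $invert == FALSE branch above, with a made-up input string:

```php
<?php
$pattern = '@<(\w+)\b.*?>.*?</\1>@si';

// The lazy .*? stops at the FIRST closing </div>, so the outer element is only partly removed
var_dump(preg_replace($pattern, '', '<div><div>x</div>y</div>')); // string(7) "y</div>"
```

For markup that may nest, a real parser such as DOMDocument is the safer choice.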
I hope someone finds this useful 🙂
After strip_tags(), the result is:
$str = 'color is bluesize is huge
material is wood';
Notice: the words 'blue' and 'size' run together 🙁
and the line breaks are still in the new string $str.
If you need a space between the words (and no line breaks),
use my function: $str = rip_tags($str);
The result is:
$str = 'color is blue size is huge material is wood';
function rip_tags($string)
{
    // ----- remove HTML tags (replace each one with a space) -----
    $string = preg_replace('/<[^>]*>/', ' ', $string);
    // ----- remove control characters -----
    $string = str_replace("\r", '', $string);  // --- remove carriage returns
    $string = str_replace("\n", ' ', $string); // --- replace with space
    $string = str_replace("\t", ' ', $string); // --- replace with space
    // ----- collapse runs of spaces into a single space -----
    $string = trim(preg_replace('/ {2,}/', ' ', $string));
    return $string;
}
"5.3.4: strip_tags() no longer strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags."
The above seems to be saying that, since 5.3.4, if you don't specify "<br/>" in allowable_tags, then "<br/>" will not be stripped. But that's not actually what they're trying to say.
What it means is that versions prior to 5.3.4 "strip self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags", and that since 5.3.4 this is no longer the case.
So what reads as "no longer strips self-closing tags (unless the self-closing XHTML tag is also given in allowable_tags)" is actually saying "no longer (strips self-closing tags unless the self-closing XHTML tag is also given in allowable_tags)".
pre-5.3.4: strip_tags('Hello World<br><br/>', '<br>') => 'Hello World<br>' // strips <br/> because it wasn't explicitly specified in allowable_tags
5.3.4 and later: strip_tags('Hello World<br><br/>', '<br>') => 'Hello World<br><br/>' // does not strip <br/> because PHP matches it with <br> in allowable_tags
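A quick way to check the 5.3.4-and-later behavior on a current PHP version (the sample string is made up). Allowing <br> also keeps <br/> and <br />:

```php
<?php
// Since PHP 5.3.4, '<br>' in allowed_tags also matches the self-closing forms
var_dump(strip_tags('Hello<br>World<br/>again<br />', '<br>'));
// string(30) "Hello<br>World<br/>again<br />"
```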
Note: strip_tags() will remove anything that looks like a tag, not just real tags. For example, if you have tags inside attribute values, they may be removed too.
A word of caution: strip_tags() can actually be used for input validation as long as you remove EVERY tag. As soon as you accept even a single tag (via the second parameter), you open up a security hole, because the attributes of the accepted tags are left as they are.
Also, using regexes to strip attributes or code blocks is really not the right solution. For effective input validation when strip_tags() accepts even a single tag, HTML Purifier (http://htmlpurifier.org/) is the way to go.
Since strip_tags() does not remove attributes and thus leaves a potential XSS security hole, here is a small function I wrote that allows only specific tags with specific attributes and strips all other tags and attributes.
If you only allow formatting tags such as b, i and p, and styling attributes such as class, id and style, this will strip all JavaScript, including event handlers on formatting tags.
Note that allowing anchor tags or href attributes opens another potential security hole that this solution won’t protect against. You’ll need more comprehensive protection if you plan to allow links in your text.
<?php
function stripUnwantedTagsAndAttrs($html_str)
{
    $xml = new DOMDocument();
    // Suppress warnings: proper error handling is beyond the scope of this example
    libxml_use_internal_errors(true);
    // List the tags you want to allow here. NOTE: you MUST allow html and body, otherwise the entire string will be cleared
    $allowed_tags = array("html", "body", "b", "br", "em", "hr", "i", "li", "ol", "p", "s", "span", "table", "tr", "td", "u", "ul");
    // List the attributes you want to allow here
    $allowed_attrs = array("class", "id", "style");
    if (!strlen($html_str)) {
        return $html_str; // nothing to clean
    }
    if ($xml->loadHTML($html_str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)) {
        // Copy the live node list first: removing nodes while iterating it would skip elements
        foreach (iterator_to_array($xml->getElementsByTagName("*")) as $tag) {
            if (!in_array($tag->tagName, $allowed_tags)) {
                // Removes the tag together with its content
                $tag->parentNode->removeChild($tag);
            } else {
                // Copy the attribute map for the same reason before removing attributes
                foreach (iterator_to_array($tag->attributes) as $attr) {
                    if (!in_array($attr->nodeName, $allowed_attrs)) {
                        $tag->removeAttribute($attr->nodeName);
                    }
                }
            }
        }
    }
    return $xml->saveHTML();
}
?>
After upgrading from v7.3.3 to v7.3.7, it appears that nested "php tags" inside a string are no longer stripped correctly by strip_tags().
This still works in v7.3.3, v7.2 and v7.1. I've added a simple test below.
Note the different outputs for different forms of the same tag:
$data = ‘
Each
New
Line’ ;
$new = strip_tags ( $data , ‘
‘ );
var_dump ( $new ); // OUTPUTS string(21) «
EachNew
Line»
$data = ‘
Each
New
Line’ ;
$new = strip_tags ( $data , ‘
‘ );
var_dump ( $new ); // OUTPUTS string(16) «Each
NewLine»
$data = ‘
Each
New
Line’ ;
$new = strip_tags ( $data , ‘
‘ );
var_dump ( $new ); // OUTPUTS string(11) «EachNewLine»
?>
Features:
* allowable tags (as in strip_tags),
* optional stripping attributes of the allowable tags,
* optional comment preserving,
* deleting broken and unclosed tags and comments,
* optional callback function call for every piece processed allowing for flexible replacements.
function better_strip_tags ( $str , $allowable_tags = » , $strip_attrs = false , $preserve_comments = false , callable $callback = null ) $allowable_tags = array_map ( ‘strtolower’ , array_filter ( // lowercase
preg_split ( ‘/(?:>|^)\\s*(?: <|$)/' , $allowable_tags , - 1 , PREG_SPLIT_NO_EMPTY ), // get tag names
function( $tag ) < return preg_match ( '/^[a-z][a-z0-9_]*$/i' , $tag ); >// filter broken
) );
$comments_and_stuff = preg_split ( ‘/(|$))/’ , $str , — 1 , PREG_SPLIT_DELIM_CAPTURE );
foreach ( $comments_and_stuff as $i => $comment_or_stuff ) if ( $i % 2 ) < // html comment
if ( !( $preserve_comments && preg_match ( ‘//’ , $comment_or_stuff ) ) ) $comments_and_stuff [ $i ] = » ;
>
> else < // stuff between comments
$tags_and_text = preg_split ( «/(<(?:[^>\»‘]++|\»[^\»]*+(?:\»|$)|'[^’]*+(?:’|$))*(?:>|$))/» , $comment_or_stuff , — 1 , PREG_SPLIT_DELIM_CAPTURE );
foreach ( $tags_and_text as $j => $tag_or_text ) $is_broken = false ;
$is_allowable = true ;
$result = $tag_or_text ;
if ( $j % 2 ) < // tag
if ( preg_match ( «%^(?)([a-z][a-z0-9_]*)\\b(?:[^>\»‘/]++|/+?|\»[^\»]*\»|'[^’]*’)*?(/?>)%i» , $tag_or_text , $matches ) ) $tag = strtolower ( $matches [ 2 ] );
if ( in_array ( $tag , $allowable_tags ) ) if ( $strip_attrs ) $opening = $matches [ 1 ];
$closing = ( $opening === ‘'>‘ : $closing ;
$result = $opening . $tag . $closing ;
>
> else $is_allowable = false ;
$result = » ;
>
> else $is_broken = true ;
$result = » ;
>
> else < // text
$tag = false ;
>
if ( ! $is_broken && isset( $callback ) ) // allow result modification
call_user_func_array ( $callback , array( & $result , $tag_or_text , $tag , $is_allowable ) );
>
$tags_and_text [ $j ] = $result ;
>
$comments_and_stuff [ $i ] = implode ( » , $tags_and_text );
>
>
$str = implode ( » , $comments_and_stuff );
return $str ;
>
?>
Callback arguments:
* &$result: contains the text to be placed instead of the original piece (e.g. an empty string for forbidden tags); it can be changed;
* $tag_or_text: the original piece of text or a tag (see below);
* $tag: false for text between tags, the lowercase tag name for tags;
* $is_allowable: boolean telling whether a tag is allowed (to avoid double checking), always true for text between tags.
The callback function isn't called for comments and broken tags.
Caution: the function doesn't fully validate tags (let alone the HTML itself); it just forcibly strips those that are obviously broken (in addition to stripping forbidden tags). If you want to get valid tags, use the strip_attrs option, though it doesn't guarantee that tags are balanced or used in the appropriate context. For complex logic, consider using a DOM parser.
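Following that advice, here is a minimal sketch of DOM-based text extraction using PHP's built-in DOMDocument. The helper name html_to_text() is hypothetical. Treating only script and style elements as invisible is an assumption. loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII input may need extra handling.

```php
<?php
// Hypothetical helper: extract the visible text of an HTML fragment with DOMDocument.
function html_to_text(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup without warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop elements whose content is not visible text; copy the live list before modifying the tree
    foreach (['script', 'style'] as $name) {
        foreach (iterator_to_array($doc->getElementsByTagName($name)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }
    return $doc->documentElement ? $doc->documentElement->textContent : '';
}

echo html_to_text('<p>Hello <b>world</b></p><script>alert(1)</script>'), "\n"; // Hello world
```

Unlike the regex approaches, the parser handles nesting, quoted > characters and unclosed tags the way a browser would.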