Cleaning HTML Code From Tags with PHP

By | May 4, 2008

Text copied from WYSIWYG browser editors often arrives with HTML tags you don’t need. Either way, if you work with text in PHP, you will often have to strip unwanted HTML from it. Let’s see how to do this.

If you need to remove all the markup, use PHP’s strip_tags() function. It strips every HTML tag from your text except the ones you explicitly allow in its second argument. In most cases this is a good solution. But sometimes you want to keep tags such as <span> and <p> and remove only their attributes. I suggest two simple regular expressions for that.
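To make the strip_tags() behaviour concrete, here is a minimal sketch. The sample string is made up for illustration. It shows that allowed tags keep their attributes, which is why the regular expressions are needed at all.

```php
<?php
// Hypothetical sample input, e.g. text pasted from a WYSIWYG editor.
$html = '<div class="box"><p style="color:red">Hello <b>world</b></p></div>';

// Remove every tag.
$plain = strip_tags($html);

// Keep only <p> and <b>; their attributes survive untouched.
$partial = strip_tags($html, '<p><b>');

echo $plain, "\n";   // Hello world
echo $partial, "\n"; // <p style="color:red">Hello <b>world</b></p>
```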

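// eregi_replace() was removed in PHP 7; preg_replace() with the i flag does the same job.
// The \b keeps <p ...> from also matching tags such as <pre> or <param>.
$file = preg_replace('~<span\b[^>]*>~i', '<span>', $file);
$file = preg_replace('~<p\b[^>]*>~i', '<p>', $file);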
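The two replacements above generalise to any list of tags. The following is only a sketch: the helper name stripTagAttributes and the sample markup are assumptions made for illustration, and it will not cope with attribute values that contain a > character.

```php
<?php
// Hypothetical helper: strips all attributes from the listed tags.
function stripTagAttributes($html, array $tags)
{
    foreach ($tags as $tag) {
        // \b stops "p" from also matching <pre> or <param>.
        $pattern = '~<' . preg_quote($tag, '~') . '\b[^>]*>~i';
        $html = preg_replace($pattern, '<' . $tag . '>', $html);
    }
    return $html;
}

// Made-up sample input.
$file = '<p class="intro">Hi <span style="color:red">there</span></p><pre class="x">code</pre>';
$clean = stripTagAttributes($file, array('span', 'p'));
echo $clean, "\n";
// <p>Hi <span>there</span></p><pre class="x">code</pre>
```

Closing tags are left alone because the pattern requires the tag name directly after the opening bracket.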

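If you’re working with links, you might be interested in a function that removes all attributes of the <a> tag except href. It also replaces the names of common event-handler attributes (onclick, onmouseover and so on) with the string BAD, so that handlers left on other tags no longer fire. Here it is: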

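function fncStripAttrsExceptHREF($strText) {

    // Matches an opening <a> tag that has an href attribute.
    // Group 1 is the optional quote, group 2 the href value.
    $strRegExp1 = '~
        <\s*a\s+
        [^>]*

        href\s*=\s*
        ([\'"]?)
        ([^\s>]+)
        \1

        [^>]*
        >
    ~ix';

    // Matches the names of common event-handler attributes.
    $strRegExp2 = '~
        on(
            (dbl)?click                     |
            mouse(down|up|over|move|out)    |
            key(press|down|up)              |
            focus                           |
            blur
        )
    ~ix';

    // First rebuild each <a> tag with only its href,
    // then rename any remaining event handlers so they no longer fire.
    return preg_replace(
        $strRegExp2,
        'BAD',
        preg_replace(
            $strRegExp1,
            '<a href="$2">',
            $strText
        )
    );

}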
$file = fncStripAttrsExceptHREF($file);
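For comparison, here is a compact variant of the same idea, built on preg_replace_callback(). It is a sketch, not a drop-in replacement: the name keepOnlyHref and the sample input are assumptions. Unlike the function above, it reduces links without an href to a bare <a> and does not touch event handlers on other tags.

```php
<?php
// Hypothetical variant: rebuilds every opening <a> tag so that
// only its href attribute (if any) survives.
function keepOnlyHref($html)
{
    return preg_replace_callback(
        '~<a\b([^>]*)>~i',
        function ($m) {
            // Look for a quoted or unquoted href among the attributes;
            // the lookbehind skips names such as data-href.
            if (preg_match('~(?<![\w-])href\s*=\s*(["\']?)([^"\'\s>]+)\1~i', $m[1], $href)) {
                return '<a href="' . $href[2] . '">';
            }
            return '<a>'; // no href found: drop all attributes
        },
        $html
    );
}

// Made-up sample input.
$in = '<a class="btn" href=\'page.html\' onclick="track()">Go</a> <a name="top" id="t">Top</a>';
$out = keepOnlyHref($in);
echo $out, "\n";
// <a href="page.html">Go</a> <a>Top</a>
```

Because the callback builds the new tag itself, the href value is always re-quoted with double quotes, whichever quoting the input used.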