简体   繁体   中英

htmlentities in PHP but preserving html tags

I want to convert all texts in a string into html entities but preserving the HTML tags, for example this:

<p><font style="color:#FF0000">Camión español</font></p>

should be translated into this:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

any ideas?

You can get the list of correspondances character => entity used by htmlentities , with the function get_html_translation_table ; consider this code :

$list = get_html_translation_table(HTML_ENTITIES);
var_dump($list);

(You might want to check the second parameter to that function in the manual -- maybe you'll need to set it to a value different than the default one)

It will get you something like this :

array
  ' ' => string '&nbsp;' (length=6)
  '¡' => string '&iexcl;' (length=7)
  '¢' => string '&cent;' (length=6)
  '£' => string '&pound;' (length=7)
  '¤' => string '&curren;' (length=8)
  ....
  ....
  ....
  'ÿ' => string '&yuml;' (length=6)
  '"' => string '&quot;' (length=6)
  '<' => string '&lt;' (length=4)
  '>' => string '&gt;' (length=4)
  '&' => string '&amp;' (length=5)

Now, remove the correspondances you don't want :

unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

Your list, now, has all the correspondances character => entity used by htmlentites, except the few characters you don't want to encode.

And now, you just have to extract the list of keys and values :

$search = array_keys($list);
$values = array_values($list);

And, finally, you can use str_replace to do the replacement :

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_out);

And you get :

string '<p><font style="color:#FF0000">Cami&Atilde;&sup3;n espa&Atilde;&plusmn;ol</font></p>' (length=84)

Which looks like what you wanted ;-)


Edit : well, except for the encoding problem (damn UTF-8, I suppose -- I'm trying to find a solution for that, and will edit again)

Second edit couple of minutes after : it seem you'll have to use utf8_encode on the $search list, before calling str_replace :-(

Which means using something like this :

$search = array_map('utf8_encode', $search);

Between the call to array_keys and the call to str_replace .

And, this time, you should really get what you wanted :

string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)


And here is the full portion of code :

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$search = array_keys($list);
$values = array_values($list);
$search = array_map('utf8_encode', $search);

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_in, $str_out);

And the full output :

string '<p><font style="color:#FF0000">Camión español</font></p>' (length=58)
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)

This time, it should be ok ^^
It doesn't really fit in one line, is might not be the most optimized solution ; but it should work fine, and has the advantage of allowing you to add/remove any correspondance character => entity you need or not.

Have fun !

Might not be terribly efficient, but it works

$sample = '<p><font style="color:#FF0000">Camión español</font></p>';

echo htmlspecialchars_decode(
    htmlentities($sample, ENT_NOQUOTES, 'UTF-8', false)
  , ENT_NOQUOTES
);

This is optimized version of the accepted answer.

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$string = strtr($string, $list);

No solution short of a parser is going to be correct for all cases. Yours is a good case:

<p><font style="color:#FF0000">Camión español</font></p>

but do you also want to support:

<p><font>true if 5 < a && name == "joe"</font></p>

where you want it to come out as:

<p><font>true if 5 &lt; a &amp;&amp; name == &quot;joe&quot;</font></p>

Question: Can you do the encoding BEFORE you build the HTML. In other words can do something like:

"<p><font>" + htmlentities(inner) + "</font></p>"

You'll save yourself lots of grief if you can do that. If you can't, you'll need some way to skip encoding <, >, and " (as described above), or simply encode it all, and then undo it (eg. replace('&lt;', '<') )

This is a function I've just written which solves this problem in a very elegant way:

First of all, the HTML tags will be extracted from the string, then htmlentities() is executed on every remaining substring and after that the original HTML tags will be inserted at their old position thus resulting in no alternation of the HTML tags. :-)

Have fun:

function htmlentitiesOutsideHTMLTags ($htmlText)
{
    $matches = Array();
    $sep = '###HTMLTAG###';

    preg_match_all("@<[^>]*>@", $htmlText, $matches);   
    $tmp = preg_replace("@(<[^>]*>)@", $sep, $htmlText);
    $tmp = explode($sep, $tmp);

    for ($i=0; $i<count($tmp); $i++)
        $tmp[$i] = htmlentities($tmp[$i]);

    $tmp = join($sep, $tmp);

    for ($i=0; $i<count($matches[0]); $i++)
        $tmp = preg_replace("@$sep@", $matches[0][$i], $tmp, 1);

    return $tmp;
}

Based on the answer of bflesch , I did some changes to manage string containing less than sign , greater than sign and single quote or double quotes .

function htmlentitiesOutsideHTMLTags ($htmlText, $ent)
{
    $matches = Array();
    $sep = '###HTMLTAG###';

    preg_match_all(":</{0,1}[a-z]+[^>]*>:i", $htmlText, $matches);

    $tmp = preg_replace(":</{0,1}[a-z]+[^>]*>:i", $sep, $htmlText);
    $tmp = explode($sep, $tmp);

    for ($i=0; $i<count($tmp); $i++)
        $tmp[$i] = htmlentities($tmp[$i], $ent, 'UTF-8', false);

    $tmp = join($sep, $tmp);

    for ($i=0; $i<count($matches[0]); $i++)
        $tmp = preg_replace(":$sep:", $matches[0][$i], $tmp, 1);

    return $tmp;
}



Example of use:

$string = '<b>Is 1 < 4?</b>è<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>€</strong><img src="/some/path" /></p></div>';
$string_entities = htmlentitiesOutsideHTMLTags($string, ENT_QUOTES | ENT_HTML401);
var_dump( $string_entities );

Output is:

string '<b>Is 1 &lt; 4?</b>&egrave;<br><i>&quot;then&quot;</i> <div style="some:style;"><p>gain some <strong>&euro;</strong><img src="/some/path" /></p></div>' (length=150)



You can pass any ent flag according to the htmlentities manual

one-line solution with NO translation table or custom function required:

i know this is an old question, but i recently had to import a static site into a wordpress site and had to overcome this issue:

here is my solution that does not require fiddling with translation tables:

htmlspecialchars_decode( htmlentities( html_entity_decode( $string ) ) );

when applied to the OP's string:

<p><font style="color:#FF0000">Camión español</font></p>

output:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

when applied to Luca's string:

<b>Is 1 < 4?</b>è<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>€</strong><img src="/some/path" /></p></div>

output:

<b>Is 1 < 4?</b>&egrave;<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>&euro;</strong><img src="/some/path" /></p></div>

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM