
PHP: converting escaped special characters to HTML

I have a string that looks like this: "v\u00e4lkommen till mig". I get it after calling utf8_encode() on the string.

I would like that string to become

 välkommen till mig

where the character

  \u00e4 = ä = &auml;

How can I achieve this in PHP?

  • Do not use utf8_(de|en)code. They only convert between ISO-8859-1 and UTF-8. ISO-8859-1 does not contain the same characters as ISO-8859-15 or Windows-1252, which are the most widely used encodings besides UTF-8. Better to use mb_convert_encoding.

  • "v\u00e4lkommen till mig": this looks like a JSON-encoded string, which is already UTF-8 encoded. The Unicode code point of "ä" is U+00E4, written as \u00e4 in JSON.
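To illustrate the first comment, here is a minimal sketch. It assumes the mbstring extension is loaded and that the input is known to be Windows-1252; the byte string is made up for illustration. It shows why the source encoding has to be named: byte 0x80 is the euro sign in Windows-1252 but a control character in ISO-8859-1.

```php
<?php
// Hypothetical input: "välkommen €" stored as Windows-1252 bytes.
$legacy = "v\xE4lkommen \x80";

// Name the source encoding explicitly instead of assuming ISO-8859-1.
$fromCp1252 = mb_convert_encoding($legacy, 'UTF-8', 'Windows-1252');

// Treating the same bytes as ISO-8859-1 (the assumption utf8_encode makes)
// maps 0x80 to the control character U+0080 instead of the euro sign.
$fromLatin1 = mb_convert_encoding($legacy, 'UTF-8', 'ISO-8859-1');

// Print the resulting bytes as hex so the difference is visible.
echo bin2hex($fromCp1252), "\n";
echo bin2hex($fromLatin1), "\n";
```

Both results are valid UTF-8, but only the first one contains the character the author of the text presumably meant.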

Example

<?php
header('Content-Type: text/html; charset=utf-8');
$json = '"v\u00e4lkommen till mig"';
var_dump(json_decode($json)); //It will return a utf8 encoded string "välkommen till mig"

What is the source of this string?

There is no need to replace the ä with its HTML representation &auml; if you print it in a UTF-8 encoded document and tell the browser which encoding is used. If it is necessary, use htmlentities:

<?php
$json = '"v\u00e4lkommen till mig"';
$string = json_decode($json);
echo htmlentities($string, ENT_COMPAT, 'UTF-8');

Edit: Since you want to keep HTML characters, and I now think your source string isn't quite what you posted (I think it is actual Unicode, rather than containing \unnnn as literal text), I think your best option is this:

$html = str_replace( array( '&lt;', '&gt;', '&amp;' ), array( '<', '>', '&' ), htmlentities( $whatever ) );

(note: no call to utf8_decode)
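A different way to get the same effect, sketched here under assumptions: the mbstring extension is available, PHP is version 7 or later (for the "\u{...}" escape), and the sample input is invented. Instead of encoding everything and then restoring the markup characters, mb_encode_numericentity converts only the code points from U+0080 upward, so <, > and & are never touched. It produces numeric entities (&#228;) rather than named ones (&auml;).

```php
<?php
// Hypothetical input: UTF-8 text that already contains HTML markup.
$whatever = "<b>v\u{00e4}lkommen</b> till mig & dig";

// Conversion map: [first code point, last code point, offset, mask].
// Everything from U+0080 upward becomes a numeric entity;
// ASCII, including <, > and &, is left as it is.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$html = mb_encode_numericentity($whatever, $convmap, 'UTF-8');

echo $html, "\n"; // <b>v&#228;lkommen</b> till mig & dig
```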

Original answer:

There is no direct conversion. First, decode it again:

$decoded = utf8_decode( $whatever );

then encode as HTML:

$html = htmlentities( $decoded );

and of course you can do it without a variable:

$html = htmlentities( utf8_decode( $whatever ) );

http://php.net/manual/en/function.utf8-decode.php

http://php.net/manual/en/function.htmlentities.php

To do this by regular expression (not recommended, likely slower, less reliable), you can use the fact that HTML supports &#xnnnn; constructs, where the nnnn is the same as your existing \unnnn values. So you can say:

$html = preg_replace( '/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever );
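A short worked run of that regular expression, assuming the input really contains a literal backslash followed by u and four hex digits (the sample string is taken from the question). The html_entity_decode step is an optional extra, not part of the answer above; it turns the references back into real UTF-8 characters.

```php
<?php
// Single quotes keep the backslash literal: the string holds \u00e4 as six characters.
$whatever = 'v\u00e4lkommen till mig';

// \uXXXX becomes &#xXXXX;, a numeric character reference with the same hex digits.
$html = preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever);

// Optional: decode the references into real UTF-8 characters.
$text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

echo $html, "\n"; // v&#x00e4;lkommen till mig
echo $text, "\n"; // välkommen till mig
```

One known limit of this sketch: characters outside the Basic Multilingual Plane are written in JSON as two \uXXXX surrogate escapes, and converting each half separately does not give a valid character reference. json_decode handles that case correctly.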

html_entity_decode worked for me.

$json = '"v\u00e4lkommen till mig"';
echo $decoded = html_entity_decode( json_decode($json) );
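A small check of what each call contributes, as a sketch using the string from the question. In this run json_decode already returns the UTF-8 text; html_entity_decode only changes something when the input contains entities such as &auml;.

```php
<?php
$json = '"v\u00e4lkommen till mig"';

// json_decode resolves the \u00e4 escape and returns a UTF-8 string.
$plain = json_decode($json);

// No entities are present, so html_entity_decode returns the string unchanged.
$same = html_entity_decode($plain, ENT_QUOTES, 'UTF-8');

// With an entity in the input, html_entity_decode does the conversion.
$fromEntity = html_entity_decode('v&auml;lkommen till mig', ENT_QUOTES, 'UTF-8');

var_dump($plain === $same, $plain === $fromEntity);
```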
