How can UTF-8 strings (i.e. 8-bit strings) be converted to/from XML-compatible 7-bit strings (i.e. printable ASCII with numeric entities)?
i.e. an encode() function such that:
encode("“£”") -> "&#8220;&#163;&#8221;"
decode() would also be useful:
decode("&#8220;&#163;&#8221;") -> "“£”"
PHP's htmlentities() / html_entity_decode() pair does not do the right thing:
htmlentities(html_entity_decode("&#8220;&#163;&#8221;")) -> "&amp;#8220;&pound;&amp;#8221;"
Laboriously specifying types helps a little, but still returns XML-incompatible named entities, not numeric ones:
htmlentities(html_entity_decode("&#8220;&#163;&#8221;", ENT_QUOTES, "UTF-8"), ENT_QUOTES, "UTF-8") -> "&ldquo;&pound;&rdquo;"
It's a bit of a workaround. I read a little about iconv(), and I don't think it'll give you numeric entities (I haven't put that to the test).
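function decode( $string )
{
    // Build a minimal UTF-8 document with a single <x/> root element.
    $doc = new DOMDocument( "1.0", "UTF-8" );
    $doc->loadXML( '<?xml version="1.0" encoding="UTF-8"?>'."\n".'<x />', LIBXML_NOENT );
    // Append the input as a text node and serialize the whole document.
    $doc->documentElement->appendChild( $doc->createTextNode( $string ) );
    $output = $doc->saveXML( $doc );
    // Strip the XML declaration and the <x> wrapper tags, leaving only the text.
    $output = preg_replace( '/<\?([^>]+)\?>/', '', $output );
    $output = str_replace( array( '<x>', '</x>' ), array( '', '' ), $output );
    return trim( $output );
}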
This, however, I have put to the test. I might do the reverse later, just don't hold your breath ;-)
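For comparison, here is a minimal sketch of both directions using built-in functions rather than DOMDocument. It is not the approach above and rests on a few assumptions: the mbstring extension is loaded, PHP is 5.4 or later (for ENT_XML1), and the input is valid UTF-8. The names encode() and decode() are taken from the question. Escaping & < > " ' before encoding is an addition of this sketch so that the output is safe to embed in XML and round-trips through decode().

```php
<?php
// Sketch only: assumes mbstring is available and PHP >= 5.4 (ENT_XML1).

// UTF-8 string -> 7-bit ASCII with decimal numeric entities.
function encode($string)
{
    // Escape the XML markup characters first (& < > " ').
    $escaped = htmlspecialchars($string, ENT_QUOTES | ENT_XML1, 'UTF-8');
    // Conversion map: code points U+0080..U+10FFFF, offset 0, full mask.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

// Numeric entities (and the five predefined XML entities) -> UTF-8 string.
function decode($string)
{
    return html_entity_decode($string, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo encode("“£”"), "\n";                 // &#8220;&#163;&#8221;
echo decode("&#8220;&#163;&#8221;"), "\n"; // “£”
```

With ENT_XML1, html_entity_decode() leaves HTML-only named entities such as &pound; untouched, so decode() here only handles numeric references and the five predefined XML entities.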