Response from PHP xmlrpc_server_call_method has wrong encoding

I use a PHP XML-RPC server to provide an interface to my blogging app (PDO/SQLite backend). Sending data to the database works and the encoding stays intact, or at least strings with special characters such as umlauts (äöü) end up there correctly. But when I read them back from the database and return them, the strings end up garbled.

Example of my setup

function get_post($method, $params, $appData) {
    // xmlrpc_server_call_method() calls this with the method name,
    // the decoded request parameters and the user data.
    $id = $params[0];

    // get the post from the database.
    $post = load_post($id);
    // if I output content here, all characters are intact, eg. "test with ümlaut"

    return $post;
}

// set up a server
$server = xmlrpc_server_create();
xmlrpc_server_register_method($server, 'metaWeblog.getPost', 'get_post');

// fake a request for the post with id 1
$request = xmlrpc_encode_request("metaWeblog.getPost", [1], [
    'encoding' => 'utf-8'
]);

// call get_post()
$response = xmlrpc_server_call_method($server, $request, null, [
    'encoding' => 'utf-8'
]);

if($response) {
    header('Content-Type: text/xml; charset=utf-8');
    echo($response); // has garbled umlauts
}

This produces the garbled string "test with Ã¼mlaut" instead of "test with ümlaut":

<member>
    <name>description</name>
    <value>
        <string>test with &#195;&#188;mlaut</string>
    </value>
</member>

Is there some way to make this work without resorting to a different XML-RPC library? Ideally, I would like to prevent the escaping of special characters entirely, if possible.
Any help is appreciated!

It already works exactly as you ask; only the escaping output option is missing. You want markup here, without any other value (it takes your UTF-8 string more or less verbatim):

$response = xmlrpc_server_call_method($server, $request, null, [
    'encoding' => 'UTF-8',
    'escaping' => 'markup',
]);

The encoding option sets the encoding declaration of the response document (the encoding attribute of the XML declaration) to UTF-8.

With markup, the UTF-8 string is taken as-is and only the XML markup characters (<, >, &, etc.) are escaped. The default, by contrast, also escapes non-ASCII and non-printable characters as numeric character references (&#195;&#188;). That does not help here, because you want characters outside ASCII to keep their original encoding. UTF-8 is compatible with ASCII for the first 128 code points (0 to 127), but every byte of a multi-byte sequence, the lead byte as well as the continuation bytes, has the highest bit set, so each of these bytes is greater than 127 and gets escaped on its own. A parser then reads &#195; and &#188; as the code points U+00C3 (Ã) and U+00BC (¼), not as the single character ü (U+00FC).

<?xml version="1.0" encoding="UTF-8"?>
  ...
    <member>
     <name>description</name>
     <value>
      <string>Äpfel wachsen überirdisch.</string>
     </value>
    </member>
  ...
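To see where &#195;&#188; comes from, here is a minimal pure-PHP sketch that does not use the xmlrpc extension. The function escape_bytes_above_127() is hypothetical: it assumes, based on the output shown in the question, that the 'non-ascii' escaping writes one numeric reference per byte above 127. Decoding that result the way an XML parser would then yields Ã¼ instead of ü.

```php
<?php
// Hypothetical helper: escapes every byte above 127 as its own numeric
// character reference. It only mimics the output seen in the question;
// it is not code from the xmlrpc extension.
function escape_bytes_above_127(string $s): string
{
    $out = '';
    $len = strlen($s); // length in bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        $out .= $byte > 127 ? '&#' . $byte . ';' : $s[$i];
    }
    return $out;
}

$original = 'test with ümlaut'; // assumes this file is saved as UTF-8
$escaped = escape_bytes_above_127($original);
echo $escaped, "\n"; // test with &#195;&#188;mlaut

// A character reference names a Unicode code point, not a byte, so a parser
// turns &#195; into U+00C3 (Ã) and &#188; into U+00BC (¼).
$decoded = html_entity_decode($escaped, ENT_QUOTES | ENT_XML1, 'UTF-8');
echo $decoded, "\n"; // test with Ã¼mlaut

// The single correct reference for ü is its code point, U+00FC (decimal 252).
$correct = html_entity_decode('&#252;', ENT_QUOTES | ENT_XML1, 'UTF-8');
var_dump($correct === 'ü'); // bool(true)
```

With 'markup' only, none of this happens: the two bytes of ü are written to the response unchanged, and the UTF-8 declaration tells the client how to read them.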

The escaping output option

These options are documented on the xmlrpc_encode_request() page of the PHP manual. As that page is a bit brief, here is some discussion in the context of this answer:

The escaping output option takes either a single value as a string or several values as an array of strings.

The default is ['non-ascii', 'non-print', 'markup']; a fourth value, 'cdata', is available as well:

  1. 'non-ascii': every byte above 127 is escaped as a numeric character reference (see XML: Character Reference); e.g. the two UTF-8 bytes of ü (u-umlaut) become &#195;&#188;.
  2. 'non-print': every non-printable character is escaped as a numeric character reference (compare RFC 20). Non-printable here means all non-graphic characters: space (2/0, decimal 32) counts as printable, while everything above 127 counts as non-printable. Therefore, to preserve UTF-8 byte sequences, this value must not be set either, just like 'non-ascii'.
  3. 'markup': the value suggested in this answer. The short explanation above may not cover every case; for the details, compare XML: Markup.
  4. 'cdata': puts strings into CDATA sections (see XML: CDATA Sections). It is not suggested in this answer, but it preserves UTF-8 as well and can be a good choice when the data contains XML, HTML or other source code such as PHP, because the response is then easier for humans to read.
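The difference between 'markup' and 'cdata' can be sketched in plain PHP. Both helpers below are hypothetical approximations, not the extension's implementation: escape_markup() uses htmlspecialchars() with ENT_XML1 to escape only the XML special characters, and wrap_cdata() wraps the text in a CDATA section, splitting any "]]>" (which may not appear inside one) across two sections. How the extension itself handles "]]>" is not verified here. In both cases the non-ASCII characters stay plain UTF-8.

```php
<?php
// Hypothetical approximation of 'markup': keep the UTF-8 text as it is and
// escape only the XML special characters (<, >, &, and both quote types).
function escape_markup(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical approximation of 'cdata': wrap the text in a CDATA section.
// "]]>" would end the section early, so it is split across two sections.
function wrap_cdata(string $s): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $s) . ']]>';
}

$value = '<p>Äpfel & Birnen</p>';
echo '<string>', escape_markup($value), "</string>\n";
// <string>&lt;p&gt;Äpfel &amp; Birnen&lt;/p&gt;</string>
echo '<string>', wrap_cdata($value), "</string>\n";
// <string><![CDATA[<p>Äpfel & Birnen</p>]]></string>
```

For HTML-heavy content, the CDATA form keeps the markup readable, which is the point made for 'cdata' above.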

Mind the NUL byte

In XML-RPC, a NUL byte terminates a string. This follows from XML-RPC being XML based, and is likely also implementation defined in the underlying C library.

If you need to retain it, encode the string as base64 instead, because XML itself has no character reference for NUL (see RFC 4648, a proposed standard, and xmlrpc_set_type() in the PHP manual).
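Why base64 helps can be shown with a minimal sketch that does not use the xmlrpc extension at all (the value "before\0after" is a made-up example): the encoded form consists only of ASCII letters, digits, '+', '/' and '=', so no NUL byte is left in the XML, and decoding restores the original bytes exactly.

```php
<?php
// Binary data with an embedded NUL byte (a made-up example value).
$binary = "before\0after";

// base64 output consists only of A-Z, a-z, 0-9, '+', '/' and '=',
// so the encoded text contains no NUL byte.
$encoded = base64_encode($binary);
echo $encoded, "\n"; // YmVmb3JlAGFmdGVy

// Strict decoding restores the exact original bytes, NUL included.
$decoded = base64_decode($encoded, true);
var_dump(strlen($decoded)); // int(12)
var_dump($decoded === $binary); // bool(true)
```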
