PHP xml_parser: UTF-8 encoded values are split

I am not sure whether this is the correct behavior or how to deal with it effectively. I have defined an XML parser in PHP, and it looks like this:

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE,1);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData"); 
(....)   
function characterData($parser, $data)
{
    print('<p>|' . $data . '|</p>');
}

The input XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<fields><field name="address"><value>aą</value></field></fields>

And the output looks like this:

|a|
|ą|

I was expecting it to look like this:

|aą|

Why does PHP split the UTF-8 encoded string into separate values?

The answer is in the documentation:

It can be called multiple times inside each fragment (e.g. for non-ASCII strings).

So the character data handler is not guaranteed to receive an element's text in one piece. Your code just needs to be able to handle that, for instance by collecting the pieces until the element ends.
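
One way to cope with the repeated calls is to append each piece to a buffer and only emit the text when the closing tag arrives. The following is a minimal sketch rather than a definitive fix: the `$buffer` and `$values` variables and the use of closures are assumptions made for illustration, and the source file is assumed to be saved as UTF-8. It also relies on case folding being enabled by default, which is why the element name is compared against `VALUE` in upper case.

```php
<?php
// Illustrative names: $buffer collects text pieces, $values holds finished strings.
$buffer = '';
$values = [];

$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<fields><field name="address"><value>aą</value></field></fields>';

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');

xml_set_element_handler(
    $xml_parser,
    // Start tag: begin with an empty buffer for the new element.
    function ($parser, $name, $attrs) use (&$buffer) {
        $buffer = '';
    },
    // End tag: the text of the element is complete, so store it.
    function ($parser, $name) use (&$buffer, &$values) {
        // Case folding is on by default, so names arrive upper-cased.
        if ($name === 'VALUE' && $buffer !== '') {
            $values[] = $buffer;
        }
        $buffer = '';
    }
);

// The handler may fire several times for one text node, so append, do not print.
xml_set_character_data_handler(
    $xml_parser,
    function ($parser, $data) use (&$buffer) {
        $buffer .= $data;
    }
);

xml_parse($xml_parser, $xml, true);

foreach ($values as $value) {
    print('<p>|' . $value . '|</p>');
}
```

With the sample document from the question, this should print a single `|aą|` paragraph instead of two, because the pieces are joined before anything is output. Resetting the buffer on every start tag is a simplification that suits this flat structure; documents with mixed content would need a stack of buffers instead.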
