
Character entered by user is breaking xml decoding done by xml_parse_into_struct

Thanks for answering! This is about PHP/MySQL.

The user enters some text that is then processed through htmlentities():

$new_userinput = htmlentities($userinput, ENT_QUOTES);

This entry is then stored in an XML document:

...
<entrylist>
    <list>$new_userinput</list>
    <info>$someinfo</info>
</entrylist>
...

The XML is stored in a database as UTF-8. The HTML for the site is also set to UTF-8.

What we observed is that, with one specific input, the XML passed to:

$p = xml_parser_create();
xml_parse_into_struct($p, $xmlentry, $values, $index);
xml_parser_free($p);

is not parsed properly by xml_parse_into_struct().
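To see why a given entry fails, it can help to check the return value of xml_parse_into_struct() and ask the parser for its error. The following is a minimal sketch, not code from the original post: parseOrExplain() is a hypothetical helper name, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): parse an XML string and
// return a description of the parser error, or null if parsing succeeded.
function parseOrExplain(string $xml): ?string
{
    $p = xml_parser_create('UTF-8');
    // xml_parse_into_struct() returns 0 on failure and 1 on success.
    $ok = xml_parse_into_struct($p, $xml, $values, $index);
    if ($ok) {
        return null;
    }
    // Ask the parser what went wrong and where.
    return sprintf(
        '%s at line %d, column %d',
        xml_error_string(xml_get_error_code($p)),
        xml_get_current_line_number($p),
        xml_get_current_column_number($p)
    );
    // The parser is released when $p goes out of scope.
}

// Well-formed input: no error is reported.
var_dump(parseOrExplain('<note>plain text</note>'));

// Deliberately malformed input (mismatched tags): an error string is returned.
echo parseOrExplain('<entrylist><list></entrylist>'), "\n";
```

Running the stored entry through a check like this should show whether the parser is rejecting an entity reference or the raw bytes that follow it.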

What we see in the database is the following:

...
<note>Creatives share shots&acirc;€”small screenshots.</note>
...

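You need to specify the charset in htmlentities(). Before PHP 5.4 the default charset was ISO-8859-1, so each byte of a multibyte UTF-8 character is treated as a separate character. For example: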

$new_userinput = htmlentities($userinput, ENT_QUOTES, 'UTF-8');

To illustrate the difference:

echo htmlentities("€", ENT_QUOTES); // &acirc;?&not;

echo htmlentities("€", ENT_QUOTES, "UTF-8"); // &euro;
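The same comparison can be applied to the dash from the stored note. The sketch below assumes the input is a UTF-8 string. The second call is a variation that is not part of the original answer: XML predefines only the entities amp, lt, gt, quot and apos, so a named HTML entity such as &mdash; may still be rejected by an XML parser. Escaping only the XML-special characters with htmlspecialchars() and ENT_XML1 (available since PHP 5.4) is one possible way around that.

```php
<?php
// The dash character (U+2014) from the stored note, as a UTF-8 string.
$userinput = "shots\u{2014}small";

// Explicit charset, as recommended above: the dash becomes one named entity.
$asHtml = htmlentities($userinput, ENT_QUOTES, 'UTF-8');
echo $asHtml, "\n"; // shots&mdash;small

// Assumption, not from the original post: escape only the XML-special
// characters (& < > " ') and leave the dash as literal UTF-8 text.
$asXml = htmlspecialchars($userinput, ENT_QUOTES | ENT_XML1, 'UTF-8');
echo $asXml, "\n"; // the dash is left untouched
```

Whether the second form fits depends on how the value is used later; if the stored text is also echoed straight into HTML, it would still need HTML escaping at output time.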

