
PHP XML UTF-8 with special characters throws errors

I am having trouble receiving UTF-8 XML files back from the DHL API. As long as I don't send any special characters such as ś or ó, everything works fine. With these characters, my app crashes when it tries to load the XML received from DHL, and I get these errors:

Warning:  DOMDocument::loadXML() [domdocument.loadxml]: 
Opening and ending tag mismatch: AddressLine line 43 and Consignee 
in Entity, line: 53 in D:\xampp\htdocs\ebay\catch2.php on line 29

Warning:  DOMDocument::loadXML() [domdocument.loadxml]: 
Opening and ending tag mismatch: Consignee line 40 and res:ShipmentValidateResponse 
in Entity, line: 97 in D:\xampp\htdocs\ebay\catch2.php on line 29

Warning:  DOMDocument::loadXML() [domdocument.loadxml]: Premature end of 
data in tag ShipmentValidateResponse line 1 in Entity, line: 98
in D:\xampp\htdocs\ebay\catch2.php on line 29

This is the XML I send:

<?xml version="1.0" encoding="utf-8"?>
... 
<AddressLine>address</AddressLine> 
<AddressLine>asfśó</AddressLine> 
...

What I receive:

<?xml version="1.0" encoding="UTF-8"?>
...
Lines 40 to 43:

<Consignee>
<CompanyName>Person</CompanyName>
<AddressLine>address</AddressLine>
<AddressLine>asf??ddressLine>
...

Here is what happens around line 29:

$responseXml = $session->sendHttpRequest($requestXmlBody);
if(stristr($responseXml, 'HTTP 404') || $responseXml == '')
    die('<P>Error sending request');
$responseXml = utf8_decode($responseXml);
$responseDoc = new DOMDocument('1.0', 'UTF-8');
$responseDoc->loadXML($responseXml);

Edit: Removing utf8_decode doesn't help much. It just produces a new error:

Warning:  DOMDocument::loadXML() [domdocument.loadxml]: 
Input is not proper UTF-8, indicate encoding !
Bytes: 0xF3 0x3C 0x2F 0x41 in Entity, line: 43 in D:\xampp\htdocs\ebay\catch2.php on line 29

Edit 2: hex dump

0000-0010:  3c 3f 78 6d-6c 20 76 65-72 73 69 6f-6e 3d 22 31  <?xml.ve rsion="1
0000-0020:  2e 30 22 20-65 6e 63 6f-64 69 6e 67-3d 22 55 54  .0".enco ding="UT
0000-0030:  46 2d 38 22-3f 3e 3c 72-65 73 3a 53-68 69 70 6d  F-8"?><r es:Shipm

line 43:

0000-0960:  4c 69 6e 65-3e 0a 20 20-20 20 20 20-20 20 3c 41  Line>... ......<A
0000-0970:  64 64 72 65-73 73 4c 69-6e 65 3e 61-73 66 3f f3  ddressLi ne>asf?.
0000-0980:  3c 2f 41 64-64 72 65 73-73 4c 69 6e-65 3e 0a 20  </Addres sLine>..

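Don't use utf8_decode!

That's what's breaking your encoding.
utf8_decode converts UTF-8 encoded text to Latin-1 (ISO-8859-1) encoded text, and any character that has no Latin-1 equivalent, such as ś, is replaced with "?". That's not what you want or need: the document declares encoding="UTF-8", so after the conversion its bytes no longer match its declaration and the parser rejects it. Just parse the XML as is, without any encoding conversion.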
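
To make the effect concrete, here is a minimal sketch that compares the untouched string with its Latin-1 conversion and then parses the untouched one. The sample XML is a hypothetical stand-in for the DHL response (only the element names come from the question), and mb_convert_encoding is used as an assumed equivalent of utf8_decode, since utf8_decode is deprecated in recent PHP versions. It needs the mbstring and dom extensions.

```php
<?php
// Hypothetical stand-in for the DHL response; the text is "asfśó" in UTF-8.
$responseXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
    . "<Consignee><AddressLine>asf\u{015B}\u{00F3}</AddressLine></Consignee>";

// Roughly what utf8_decode() does: re-encode UTF-8 as ISO-8859-1.
// "ś" has no Latin-1 equivalent and becomes "?"; "ó" becomes the single byte 0xF3.
$latin1 = mb_convert_encoding($responseXml, 'ISO-8859-1', 'UTF-8');

var_dump(mb_check_encoding($responseXml, 'UTF-8')); // the untouched string is valid UTF-8
var_dump(mb_check_encoding($latin1, 'UTF-8'));      // the converted string is not

// Parse the untouched response; libxml takes the encoding from the XML declaration.
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$responseDoc = new DOMDocument();
if (!$responseDoc->loadXML($responseXml)) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    exit(1);
}

// DOM returns text as UTF-8, so the special characters survive intact.
$address = $responseDoc->getElementsByTagName('AddressLine')->item(0)->textContent;
echo bin2hex($address), "\n";
```

The same mb_check_encoding call can be run on the raw response before parsing. If it reports invalid UTF-8 even without utf8_decode, as the 0xF3 byte in the question's second error suggests might be the case, the bytes were presumably altered before they reached loadXML, and the request side would be the next place to look. That is an assumption, not something the question confirms.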
