
Convert an XML UTF-8 encoded string to XML datatype in SQL Server

Converting an XML string with CAST(... AS XML) works as expected in many scenarios, but fails with the error "illegal xml character" if the string contains accented characters.

This example fails with error "XML parsing: line 2, character 8, illegal xml character":

declare @Text VARCHAR(max) = 
'<?xml version="1.0" encoding="UTF-8"?>
<ROOT>níveis porém alocação</ROOT>'

select CAST(@Text AS XML)

According to the XML specification, all of these are legal XML characters, yet replacing each accented character with an 'X' makes the CAST succeed:

declare @MessageText VARCHAR(max) = 
'<?xml version="1.0" encoding="UTF-8"?>
<ROOT>nXveis porXm alocaXXo</ROOT>'

select CAST(@MessageText AS XML)

Result: <ROOT>nXveis porXm alocaXXo</ROOT>

Moreover, the same XML declared as UTF-16 and held in an NVARCHAR variable inexplicably works:

declare @MessageText NVARCHAR(max) = 
'<?xml version="1.0" encoding="UTF-16"?>
<ROOT>níveis porém alocação</ROOT>'

select CAST(@MessageText AS XML)

Result: <ROOT>níveis porém alocação</ROOT>

Are those characters illegal in UTF-8? Or is there a better way to convert the string to the XML data type?

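For the XML data type, SQL Server stores the data internally as UTF-16 and does not keep the XML declaration. The first example fails because a VARCHAR value holds its characters in the code page of its collation (for example Windows-1252), not in UTF-8, unless a UTF-8 collation is in use (available from SQL Server 2019). The declaration promises UTF-8, but a character such as í is stored as a single byte (0xED in Windows-1252) that is not a valid UTF-8 sequence, so the parser reports an illegal character. An NVARCHAR value is UTF-16, so it parses when the declaration is either absent or says UTF-16. Here is how to handle your use case correctly.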

SQL

-- Method #1
DECLARE @Text NVARCHAR(MAX) = N'<ROOT>níveis porém alocação</ROOT>';
SELECT CAST(@Text AS XML);

-- Method #2
DECLARE @MessageText NVARCHAR(MAX) = 
N'<?xml version="1.0" encoding="UTF-16"?>
<ROOT>níveis porém alocação</ROOT>';

SELECT CAST(@MessageText AS XML);
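
If the input cannot be changed at the source and still arrives as VARCHAR with a UTF-8 declaration, as in the question, one possible workaround is to widen it to NVARCHAR and rewrite the declaration before casting. The following is a minimal sketch, not a definitive fix. It assumes a non-UTF-8 collation (such as a Latin1 / code page 1252 one) and that the declaration text matches the literal `encoding="UTF-8"` exactly. The variable names @Utf8Declared and @Wide are made up for illustration.

```sql
-- Assumption: VARCHAR input with a UTF-8 declaration, stored under a
-- non-UTF-8 collation (e.g. code page 1252), as in the question.
DECLARE @Utf8Declared VARCHAR(MAX) =
'<?xml version="1.0" encoding="UTF-8"?>
<ROOT>níveis porém alocação</ROOT>';

-- Optional check: inspect how an accented character is actually stored.
-- Under code page 1252 this is the single byte 0xED, whereas the UTF-8
-- form of the same character would be the two bytes 0xC3 0xAD.
SELECT CAST('í' AS VARBINARY(4)) AS StoredBytes;

-- Widen to NVARCHAR (UTF-16), then make the declaration agree with that encoding.
DECLARE @Wide NVARCHAR(MAX) =
    REPLACE(CAST(@Utf8Declared AS NVARCHAR(MAX)),
            N'encoding="UTF-8"',
            N'encoding="UTF-16"');

SELECT CAST(@Wide AS XML);
```

If the declaration can vary (single quotes, different spacing), a plain REPLACE will miss it. In that case it may be simpler to strip the whole declaration and cast the remainder as in Method #1.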
