简体   繁体   中英

Parse text and special characters from SQL Server

I have an issue with parsing text with special characters from XML in SQL Server.

Let's say I have a XML file Sample.xml which has the following data:

<People>
    <Person FirstName="Adam"
            LastName="Smith"
            Age="44"
            Weight="178">
        <Texts>
            <Text Country="US"
                  Language="EN"
                  TextType="1">&lt;div&gt;First sentence to retrieve.&lt;/div&gt;</Text>
            <Text Country="GB"
                  Language="EN"
                  TextType="2">&lt;div&gt;Second sentence to retrieve.&lt;/div&gt;</Text>
        </Texts>
    </Person>
</People>

I prepared the following SQL script which can parse everything except two sentences in the <TextType> attribute:

  • First sentence to retrieve
  • Second sentence to retrieve
DECLARE @x XML
SELECT @x = f FROM OPENROWSET(BULK 'C:\Sample.xml', single_blob) AS C(f)
DECLARE @hdoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @x
SELECT * FROM OPENXML (@hdoc, '/People/Person/Texts/Text')
WITH (
        FirstName varchar(max) '../../@FirstName'
        , LastName varchar(max) '../../@LastName'
        , Age varchar(max) '../../@Age'
        , [Weight] varchar(max) '../../@Weight'
        , Country varchar(max) '@Country'
        , [Language] varchar(max) '@Language'
        , TextType varchar(max) '@TextType'
        )
EXEC sp_xml_removedocument @hdoc

Could you please help me to add the column with the sentences mentioned above?

OPENXML is old and basically deprecated, it has numerous issues.

You should use the newer XQuery functions .nodes and .value to retrieve your data.

Your primary issue is that you have XML stored as string inside another XML. You need to retrieve it as nvarchar(max) , then cast it using TRY_CONVERT .

SELECT 
    FirstName  = x1.Person.value('@FirstName', 'varchar(100)'),
    LastName   = x1.Person.value('@LastName' , 'varchar(100)'),
    Age        = x1.Person.value('@Age'      , 'int'),
    Weight     = x1.Person.value('@Weight'   , 'decimal(9,5)'),
    Country    = x2.Text.value('@Country' , 'char(2)'),
    [Language] = x2.Text.value('@Language', 'char(2)'),
    TextType   = x2.Text.value('@TextType', 'int'),
    value      = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM @x.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);

db<>fiddle

Note the way there are two calls to .nodes , and one feeds into the next.

You can even feed this in straight from OPENROWSET

SELECT 
    FirstName  = x1.Person.value('@FirstName', 'varchar(100)'),
    LastName   = x1.Person.value('@LastName' , 'varchar(100)'),
    Age        = x1.Person.value('@Age'      , 'int'),
    Weight     = x1.Person.value('@Weight'   , 'decimal(9,5)'),
    Country    = x2.Text.value('@Country' , 'char(2)'),
    [Language] = x2.Text.value('@Language', 'char(2)'),
    TextType   = x2.Text.value('@TextType', 'int'),
    value      = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM OPENROWSET(BULK 'C:\Sample.xml', single_blob) AS C(f)
CROSS APPLY (VALUES( TRY_CONVERT(xml, C.f) )) C2(AsXml)
CROSS APPLY C2.AsXml.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);

I recommend you fix whatever is generating this XML, ideally you would pass through the inner XML without stringifying it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM