
Parse text and special characters from SQL Server

I have an issue parsing text that contains special characters from XML in SQL Server.

Let's say I have an XML file, Sample.xml, which contains the following data:

<People>
    <Person FirstName="Adam"
            LastName="Smith"
            Age="44"
            Weight="178">
        <Texts>
            <Text Country="US"
                  Language="EN"
                  TextType="1">&lt;div&gt;First sentence to retrieve.&lt;/div&gt;</Text>
            <Text Country="GB"
                  Language="EN"
                  TextType="2">&lt;div&gt;Second sentence to retrieve.&lt;/div&gt;</Text>
        </Texts>
    </Person>
</People>

I prepared the following SQL script, which parses everything except the two sentences held in the content of the <Text> elements:

  • First sentence to retrieve
  • Second sentence to retrieve

DECLARE @x XML
SELECT @x = f FROM OPENROWSET(BULK 'C:\Sample.xml', single_blob) AS C(f)
DECLARE @hdoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @x
SELECT * FROM OPENXML (@hdoc, '/People/Person/Texts/Text')
WITH (
        FirstName varchar(max) '../../@FirstName'
        , LastName varchar(max) '../../@LastName'
        , Age varchar(max) '../../@Age'
        , [Weight] varchar(max) '../../@Weight'
        , Country varchar(max) '@Country'
        , [Language] varchar(max) '@Language'
        , TextType varchar(max) '@TextType'
        )
EXEC sp_xml_removedocument @hdoc

Could you please help me add a column containing the sentences mentioned above?

OPENXML is old and basically deprecated, and it has numerous issues.

You should use the newer XQuery functions .nodes and .value to retrieve your data.

Your primary issue is that you have XML stored as a string inside another XML document. You need to retrieve it as nvarchar(max), then cast it to xml using TRY_CONVERT.

SELECT 
    FirstName  = x1.Person.value('@FirstName', 'varchar(100)'),
    LastName   = x1.Person.value('@LastName' , 'varchar(100)'),
    Age        = x1.Person.value('@Age'      , 'int'),
    Weight     = x1.Person.value('@Weight'   , 'decimal(9,5)'),
    Country    = x2.Text.value('@Country' , 'char(2)'),
    [Language] = x2.Text.value('@Language', 'char(2)'),
    TextType   = x2.Text.value('@TextType', 'int'),
    value      = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM @x.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);


Note that there are two calls to .nodes, and the first feeds into the second.
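To make the chaining easier to follow, here is a minimal self-contained sketch of the same idea. It assumes the sample data is pasted into an inline variable instead of being read from C:\Sample.xml. The alias TextNode and the columns RawText and Sentence are names chosen for this illustration only and do not appear in the original query.

```sql
-- Inline copy of the sample data, so the sketch runs without a file on disk.
DECLARE @x xml = N'<People>
  <Person FirstName="Adam" LastName="Smith" Age="44" Weight="178">
    <Texts>
      <Text Country="US" Language="EN" TextType="1">&lt;div&gt;First sentence to retrieve.&lt;/div&gt;</Text>
      <Text Country="GB" Language="EN" TextType="2">&lt;div&gt;Second sentence to retrieve.&lt;/div&gt;</Text>
    </Texts>
  </Person>
</People>';

SELECT
    FirstName = x1.Person.value('@FirstName', 'varchar(100)'),
    TextType  = x2.TextNode.value('@TextType', 'int'),
    -- Step 1: the element content as a plain string; the entities are
    -- decoded here, so this is the text <div>...</div>.
    RawText   = x2.TextNode.value('text()[1]', 'nvarchar(max)'),
    -- Step 2: the same string re-parsed as XML, then queried again.
    Sentence  = v.InnerXml.value('(div/text())[1]', 'nvarchar(max)')
FROM @x.nodes('People/Person') x1(Person)                -- one row per Person
CROSS APPLY x1.Person.nodes('Texts/Text') x2(TextNode)   -- one row per Text within that Person
CROSS APPLY (VALUES (
    TRY_CONVERT(xml, x2.TextNode.value('text()[1]', 'nvarchar(max)'))
)) v(InnerXml);
```

Comparing RawText with Sentence shows what the extra conversion does. TRY_CONVERT returns NULL when a conversion fails, so a row whose inner markup is not well-formed should produce a NULL Sentence instead of aborting the whole query.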

You can even feed this in straight from OPENROWSET:

SELECT 
    FirstName  = x1.Person.value('@FirstName', 'varchar(100)'),
    LastName   = x1.Person.value('@LastName' , 'varchar(100)'),
    Age        = x1.Person.value('@Age'      , 'int'),
    Weight     = x1.Person.value('@Weight'   , 'decimal(9,5)'),
    Country    = x2.Text.value('@Country' , 'char(2)'),
    [Language] = x2.Text.value('@Language', 'char(2)'),
    TextType   = x2.Text.value('@TextType', 'int'),
    value      = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM OPENROWSET(BULK 'C:\Sample.xml', single_blob) AS C(f)
CROSS APPLY (VALUES( TRY_CONVERT(xml, C.f) )) C2(AsXml)
CROSS APPLY C2.AsXml.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);

I recommend you fix whatever is generating this XML. Ideally, it would pass the inner XML through as real XML without stringifying it.
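As a rough sketch of what that would buy you, assume the generator emitted the div as a real child element instead of escaped text. The variable @fixed, the alias TextNode and the column name Sentence are hypothetical, and the inline data is a hand-edited copy of the sample. No TRY_CONVERT step is needed in that case:

```sql
-- Hypothetical "fixed" data: the <div> is a real element, not escaped text.
DECLARE @fixed xml = N'<People>
  <Person FirstName="Adam" LastName="Smith" Age="44" Weight="178">
    <Texts>
      <Text Country="US" Language="EN" TextType="1"><div>First sentence to retrieve.</div></Text>
      <Text Country="GB" Language="EN" TextType="2"><div>Second sentence to retrieve.</div></Text>
    </Texts>
  </Person>
</People>';

SELECT
    FirstName = x1.Person.value('@FirstName', 'varchar(100)'),
    Country   = x2.TextNode.value('@Country', 'char(2)'),
    TextType  = x2.TextNode.value('@TextType', 'int'),
    -- The div can be addressed directly because it is part of the document.
    Sentence  = x2.TextNode.value('(div/text())[1]', 'nvarchar(max)')
FROM @fixed.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(TextNode);
```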
