
SQL Server regular expressions clean tags

I have the following HTML content in my data:

outer text <span class="cssname">inner text to be removed along with tags</span> further text

I want to remove every <span> tag with class='cssname', together with its inner text, using a regular expression in a query.

The expected output is:

'outer text further text'

Regular expressions aren't fully supported in SQL Server the way they are in other languages. The following works for a single tag.

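-- Sample input containing a single <span> element.
declare @var nvarchar(256) = N'outer text <span class="cssname">inner text to be removed along with tags</span> further text';

-- Delete from the first '<' through the first '>' that follows the first '</'.
select
    stuff(@var, charindex('<', @var), charindex('>', @var, charindex('</', @var)) - charindex('<', @var) + 1, '');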
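
If the value can contain more than one matching tag, the same STUFF/CHARINDEX idea can be repeated in a loop. The following is only a sketch: the sample string with two spans and the variable names @open, @close, @start and @end are made up for illustration. It assumes the opening tag is written exactly as <span class="cssname"> (same attribute order and double quotes) and that matching spans are not nested.

```sql
-- Hypothetical input with two matching spans (not from the original post).
DECLARE @var   nvarchar(max) = N'a <span class="cssname">x</span> b <span class="cssname">y</span> c';
DECLARE @open  nvarchar(100) = N'<span class="cssname">';  -- assumed exact opening tag
DECLARE @close nvarchar(100) = N'</span>';
DECLARE @start int = CHARINDEX(@open, @var);
DECLARE @end   int;

WHILE @start > 0
BEGIN
    -- Find the first closing tag after the opening tag.
    SET @end = CHARINDEX(@close, @var, @start + LEN(@open));
    IF @end = 0 BREAK;  -- unbalanced markup: stop rather than guess

    -- Delete everything from the opening tag through the closing tag.
    SET @var = STUFF(@var, @start, @end + LEN(@close) - @start, N'');

    -- Look for the next matching opening tag in the shortened string.
    SET @start = CHARINDEX(@open, @var);
END;

SELECT @var AS stripped;
```

The spaces on either side of each removed tag are left in place, so the result still contains the whitespace that surrounded the spans.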

This approach tweaks the HTML so that the regular text is wrapped in <content> elements, then casts the result to XML. This is done in the CROSS APPLY part.

The second step uses an XQuery expression to select the text in the <content> elements, thereby stripping the <span> elements.


DECLARE @tt TABLE(t NVARCHAR(MAX));
INSERT INTO @tt(t)VALUES(N'outer text <span class="cssname">inner text to be removed along with tags</span> further text');

SELECT
    stripped=CAST(x.query('for $i in (/content) return $i/text()') AS NVARCHAR(MAX))
FROM
    @tt
    CROSS APPLY (
        SELECT
            x=CAST('<content>'+REPLACE(REPLACE(t,'<span','</content><span'),'/span>','/span><content>')+'</content>' AS XML)
    ) AS f

Result:

outer text  further text
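
Note that this result has two spaces where the tag used to be, while the expected output in the question has one. A minimal sketch of one way to tidy that up, assuming a single pass over double spaces is enough (the variable name @stripped is made up for illustration, and longer runs of spaces would need repeated passes):

```sql
-- Hypothetical variable holding the result of the query above.
DECLARE @stripped nvarchar(max) = N'outer text  further text';

-- Replace each pair of spaces with a single space (one pass only).
SELECT REPLACE(@stripped, N'  ', N' ') AS collapsed;
```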


 