SQL Server regular expressions to clean tags
I have the following HTML content in my data:
outer text <span class="cssname">inner text to be removed along with tags</span> further text
I want to remove every <span> tag with class='cssname', along with its inner text, using a regular expression in a query.
The expected output is:
'outer text further text'
Regular expressions aren't supported in SQL Server as fully as in other languages; LIKE and PATINDEX offer only simple wildcard patterns. The following works for a single tag:
declare @var nvarchar(256) = N'outer text <span class="cssname">inner text to be removed along with tags</span> further text';
-- Delete from the first '<' through the '>' that ends the first closing tag.
select stuff(@var,
             charindex('<', @var),
             charindex('>', @var, charindex('</', @var)) - charindex('<', @var) + 1,
             '');
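The single-tag approach above can be extended to several tags by repeating the removal in a loop. The following is only a sketch under stated assumptions: the spans are not nested, every opening tag has a matching </span>, and the variable names @open, @close, @start and @end are hypothetical, introduced here for illustration. It also matches the class attribute literally, so a span written with different quoting or extra attributes before the class would not be found.

```sql
-- Sketch: repeat the STUFF removal until no matching span remains.
-- Assumes non-nested spans and balanced markup.
DECLARE @var nvarchar(max) =
    N'a <span class="cssname">x</span> b <span class="cssname">y</span> c';
DECLARE @open  nvarchar(50) = N'<span class="cssname"';  -- hypothetical marker for the tags to remove
DECLARE @close nvarchar(10) = N'</span>';
DECLARE @start int = CHARINDEX(@open, @var);
DECLARE @end   int;

WHILE @start > 0
BEGIN
    -- Find the first closing tag after the current opening tag.
    SET @end = CHARINDEX(@close, @var, @start);
    IF @end = 0 BREAK;  -- unbalanced markup: stop rather than loop forever

    -- Delete from the opening '<' through the end of '</span>'.
    SET @var = STUFF(@var, @start, @end - @start + LEN(@close), N'');
    SET @start = CHARINDEX(@open, @var);
END;

SELECT @var AS stripped;
```

The spaces on either side of each removed span are left in place, so the result contains doubled spaces where the spans used to be.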
This approach tweaks the HTML to create <content> elements from the regular text and casts the result as XML. This is done in the CROSS APPLY part.
The second step uses an XQuery expression to select the text in the <content> elements (thus stripping the <span> elements).
DECLARE @tt TABLE(t NVARCHAR(MAX));
INSERT INTO @tt(t) VALUES(N'outer text <span class="cssname">inner text to be removed along with tags</span> further text');
SELECT
    stripped = CAST(x.query('for $i in (/content) return $i/text()') AS NVARCHAR(MAX))
FROM
    @tt
    CROSS APPLY (
        SELECT
            x = CAST('<content>' + REPLACE(REPLACE(t, '<span', '</content><span'), '/span>', '/span><content>') + '</content>' AS XML)
    ) AS f;
Result:
outer text further text
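One caveat with the XML approach is that the cast raises an error when the rewritten text is not well-formed XML, for example when a span is never closed. A possible variation, sketched below, swaps CAST for TRY_CAST (available from SQL Server 2012) so that such rows yield NULL instead of failing the whole query. The sample rows are made up for illustration, and the first row also shows that several spans in one value are handled by the same REPLACE calls.

```sql
-- Sketch: same technique, but tolerate rows that are not well-formed XML.
DECLARE @tt TABLE(t NVARCHAR(MAX));
INSERT INTO @tt(t) VALUES
    (N'a <span class="cssname">x</span> b <span class="cssname">y</span> c'),  -- two spans
    (N'broken <span class="cssname">never closed');                            -- malformed

SELECT
    t,
    -- .query() on a NULL xml value returns NULL, so malformed rows give stripped = NULL.
    stripped = CAST(f.x.query('for $i in (/content) return $i/text()') AS NVARCHAR(MAX))
FROM
    @tt
    CROSS APPLY (
        SELECT x = TRY_CAST(
            '<content>'
            + REPLACE(REPLACE(t, '<span', '</content><span'), '/span>', '/span><content>')
            + '</content>' AS XML)
    ) AS f;
```

Like the original query, this removes every <span> element regardless of its class; restricting it to class="cssname" would need a more specific search string in the first REPLACE and is not covered here.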