SQL Server: using regular expressions to clean tags

I have the following HTML content in my data:

outer text <span class="cssname">inner text to be removed along with tags</span> further text

I want to remove specific tags together with their inner text, here every <span> with class='cssname', using a regular expression in a query.

The expected output is:

'outer text further text'

SQL Server does not support regular expressions as fully as other languages do. The following works when the string contains a single tag:

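-- Sample input containing exactly one <span>...</span> pair.
DECLARE @var nvarchar(256) = N'outer text <span class="cssname">inner text to be removed along with tags</span> further text';

-- Delete everything from the first '<' through the '>' that ends the first '</' closing tag.
SELECT
    STUFF(@var, CHARINDEX('<', @var), CHARINDEX('>', @var, CHARINDEX('</', @var)) - CHARINDEX('<', @var) + 1, '');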
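
The expression above removes only one tag pair. When a string contains several such spans, one possible extension is to repeat the same STUFF/CHARINDEX idea in a loop. The following is a minimal sketch, not a general HTML parser: it assumes the opening tag is written exactly as <span class="cssname"> and that the spans are not nested. The variable names @open, @close, @start and @end are illustrative.

```sql
-- Sample input with two spans to remove (illustrative data).
DECLARE @var nvarchar(max) = N'outer text <span class="cssname">first</span> middle <span class="cssname">second</span> further text';
DECLARE @open nvarchar(50) = N'<span class="cssname">';  -- assumed exact form of the opening tag
DECLARE @close nvarchar(10) = N'</span>';
DECLARE @start bigint = CHARINDEX(@open, @var);
DECLARE @end bigint;

WHILE @start > 0
BEGIN
    -- Position of the first closing tag at or after the opening tag.
    SET @end = CHARINDEX(@close, @var, @start);

    -- Unmatched opening tag: stop instead of deleting to the end of the string.
    IF @end = 0 BREAK;

    -- Remove the opening tag, the inner text and the closing tag.
    SET @var = STUFF(@var, @start, @end - @start + LEN(@close), N'');

    -- Look for the next opening tag in the shortened string.
    SET @start = CHARINDEX(@open, @var);
END;

SELECT @var AS stripped;
```

The whitespace on either side of each removed span is kept, so two consecutive spaces remain where a span used to be.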

This approach rewrites the HTML so that the regular text is wrapped in <content> elements, then casts the result to XML. This is done in the CROSS APPLY part.

The second step uses an XQuery expression to extract the text of the <content> elements, which drops the <span> elements.
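
To make the first step concrete, the sketch below shows only the string rewriting, before any cast to XML. It uses a scalar variable @t (an illustrative name) in place of the table column.

```sql
DECLARE @t nvarchar(max) = N'outer text <span class="cssname">inner text to be removed along with tags</span> further text';

-- Close the current <content> element before each <span and reopen one after each /span>.
SELECT N'<content>'
     + REPLACE(REPLACE(@t, N'<span', N'</content><span'), N'/span>', N'/span><content>')
     + N'</content>' AS rewritten;
-- rewritten:
-- <content>outer text </content><span class="cssname">inner text to be removed along with tags</span><content> further text</content>
```

Only the text inside the <content> elements is selected afterwards, so the span and its inner text fall away.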


DECLARE @tt TABLE(t NVARCHAR(MAX));
INSERT INTO @tt(t)VALUES(N'outer text <span class="cssname">inner text to be removed along with tags</span> further text');

SELECT
    stripped=CAST(x.query('for $i in (/content) return $i/text()') AS NVARCHAR(MAX))
FROM
    @tt
    CROSS APPLY (
        SELECT
            x=CAST('<content>'+REPLACE(REPLACE(t,'<span','</content><span'),'/span>','/span><content>')+'</content>' AS XML)
    ) AS f

Result:

outer text  further text
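
Note that this result has two spaces where the span was removed, while the expected output in the question has one. If that matters, a possible follow-up is a plain REPLACE. This is a sketch that assumes the spaces only ever double up; longer runs of spaces would need repeated replacement. The variable name @stripped is illustrative.

```sql
DECLARE @stripped nvarchar(max) = N'outer text  further text';

-- Replace each pair of spaces with a single space.
SELECT REPLACE(@stripped, N'  ', N' ') AS collapsed;
-- collapsed: outer text further text
```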
