
Extract substring from string in SQL

I need to extract text that is surrounded by ***[some text] strings, as in the following example:

some text
some text
***[some text]
THIS SHOULD BE EXTRACTED
***[some text]
some text
some text
some text
some text
some text
***[some text]
THIS SHOULD BE EXTRACTED TOO
***[some text]
some text

The output should be:

THIS SHOULD BE EXTRACTED
THIS SHOULD BE EXTRACTED TOO

I tried PATINDEX, like here, but couldn't find a way to extract the string.

PATINDEX('%[*][*][*][[]%]%%[*][*][*][[]%]%',@Text)

I am looking forward to hearing any suggestions.

For the somewhat easier case raised in the comments, you could do:

;WITH T(C) AS
(
 SELECT '
    some text
    some text
    ***[some text 1]
    THIS SHOULD BE EXTRACTED
    ***[some text 2]
    some text
    some text
    some text
    some text
    some text
    ***[some text 1]
    THIS SHOULD BE EXTRACTED TOO
    ***[some text 2]
    some text'
)
SELECT col.value('.','varchar(max)')
FROM T
CROSS APPLY (SELECT CAST('<a keep="false">' + 
                        REPLACE(
                            REPLACE(C,'***[some text 2]','</a><a keep="false">'),
                        '***[some text 1]','</a><a keep="true">') + 
                    '</a>' AS xml) as xcol) x
CROSS APPLY xcol.nodes('/a[@keep="true"]') tab(col)
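
One caveat with the XML approach: CAST(... AS xml) fails when the source text contains characters such as & or <. The following is a minimal sketch of one way to handle that, not part of the original answer. It assumes the input is held in a hypothetical @Text variable and escapes those characters before the markup is added; value() then returns them unescaped.

```sql
-- Sketch only: @Text and the sample data are assumptions for illustration.
DECLARE @Text varchar(max) = 'intro
***[some text 1]
A & B < C
***[some text 2]
outro';

WITH T(C) AS
(
    -- Escape XML-significant characters before adding the markup.
    -- '&' is replaced first so the entities added afterwards are not escaped again.
    SELECT REPLACE(REPLACE(REPLACE(@Text, '&', '&amp;'), '<', '&lt;'), '>', '&gt;')
)
SELECT col.value('.', 'varchar(max)') AS extracted
FROM T
CROSS APPLY (SELECT CAST('<a keep="false">' +
                        REPLACE(
                            REPLACE(C, '***[some text 2]', '</a><a keep="false">'),
                        '***[some text 1]', '</a><a keep="true">') +
                    '</a>' AS xml) AS xcol) x
CROSS APPLY xcol.nodes('/a[@keep="true"]') tab(col);
```

The delimiters themselves contain none of the escaped characters, so escaping first does not interfere with the two delimiter replacements.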

I may be wrong, but I don't think there's a clean way to do this directly in SQL. I would use a CLR stored procedure and use regular expressions from C# or your .NET language of choice.

See this article (or this article) for a relevant example using regexes.

You can find this in my blog: http://sql-tricks.blogspot.com/2011/04/extract-strings-with-delimiters.html It is a pure solution with no additional modification; only the delimiter sequences need to be declared.

Not a regex solution, and I'm still a SQL novice so this may not be optimal, but you should be able to parse the text with a WHILE loop:

Use CHARINDEX to find the ***, then use that position as the starting point for a CHARINDEX to the LF.
Use the LF position as the starting point for a SUBSTRING whose ending point is the CHARINDEX of the next ***.
Concatenate the substring to your output, move past the ending ***, and loop to find the next one.

I'll play with it some and see if I can add an example.
EDIT - This probably needs more error checking:

declare @inText nvarchar(2000) = 'some text 
some text 
***[some text] 
THIS SHOULD BE EXTRACTED 
***[some text] 
some text 
some text 
some text 
some text 
some text 
***[some text] 
THIS SHOULD BE EXTRACTED TOO 
***[some text] 
some text '

declare @delim1 nvarchar(50) = '***'
declare @delim2 char = char(10)
declare @output nvarchar(1000) = ''
declare @position int
declare @positionEnd int

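-- Find the first opening delimiter
set @position = CHARINDEX(@delim1,@inText)
while (@position != 0 and @position is not null)
BEGIN
  -- Skip to the line feed that ends the opening delimiter line
  set @position = CHARINDEX(@delim2,@inText,@position)
  if @position = 0 break
  -- Find the closing delimiter; stop if it is missing
  set @positionEnd = CHARINDEX(@delim1,@inText,@position)
  if @positionEnd = 0 break
  -- Append the text between the delimiters (it starts with the line feed)
  set @output = @output + SUBSTRING(@inText,@position,@positionEnd-@position)
  -- Move past the closing delimiter and look for the next opening one
  set @position = CHARINDEX(@delim1,@inText,@positionEnd+LEN(@delim1))
END
select @output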
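
The loop above concatenates everything into one string. If one row per extracted section is preferred, as in the output the question asks for, a variation along these lines might work. This is only a sketch: the @sections table variable and its column names are hypothetical, and it assumes each extracted section sits on a single line, because all CR and LF characters are stripped from it.

```sql
-- Sketch only: @sections, seq and section are hypothetical names.
DECLARE @inText nvarchar(2000) = N'some text
***[some text]
THIS SHOULD BE EXTRACTED
***[some text]
some text
***[some text]
THIS SHOULD BE EXTRACTED TOO
***[some text]
some text';

DECLARE @delim nvarchar(50) = N'***';
DECLARE @sections TABLE (seq int IDENTITY(1,1), section nvarchar(2000));
DECLARE @open int, @lineEnd int, @close int;

SET @open = CHARINDEX(@delim, @inText);
WHILE @open > 0
BEGIN
    -- Line feed that ends the opening delimiter line
    SET @lineEnd = CHARINDEX(NCHAR(10), @inText, @open);
    IF @lineEnd = 0 BREAK;
    -- Closing delimiter; stop if it is missing
    SET @close = CHARINDEX(@delim, @inText, @lineEnd);
    IF @close = 0 BREAK;

    -- Store the text between the two delimiters without CR/LF characters
    INSERT INTO @sections (section)
    SELECT REPLACE(REPLACE(
             SUBSTRING(@inText, @lineEnd + 1, @close - @lineEnd - 1),
             NCHAR(13), N''), NCHAR(10), N'');

    -- Continue after the closing delimiter
    SET @open = CHARINDEX(@delim, @inText, @close + LEN(@delim));
END

SELECT section FROM @sections ORDER BY seq;
```

For the sample input this should yield one row per delimited section, in the order the sections appear in the text.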

I believe you can use xp_regex_match, as described in http://www.codeproject.com/KB/mcpp/xpregex.aspx?q=use+sql+function+to+parse+text, to parse your nvarchar field. I wrote something similar quite a while back.
