
How can I search for a Regular Expression match that begins before a certain index of a string?

Let's say I have a regular expression:

let regexString = "\\s{1,3}(---+)\\s*"
let regex = try? NSRegularExpression(pattern: regexString)

and a string:

let string = "Space --- the final frontier --- these are the voyages..."

and let's further assume that the string is really long and continues after the ellipsis ( ... ) for several thousand characters.

Now I want to find the first match for the regular expression regex, but for efficiency reasons I want to stop searching after a certain index.

Example:

index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
string: S  p  a  c  e     -  -  -     t  h  e     f  i  n  a  l     f  r  o  n  t  i  e  r
range:  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  ⬆︎ -  -  -  -  -  -  -  -  -  -  -  -
                                                     max 

This would mean that I only search the string for a regular expression match that starts before index 15.


The behavior described above is different from searching only a subrange of the string. Here's why:

✅ Should match:

The following example should produce a match at range [5–9], because the match starts before the max index (= 7).

index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
string: S  p  a  c  e     -  -  -     t  h  e     f  i  n  a  l     f  r  o  n  t  i  e  r
range:  +  +  +  +  +  +  +  ⬆︎ -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
                             max 

❎ Should match, but would not:

If I only searched a substring up to the max index (= 7), the regular expression would not be able to match, because part of the match would be truncated.

index:  0  1  2  3  4  5  6  7  
string: S  p  a  c  e     -  -  
range:  +  +  +  +  +  +  +  ⬆︎ 
                             max 
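To make the difference concrete, here is a minimal sketch that runs the pattern once over the full string and once over the truncated prefix from the diagram above. The names text, prefix, fullMatch, and prefixMatch are illustrative only, and the sample text is shortened.

```swift
import Foundation

let pattern = "\\s{1,3}(---+)\\s*"
let regex = try! NSRegularExpression(pattern: pattern)
let text = "Space --- the final frontier"

// Search the whole string: the match starts at index 5, before the max index 7.
let fullRange = NSRange(text.startIndex..., in: text)
let fullMatch = regex.firstMatch(in: text, options: [], range: fullRange)

// Search only the first 8 characters ("Space --"): the third dash is cut off,
// so the pattern cannot match at all.
let prefix = String(text.prefix(8))
let prefixRange = NSRange(prefix.startIndex..., in: prefix)
let prefixMatch = regex.firstMatch(in: prefix, options: [], range: prefixRange)

if let match = fullMatch {
    print(match.range.location, match.range.length) // 5 5
}
print(prefixMatch == nil) // true
```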

How can I achieve this?

Since you are using a capture group, I'm assuming that the captured text is the string you are looking for. You can change your expression to this: ^.{0,6}\\s{1,3}(---+)\\s* (written as it appears in a Swift string literal, hence the doubled backslashes). I added the following:

  • ^ anchors the match at the beginning of the string.
  • .{0,6} matches from zero to six characters.

With this change, your original expression matches only if it starts at position 6 at the latest, that is, before your max index of 7. The difference is that the whole match now also contains those optional leading characters, but the first capture group still contains only the dashes you are looking for.
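The same idea can be wrapped in a small helper that builds the anchored pattern for any max index. This is only a sketch under a few assumptions of mine: the function name firstMatch(of:startingBefore:in:) is hypothetical, the prefix uses a lazy quantifier so that the earliest possible start is tried first (a greedy .{0,6} prefers the latest one), and [\s\S] replaces . so that the prefix can also cross line breaks. It also assumes plain text in which every character is a single UTF-16 unit, so that the regex character count and NSRange offsets agree. Because the original pattern is wrapped in an extra group, its own capture groups shift by one.

```swift
import Foundation

/// Hypothetical helper: finds the first match of `pattern` that starts
/// strictly before `maxStart`. In the result, group 1 is the original whole
/// match and the original capture groups are shifted up by one.
func firstMatch(of pattern: String,
                startingBefore maxStart: Int,
                in text: String) -> NSTextCheckingResult? {
    guard maxStart > 0 else { return nil }
    // Lazy prefix of at most maxStart - 1 characters, anchored at the start.
    let anchored = "^[\\s\\S]{0,\(maxStart - 1)}?(\(pattern))"
    guard let regex = try? NSRegularExpression(pattern: anchored) else { return nil }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: whole)
}

let sample = "Space --- the final frontier --- these are the voyages"
if let match = firstMatch(of: "\\s{1,3}(---+)\\s*", startingBefore: 7, in: sample) {
    print(match.range(at: 1).location) // 5, start of the original match
    print(match.range(at: 2).location) // 6, start of the dashes
}
```

The search still runs over the full string, but the bounded prefix and the ^ anchor mean that any match has to begin within the first maxStart characters.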

I used the following code in a playground to test the new expression:

import Foundation

let regexString = "^.{0,6}\\s{1,3}(---+)\\s*"
let regex = try? NSRegularExpression(pattern: regexString)
// A regular string literal cannot span several lines, so the pieces are concatenated.
let string = "Space --- the final frontier --- these are the voyages of the "
    + "starship Enterprise. Its continuing mission: to explore strange "
    + "new worlds. To seek out new life and new civilizations. To boldly "
    + "go where no one has gone before!"

// NSRange counts UTF-16 code units, so derive it from the string's indices
// instead of using string.count.
let fullRange = NSRange(string.startIndex..., in: string)
let matches = regex?.matches(in: string, options: [], range: fullRange)
if let firstMatch = matches?.first {
    print("Whole regex match starts at index: \(firstMatch.range.lowerBound)")
    print("Whole match: \(String(string[Range(firstMatch.range, in: string)!]))")
    print("Capture group start at index: \(firstMatch.range(at: 1).lowerBound)")
    print("Capture group string: \(String(string[Range(firstMatch.range(at: 1), in: string)!]))")
} else {
    print("No matches")
}

Running the code above shows the following results:

Whole regex match starts at index: 0
Whole match: Space --- 
Capture group start at index: 6
Capture group string: ---

If string is changed like this: let string = "The space --- the final frontier --- these are the ... the result is:

No matches

since the whitespace matched by \\s{1,3} now starts at index 9 (and the dashes at index 10), which is beyond the six characters that the prefix allows.

Hope this works for you.
