How can I search for a Regular Expression match that begins before a certain index of a string?
Let's say I have a regular expression:
let regexString = "\\s{1,3}(---+)\\s*"
let regex = try? NSRegularExpression(pattern: regexString)
and a string:
let string = "Space --- the final frontier --- these are the voyages..."
Let's further assume that the string is really long and continues after the ellipsis (...) for several thousand characters.
Now I want to find the first match for the regular expression regex, but for efficiency reasons I want to stop searching after a certain index.
index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
string:  S  p  a  c  e     -  -  -     t  h  e     f  i  n  a  l     f  r  o  n  t  i  e  r
range:   +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  ⬆︎  -  -  -  -  -  -  -  -  -  -  -  -
                                                     max
This would mean that I only search the string for a regular expression match that starts before index 15.
The behavior described above is different from searching only a subrange of the string. Here's why:
The following example should produce a match at range [5–9], because the match starts before the max index (= 7).
index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
string:  S  p  a  c  e     -  -  -     t  h  e     f  i  n  a  l     f  r  o  n  t  i  e  r
range:   +  +  +  +  +  +  +  ⬆︎  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
                             max
If I only searched a substring up to the max index (= 7), the regular expression would not be able to match, because part of the match would be truncated.
index:   0  1  2  3  4  5  6  7
string:  S  p  a  c  e     -  -
range:   +  +  +  +  +  +  +  ⬆︎
                             max
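The difference between the two cases can be reproduced with a short sketch. It assumes a shortened sample string (the name sample is only for illustration) and compares a search over the whole string with a search whose range is cut off at index 7:

```swift
import Foundation

let regex = try! NSRegularExpression(pattern: "\\s{1,3}(---+)\\s*")
let sample = "Space --- the final frontier"

// Searching the whole string finds the match covering indices 5 through 9.
let fullRange = NSRange(sample.startIndex..., in: sample)
let fullMatch = regex.firstMatch(in: sample, options: [], range: fullRange)

// Restricting the search range to indices 0..<7 cuts off the third dash,
// so the same expression can no longer match.
let truncatedRange = NSRange(location: 0, length: 7)
let truncatedMatch = regex.firstMatch(in: sample, options: [], range: truncatedRange)

print(fullMatch?.range as Any)
print(truncatedMatch == nil)
```

The first search reports a match, the second does not, which is why simply shortening the search range does not give the desired behavior.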
How can I achieve this?
Since you are using a capture group, I'm assuming that it holds the string you are looking for. You can change your expression to this: ^.{0,6}\\s{1,3}(---+)\\s*
I added the following: ^ anchors the match at the start of the string, and .{0,6} allows between zero and six arbitrary characters before the original expression.
Changing the expression like this matches what you are looking for: the original expression only matches if it starts at index 6 at the latest, which corresponds to your max of 7. The difference is that the whole match now includes those optional leading characters, but the first capture group still contains only the dashes you are looking for.
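The same idea can be generalized to an arbitrary max index. The following is only a sketch under a few assumptions: firstMatchRange(of:in:startingBefore:) is a hypothetical helper name, the prefix uses [\s\S] instead of . so that it also crosses line breaks, and the quantifier is made lazy ({0,n}?) so that the earliest possible start is tried first. The original expression is wrapped in an extra group, so its own capture groups shift by one.

```swift
import Foundation

/// Hypothetical helper: returns the range of the first match of `pattern`
/// that starts before `maxStart`, or nil if there is none.
func firstMatchRange(of pattern: String, in string: String, startingBefore maxStart: Int) -> NSRange? {
    guard maxStart > 0 else { return nil }
    // Anchored, lazy prefix: at most maxStart - 1 characters may precede the match.
    // Group 1 wraps the original pattern, so its range excludes the prefix.
    let wrapped = "^[\\s\\S]{0,\(maxStart - 1)}?(\(pattern))"
    guard let regex = try? NSRegularExpression(pattern: wrapped) else { return nil }
    let fullRange = NSRange(string.startIndex..., in: string)
    guard let match = regex.firstMatch(in: string, options: [], range: fullRange) else { return nil }
    return match.range(at: 1)
}

let text = "Space --- the final frontier --- these are the voyages..."
let pattern = "\\s{1,3}(---+)\\s*"

// The match starts at index 5, which is before 7, so a range is returned.
let hit = firstMatchRange(of: pattern, in: text, startingBefore: 7)
// No match starts before index 5, so the result is nil.
let miss = firstMatchRange(of: pattern, in: text, startingBefore: 5)

print(hit as Any)
print(miss as Any)
```

Because the expression is anchored with ^, the engine gives up once every allowed start position has been tried instead of scanning the rest of a long string. One caveat: the prefix counts characters as the regex engine sees them, which may differ from the UTF-16 offsets in NSRange for text outside the Basic Multilingual Plane.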
I used the following code in a playground to test the new expression:
import Foundation

let regexString = "^.{0,6}\\s{1,3}(---+)\\s*"
let regex = try? NSRegularExpression(pattern: regexString)
// The trailing backslashes join the lines into one string without line breaks.
let string = """
Space --- the final frontier --- these are the voyages of the \
starship Enterprise. Its continuing mission: to explore strange \
new worlds. To seek out new life and new civilizations. To boldly \
go where no one has gone before!
"""
// NSRange counts UTF-16 code units, so derive it from the string's indices
// rather than from string.count.
let matches = regex?.matches(in: string, options: [], range: NSRange(string.startIndex..., in: string))
if let firstMatch = matches?.first {
    print("Whole regex match starts at index: \(firstMatch.range.lowerBound)")
    print("Whole match: \(String(string[Range(firstMatch.range, in: string)!]))")
    print("Capture group start at index: \(firstMatch.range(at: 1).lowerBound)")
    print("Capture group string: \(String(string[Range(firstMatch.range(at: 1), in: string)!]))")
} else {
    print("No matches")
}
Running the code above shows the following results:
Whole regex match starts at index: 0
Whole match: Space ---
Capture group start at index: 6
Capture group string: ---
If string is changed like this: let string = "The space --- the final frontier --- these are the ...
the result is:
No matches
since the \\s{1,3} part would have to start at index 9 (the dashes start at index 10), which is past the latest allowed start at index 6.
Hope this works for you.