Regex capture groups (lookbehind?)
I have a string that can contain 10 or more characters ([0-9a-zA-Z]), e.g. abcdefghij12345
I want to capture the following characters in groups:
Groups 1-3 work, but how can I get position "6 - last position of string" in group 4?
What I already have:
r'^([0-9a-zA-Z]{2})([0-9a-zA-Z]{2})([0-9a-zA-Z]{6})'
I expect to get all four groups with a single regex. How can I expand my expression to additionally capture group 4?
Edit: In addition, the following regex is needed for a string of 72 or more characters.
I want to capture the following characters in groups:
Group 1: Character position "1 and 2"
Group 2: Character position "3 and 4"
Group 3: Character position "5 and 6"
...
Group 16: Character position "31 and 32"
Group 17: Character position "33 - 40"
Group 18: Character position "41 and 42"
Group 19: Character position "33 - 40"
Group 20: Character position "12 - last position of string"
String (72 char): 294592522929354526532268626626426854242342362676256672666267626726672667
r'^([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{8})([\da-zA-Z]{2})([\da-zA-Z]{8})'
You could use a positive lookahead:
^([\da-zA-Z]{2})([\da-zA-Z]{2})(?=([\da-zA-Z]{6})).([\da-zA-Z].*)$
^ - start of line anchor
([\da-zA-Z]{2}) - first capture group, pos 1-2
([\da-zA-Z]{2}) - second capture group, pos 3-4
(?=([\da-zA-Z]{6})) - positive lookahead, third capture, pos 5-10 (matched without consuming any characters)
.([\da-zA-Z].*) - discard one character and capture the rest as the fourth capture, pos 6-end
$ - end of line anchor
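To check the pattern against the sample string from the question, here is a minimal Python sketch. The names PATTERN and sample are only illustrative; the regex itself is the one given above, unchanged.

```python
import re

# Regex from the answer above. Group 3 sits inside a lookahead, so it is
# captured without consuming characters, which lets group 4 overlap it.
PATTERN = re.compile(
    r'^([\da-zA-Z]{2})([\da-zA-Z]{2})(?=([\da-zA-Z]{6})).([\da-zA-Z].*)$'
)

sample = "abcdefghij12345"  # example string from the question

match = PATTERN.match(sample)
if match:
    g1, g2, g3, g4 = match.groups()
    print(g1, g2, g3, g4)  # ab cd efghij fghij12345
```

Note that the trailing .* accepts any character, not only [0-9a-zA-Z]. If group 4 must stay alphanumeric, replacing .* with [\da-zA-Z]* should tighten it.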
Since this is an index/position problem, why not just use classic slicing with a generator expression?
S = "abcdefghij12345"
g1, g2, g3, g4 = (S[i:j] for i, j in [(0, 2), (2, 4), (4, 10), (5, None)])
Output:
ab # <- group1
cd # <- group2
efghij # <- group3
fghij12345 # <- group4
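The same slicing idea could be extended to the 20-group layout from the edit in the question. The sketch below is only an illustration: slice_groups is a hypothetical helper, not part of any library, and the spans are copied from the question as written (1-based, inclusive). Group 19 repeats the range of group 17 there, which may be a typo, so adjust that entry if a different range was intended.

```python
def slice_groups(text, spans):
    """Return one substring per (start, end) span.

    Spans are 1-based and inclusive, as in the question;
    end=None means "up to the last character".
    """
    return [text[start - 1:end] for start, end in spans]


# First case from the question: positions 1-2, 3-4, 5-10 and 6-end.
first_spans = [(1, 2), (3, 4), (5, 10), (6, None)]
print(slice_groups("abcdefghij12345", first_spans))
# ['ab', 'cd', 'efghij', 'fghij12345']

# 72-character case: groups 1-16 are consecutive pairs (1-2, 3-4, ..., 31-32).
long_spans = [(i, i + 1) for i in range(1, 32, 2)]
# Groups 17-20 exactly as listed in the question (group 19 is an assumption,
# copied verbatim even though it duplicates group 17).
long_spans += [(33, 40), (41, 42), (33, 40), (12, None)]

long_text = "294592522929354526532268626626426854242342362676256672666267626726672667"
groups = slice_groups(long_text, long_spans)
print(len(groups))  # 20
```

Keeping the positions in a plain list makes overlapping groups trivial, whereas the regex needs a lookahead for every overlap.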