Regex capture groups (look back?)

Question

I have a string that contains 10 or more characters ([0-9a-zA-Z]), e.g.: abcdefghij12345

I want to capture the following characters in groups:

Group 1: Character positions "1 and 2": ab
Group 2: Character positions "3 and 4": cd
Group 3: Character positions "5 - 10": efghij
Group 4: Character positions "6 - last position of string": fghij12345

Groups 1-3 work, but how can I get positions "6 - last position of string" into Group 4?

What I already have:

r'^([0-9a-zA-Z]{2})([0-9a-zA-Z]{2})([0-9a-zA-Z]{6})'

I want to get all four groups from a single regular expression. How can I extend my expression to also capture Group 4?
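The difficulty is that each group consumes the characters it matches, so a group appended after group 3 starts at position 11, not 6. A quick check with Python's standard re module (a sketch; the trailing fourth group is a hypothetical naive extension, not part of the question):

```python
import re

S = "abcdefghij12345"

# The question's pattern plus a naive fourth group appended at the end
# (hypothetical, for illustration only).
naive = re.compile(r'^([0-9a-zA-Z]{2})([0-9a-zA-Z]{2})([0-9a-zA-Z]{6})([0-9a-zA-Z]*)')
m = naive.match(S)

# Groups 1-3 are as intended, but group 4 resumes where group 3 stopped
# (position 11), because the characters of group 3 were already consumed.
print(m.groups())  # ('ab', 'cd', 'efghij', '12345')
```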

Edit: Additionally, I need the following regex for a string of 72 or more characters.

I want to capture the following characters in groups:

Group 1: Character positions "1 and 2"
Group 2: Character positions "3 and 4"
Group 3: Character positions "5 and 6"...
Group 16: Character positions "31 and 32"
Group 17: Character positions "33 - 40"
Group 18: Character positions "41 and 42"
Group 19: Character positions "33 - 40"
Group 20: Character positions "12 - last position of string"

String (72 chars): 294592522929354526532268626626426854242342362676256672666267626726672667

r'^([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{8})([\da-zA-Z]{2})([\da-zA-Z]{8})'
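The long pattern repeats one 2-character group 16 times. As a sketch (standard re module; the names TWO and EIGHT are illustrative only), the same layout of 16 two-character groups followed by groups of 8, 2 and 8 characters can be assembled from repeated pieces, which makes it easier to check:

```python
import re

TWO = r'([\da-zA-Z]{2})'    # one 2-character capture group
EIGHT = r'([\da-zA-Z]{8})'  # one 8-character capture group

# 16 two-character groups, then 8, 2 and 8 characters: 19 groups in total.
pattern = '^' + TWO * 16 + EIGHT + TWO + EIGHT

S = "294592522929354526532268626626426854242342362676256672666267626726672667"
m = re.match(pattern, S)
print(len(m.groups()))  # 19
```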

Answer 1

You could use a positive lookahead:

^([\da-zA-Z]{2})([\da-zA-Z]{2})(?=([\da-zA-Z]{6})).([\da-zA-Z].*)$

^ - start-of-string anchor
([\da-zA-Z]{2}) - first capture group, positions 1-2
([\da-zA-Z]{2}) - second capture group, positions 3-4
(?=([\da-zA-Z]{6})) - positive lookahead containing the third capture group, positions 5-10; a lookahead does not consume characters, so matching continues at position 5
.([\da-zA-Z].*) - consume one character (position 5) and capture the rest as the fourth group, positions 6 to end
$ - end-of-string anchor
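The pattern can be checked in Python with the standard re module. This relies on the fact that Python keeps the values of groups captured inside a lookahead that succeeded:

```python
import re

S = "abcdefghij12345"

# The lookahead captures positions 5-10 as group 3 without consuming them,
# so "." consumes position 5 and group 4 starts at position 6.
pattern = re.compile(r'^([\da-zA-Z]{2})([\da-zA-Z]{2})(?=([\da-zA-Z]{6})).([\da-zA-Z].*)$')

m = pattern.match(S)
print(m.groups())  # ('ab', 'cd', 'efghij', 'fghij12345')
```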

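The same lookahead idea could be stretched to the Edit: wrap all fixed-width groups in one lookahead, then skip 11 characters and capture the rest as group 20. This is a sketch, not tested against the asker's real data; it follows the layout of the question's 72-character regex, which places group 19 at positions 43-50 (the list entry for group 19 says "33 - 40"), and TWO and EIGHT are illustrative names:

```python
import re

TWO = r'([\da-zA-Z]{2})'
EIGHT = r'([\da-zA-Z]{8})'

# Groups 1-19: the question's fixed-width layout inside a lookahead,
# so none of those characters are consumed.
# Group 20: skip the first 11 characters, then capture the rest.
pattern = re.compile(
    '^(?=' + TWO * 16 + EIGHT + TWO + EIGHT + ')'
    + r'.{11}([\da-zA-Z].*)$'
)

S = "294592522929354526532268626626426854242342362676256672666267626726672667"
m = pattern.match(S)

print(m.group(17))  # positions 33-40
print(m.group(20))  # position 12 to the end
```

The lookahead only requires the first 50 characters to be present, so shorter strings make match() return None.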

Answer 2

Since this is an index/position problem, why not just use classic slicing with a generator expression unpacked into variables?

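S = "abcdefghij12345"

# (start, stop) slice bounds per group; None slices to the end of the string
g1, g2, g3, g4 = (S[i:j] for i, j in [(0, 2), (2, 4), (4, 10), (5, None)])

print(g1, g2, g3, g4, sep="\n")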

Output:

ab          # <- group1 
cd          # <- group2
efghij      # <- group3
fghij12345  # <- group4
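Slicing also covers the Edit's 72-character layout, and the bounds can be generated instead of typed out. A sketch, assuming the group layout of the question's regex (group 19 at positions 43-50, where the list says "33 - 40"):

```python
S = "294592522929354526532268626626426854242342362676256672666267626726672667"

# (start, stop) bounds in Python's 0-based, end-exclusive slice notation.
# Groups 1-16: two characters each, positions 1-32.
bounds = [(i, i + 2) for i in range(0, 32, 2)]
# Groups 17-19, following the question's regex: 33-40, 41-42, 43-50.
bounds += [(32, 40), (40, 42), (42, 50)]
# Group 20: position 12 to the end of the string.
bounds.append((11, None))

groups = [S[i:j] for i, j in bounds]
print(len(groups))  # 20
```

One design difference: slicing never fails on a short string (it just returns shorter or empty pieces), while the regex returns None, so the regex is the better choice if too-short input should be rejected.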

Answers: 2
Answer 1: score 0, posted 2023-01-29 21:08:50
Answer 2: score 0, posted 2023-01-29 21:15:29
