
Regex capture groups (lookbehind?)

I have a string that can contain 10 or more characters ([0-9a-zA-Z]), e.g. abcdefghij12345

I want to capture the following characters in groups:

  • Group 1: character positions 1 and 2: ab
  • Group 2: character positions 3 and 4: cd
  • Group 3: character positions 5 - 10: efghij
  • Group 4: character positions 6 - last position of the string: fghij12345

Groups 1-3 work, but how can I get positions "6 - last position of the string" in group 4?

What I have so far:

r'^([0-9a-zA-Z]{2})([0-9a-zA-Z]{2})([0-9a-zA-Z]{6})'

I expect to get all four groups with a single regex. How can I extend my expression to also capture group 4?

Edit: In addition, I need the following regex for a string of 72 or more characters.

I want to capture the following characters in groups:

  • Group 1: character positions 1 and 2
  • Group 2: character positions 3 and 4
  • Group 3: character positions 5 and 6 ...
  • Group 16: character positions 31 and 32
  • Group 17: character positions 33 - 40
  • Group 18: character positions 41 and 42
  • Group 19: character positions 33 - 40
  • Group 20: character positions 12 - last position of the string

String (72 characters): 294592522929354526532268626626426854242342362676256672666267626726672667

r'^([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{2})([\da-zA-Z]{8})([\da-zA-Z]{2})([\da-zA-Z]{8})'

You could use a positive lookahead:

^([\da-zA-Z]{2})([\da-zA-Z]{2})(?=([\da-zA-Z]{6})).([\da-zA-Z].*)$
  • ^ - start-of-line anchor
  • ([\da-zA-Z]{2}) - first capture group, pos 1-2
  • ([\da-zA-Z]{2}) - second capture group, pos 3-4
  • (?=([\da-zA-Z]{6})) - positive lookahead holding the third capture group, pos 5-10; it matches without consuming any characters
  • .([\da-zA-Z].*) - consume one character (pos 5), then capture the rest as the fourth group, pos 6-end
  • $ - end-of-line anchor
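To see the breakdown above in action, here is a minimal sketch that runs the same pattern through Python's `re` module on the example string from the question. The names `PATTERN`, `sample`, and `groups` are only illustrative.

```python
import re

# Pattern from the answer above: group 3 sits inside a lookahead, so the
# characters it captures are not consumed and group 4 can overlap with it.
PATTERN = re.compile(
    r'^([\da-zA-Z]{2})([\da-zA-Z]{2})(?=([\da-zA-Z]{6})).([\da-zA-Z].*)$'
)

sample = "abcdefghij12345"  # example string from the question
match = PATTERN.match(sample)

# match is None when the pattern does not fit, for example when the
# string is shorter than 10 characters.
groups = match.groups() if match else ()
print(groups)  # ('ab', 'cd', 'efghij', 'fghij12345')
```

Note that the `.*` inside group 4 accepts any character except a newline. If group 4 should be limited to the same alphabet as the other groups, `[\da-zA-Z]*` could be used in its place.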


Since this is an index/position problem, why not just use plain slicing with a generator expression?

S = "abcdefghij12345"

g1, g2, g3, g4 = (S[i:j] for i, j in [(0, 2), (2, 4), (4, 10), (5, None)])

Output:

ab          # <- group1 
cd          # <- group2
efghij      # <- group3
fghij12345  # <- group4
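The same slicing idea might be extended to the 72-character case from the question's edit. The sketch below assumes a hypothetical helper, `slice_positions`, that takes 1-based inclusive position ranges as they are written in the question. It also assumes groups 4-15 are consecutive two-character pairs, as the question's own regex suggests, and it copies group 19's range (33 - 40) literally from the question even though it repeats group 17's.

```python
def slice_positions(text, ranges):
    """Return substrings for 1-based, inclusive (start, end) position ranges.

    An end of None means "up to the last character". Hypothetical helper,
    not part of any library.
    """
    return [text[start - 1:end] for start, end in ranges]


short = "abcdefghij12345"
print(slice_positions(short, [(1, 2), (3, 4), (5, 10), (6, None)]))
# ['ab', 'cd', 'efghij', 'fghij12345']

long_text = "294592522929354526532268626626426854242342362676256672666267626726672667"

# Groups 1-16: positions 1-2, 3-4, ..., 31-32 (assumed to be consecutive pairs).
pairs = [(p, p + 1) for p in range(1, 32, 2)]
# Groups 17-20 exactly as listed in the question; group 19 repeats 33-40 there.
ranges_72 = pairs + [(33, 40), (41, 42), (33, 40), (12, None)]

groups_72 = slice_positions(long_text, ranges_72)
print(len(groups_72))  # 20
```

Because each range is independent, overlapping groups such as group 20 need no lookahead at all. The trade-off is that slicing does not validate the characters, so a separate check would be needed if the input must be alphanumeric.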
