Regex pattern to find n non-space characters of x length after a certain substring

Question

I am using the regex pattern pattern = r'cig[\s:.]*(\w{10})' to extract the 10 characters after 'cig' in each line of my dataframe. This pattern handles every case except those where the 10-character substring itself contains spaces.

For example, I am trying to extract Z9F27D2198 from the string

/BENEF/FORNITURA GAS FEB-20 CIG Z9F                 27D2198 01762-0000031

Stack Overflow seems to have reformatted the previous string, but there should be 17 whitespace characters between F and 2, after CIG.

Could you help me edit the regex pattern so that it accounts for whitespace inside that 10-character substring? I am also passing flags=re.I to my re.findall calls to ignore case.

Here is an example string for which the pattern works:

CIG7826328A2B FORNITURA ENERGIA ELETTRICA U TENZE COMUNALI CONVENZIONE CONSIP E

and it outputs what I want: 7826328A2B.

Thanks in advance.

Answer 1

You can use

r'(?i)cig[\s:.]*(\S(?:\s*\S){9})(?!\S)'

See the regex demo. Details:

cig - the literal string cig
[\s:.]* - zero or more whitespace characters, : or .
(\S(?:\s*\S){9}) - Group 1: a non-whitespace character, then nine repetitions of zero or more whitespace characters followed by a non-whitespace character
(?!\S) - immediately to the right, there must be whitespace or the end of the string.
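
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch that spells out the same pattern with re.VERBOSE so each part carries its own comment; the name CIG_PATTERN and the short test string are made up for illustration.

```python
import re

# Same pattern as above, written with re.VERBOSE so that whitespace and
# comments inside the pattern are ignored. CIG_PATTERN is an illustrative name.
CIG_PATTERN = re.compile(
    r"""
    cig               # the literal marker (case-insensitive through re.I)
    [\s:.]*           # zero or more whitespace characters, colons or dots
    (                 # Group 1: ten non-whitespace characters in total
        \S            #   the first non-whitespace character
        (?:\s*\S){9}  #   nine more, each optionally preceded by whitespace
    )
    (?!\S)            # next must be whitespace or the end of the string
    """,
    re.I | re.VERBOSE,
)

m = CIG_PATTERN.search("CIG: Z9F   27D2198 01762")
print(repr(m.group(1)))  # 'Z9F   27D2198'
```

Group 1 still contains the inner whitespace, so it has to be stripped in a separate step.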

In Python, you can use

import re
text = "/BENEF/FORNITURA GAS FEB-20 CIG Z9F               27D2198 01762-0000031"
pattern = r'cig[\s:.]*(\S(?:\s*\S){9})(?!\S)'
matches = re.finditer(pattern, text, re.I)
for match in matches:
  print(re.sub(r'\s+', '', match.group(1)), ' found at ', match.span(1))

# => Z9F27D2198  found at  (32, 57)

See the Python demo.
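
Since the question is about processing many lines, here is a sketch of the same pattern wrapped in a small helper. The function name extract_cig and the sample list are assumptions for illustration; only the pattern comes from this answer.

```python
import re

# Pattern from this answer, compiled once. extract_cig is a hypothetical helper.
CIG_RE = re.compile(r'cig[\s:.]*(\S(?:\s*\S){9})(?!\S)', re.I)

def extract_cig(line):
    """Return the 10 characters after 'cig' with inner whitespace removed, or None."""
    match = CIG_RE.search(line)
    if match is None:
        return None
    return re.sub(r'\s+', '', match.group(1))

lines = [
    "/BENEF/FORNITURA GAS FEB-20 CIG Z9F      27D2198 01762-0000031",
    "CIG7826328A2B FORNITURA ENERGIA ELETTRICA U TENZE COMUNALI CONVENZIONE CONSIP E",
    "no code on this line",
]
codes = [extract_cig(line) for line in lines]
print(codes)  # ['Z9F27D2198', '7826328A2B', None]
```

With pandas, the helper could probably be applied to a text column with Series.apply, for example df['description'].apply(extract_cig), where the column name is an assumption. One caveat: the pattern cannot tell a gap inside the code from a gap after it, so 'cig 12345 67890 x' yields 1234567890.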

Answer 2

What about:

# removes all white spaces with replace()

x = 'CIG7826328A2B FORNITURA ENERGIA ELETTRICA U'.replace(' ', '')
x = x.split("CIG")[1][:10] 
# x = '7826328A2B'

x = '/BENEF/FORNITURA GAS FEB-20 CIG Z9F 27D2198 01762-0000031'.replace(' ', '')
x = x.split("CIG")[1][:10]
# x = 'Z9F27D2198'

Works fine if there is only one "CIG" in the string
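
The question also asks for case-insensitive matching, and replace(' ', '') removes only plain spaces, not tabs. A possible variant of the same idea, offered as a sketch rather than as part of the original answer (cig_after_marker is a hypothetical name), could look like this:

```python
import re

def cig_after_marker(text, length=10):
    """Strip all whitespace, then take `length` characters after the first 'cig' (any case)."""
    compact = re.sub(r'\s+', '', text)  # removes spaces, tabs and newlines
    parts = re.split('cig', compact, maxsplit=1, flags=re.I)
    if len(parts) < 2:
        return None  # marker not present
    return parts[1][:length]

print(cig_after_marker('/BENEF/FORNITURA GAS FEB-20 cig Z9F \t 27D2198 01762-0000031'))
# Z9F27D2198
```

Unlike the regex in Answer 1, this does not skip ':' or '.' between the marker and the code, so 'CIG: ...' would return a result starting with the colon.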


Answer 1
1 Accepted 2021-03-23 16:57:25

Answer 2
0 2021-03-23 16:43:25
