How to match parts of a string using a regular expression in Python?

Question

Here's the string:

SCOPE OF WORK: Supply &  Flensburg House, MMDA Colony,     PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
  xxxxxx

The parts of the string that will change are:

Flensburg House, MMDA Colony,

and

Arumbakkam,Chennai,Tamil Nadu,

These parts of the string can contain letters, digits, commas, #, - and _.

The rest of the string will remain as it is, including the spacing.

Here's the regex I am using:

SCOPE OF WORK: Supply &  [A-Za-z,\s]]*PAN#: [A-Z]{5}[0-9]{4}[A-Z]{1}\n    installation [A-Za-z]\n      xxxxxx

Ultimately, what I need to obtain is:

Flensburg House, MMDA Colony,     
installation Arumbakkam,Chennai,Tamil Nadu,

I don't think my regex is entirely right, and I need help on how to go about this.

Answer 1

A few things I noticed about your current pattern:

You are trying to match more space characters than are present in the text;
Your character classes for the two substrings differ. The space and comma are missing from the 2nd one, which is also matched only once;
Both classes are currently missing the # symbol and digits.
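To make the character-class point concrete, here is a small check. It is only a sketch: the sample value is hypothetical (not taken from the question), and the "narrow" class is the one from the question with a `+` quantifier added for the comparison.

```python
import re

# Class from the question: letters, commas and whitespace only
narrow = re.compile(r'[A-Za-z,\s]+')
# Wider class: word characters (letters, digits, _), comma, space, # and -
wide = re.compile(r'[\w, #-]+')

# Hypothetical value containing a digit, '#', '-' and '_'
sample = "Flat #12-B, Block_7"

print(bool(narrow.fullmatch(sample)))  # False: '#' and digits are rejected
print(bool(wide.fullmatch(sample)))    # True
```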

Assuming you just need to get these two substrings in groups (excluding the trailing comma), try:

^SCOPE OF WORK: Supply &  ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]\s+installation ([\w, #-]+),\s+x{6}$

See an online demo.

^ - Start anchor (start of the string; start of each line only if re.MULTILINE is set);
SCOPE OF WORK: Supply &   - A literal match of this substring, including the two trailing spaces;
([\w, #-]+) - A 1st capture group to match 1+ characters from the given class, where \w is shorthand for [A-Za-z0-9_] (in Python 3 it also matches other Unicode word characters unless re.ASCII is set); together with the comma, space, # and - this covers all the characters you mentioned;
,\s+PAN#:  - Match the trailing comma, 1+ whitespace characters and the literal PAN#: followed by a space;
[A-Z]{5}[0-9]{4}[A-Z] - Verify that what follows is 5 uppercase letters, 4 digits and a single uppercase letter (no need to quantify a single character);
\s+installation  - 1+ whitespace characters (including the newline), then the literal word installation and its trailing space;
([\w, #-]+) - A 2nd capture group matching the same pattern as the 1st group;
,\s+x{6} - Match the trailing comma, 1+ whitespace characters and 6 trailing x's;
$ - End anchor (end of the string).
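The breakdown above can also be written directly into the pattern with re.VERBOSE, which may make it easier to maintain. This is a sketch, not part of the original answer: the group names `first` and `second` are my own choice, and literal spaces and the `#` outside character classes have to be escaped because verbose mode otherwise ignores whitespace and treats `#` as the start of a comment.

```python
import re

pattern = re.compile(r"""
    ^SCOPE\ OF\ WORK:\ Supply\ &\ \     # literal prefix with two trailing spaces
    (?P<first>[\w, \#-]+)               # 1st variable part (names are illustrative)
    ,\s+PAN\#:\                         # trailing comma, whitespace, literal "PAN#: "
    [A-Z]{5}[0-9]{4}[A-Z]               # 5 uppercase letters, 4 digits, 1 uppercase letter
    \s+installation\                    # whitespace incl. newline, literal "installation "
    (?P<second>[\w, \#-]+)              # 2nd variable part
    ,\s+x{6}$                           # trailing comma, whitespace, six x's, end of string
""", re.VERBOSE)

s = """SCOPE OF WORK: Supply &  Flensburg House, MMDA Colony,     PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
  xxxxxx"""

m = pattern.search(s)
if m:
    print(m.group('first'))
    print(m.group('second'))
```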

import re

s = """SCOPE OF WORK: Supply &  Flensburg House, MMDA Colony,     PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
  xxxxxx"""
  
l = re.findall(r'^SCOPE OF WORK: Supply &  ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]\s+installation ([\w, #-]+),\s+x{6}$', s)

print(l)

Prints:

[('Flensburg House, MMDA Colony', 'Arumbakkam,Chennai,Tamil Nadu')]
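If only one match per input is expected, re.search with a compiled pattern may be more convenient than re.findall, since it makes the no-match case explicit. The following is a minimal sketch under that assumption; `extract_parts` is a hypothetical helper name and the second input is made up for illustration.

```python
import re
from typing import Optional, Tuple

# Same pattern as above, split across two string literals for readability
PATTERN = re.compile(
    r'^SCOPE OF WORK: Supply &  ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]'
    r'\s+installation ([\w, #-]+),\s+x{6}$'
)

def extract_parts(text: str) -> Optional[Tuple[str, str]]:
    """Return the two variable substrings, or None if text does not match."""
    m = PATTERN.search(text)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical input with different values, including digits, '#', '-' and '_'
other = (
    "SCOPE OF WORK: Supply &  Plot #4-A, 2nd_Street,     PAN#: AAYCS8310G\n"
    "installation Sector 9,Pune,\n"
    "  xxxxxx"
)

print(extract_parts(other))
print(extract_parts("unrelated text"))
```

The first call should return the two captured substrings without their trailing commas, and the second should return None rather than raising.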

Answer 1
0 2022-07-21 10:05:34
