How to match part of a string using regex in Python?
Here's the string:
SCOPE OF WORK: Supply & Flensburg House, MMDA Colony, PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
xxxxxx
The parts of the string that will change are:
Flensburg House, MMDA Colony,
and
Arumbakkam,Chennai,Tamil Nadu,
These parts of the string can contain letters, numbers, commas, #, - and _.
The rest of the string will stay exactly as it is, including the spacing.
Here's the regex I am using:
SCOPE OF WORK: Supply & [A-Za-z,\s]]*PAN#: [A-Z]{5}[0-9]{4}[A-Z]{1}\n installation [A-Za-z]\n xxxxxx
Ultimately, what I need to obtain is:
Flensburg House, MMDA Colony,
installation Arumbakkam,Chennai,Tamil Nadu,
I don't think my regex is entirely right, and I need help on how to go about this.
A few things I noticed about your current pattern:
- The character class does not currently allow the # symbol and digits;
Assuming you just need to get these two substrings in groups (excluding the trailing comma), try:
^SCOPE OF WORK: Supply & ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]\s+installation ([\w, #-]+),\s+x{6}$
^ - Start-line anchor;
SCOPE OF WORK: Supply & - A literal match of this substring, including the trailing whitespace;
([\w, #-]+) - A 1st capture group to match 1+ characters from the given class, where \w is shorthand for [A-Za-z0-9_] (with str patterns in Python 3 it also matches Unicode letters and digits unless re.ASCII is set), which covers all the characters you mentioned;
,\s+PAN#: - Match the comma that trails the first group, 1+ whitespace characters and the literal PAN#: with its trailing space;
[A-Z]{5}[0-9]{4}[A-Z] - Verify that what follows is 5 uppercase letters, 4 digits and a single uppercase letter (no need to quantify a single character);
\s+installation - 1+ whitespace characters, including the newline and any trailing spaces, up to the literal word installation and the space after it;
([\w, #-]+) - A 2nd capture group to match the same pattern as the 1st group;
,\s+x{6} - Match the trailing comma, 1+ whitespace characters and 6 trailing x's;
$ - End-line anchor.
import re

s = """SCOPE OF WORK: Supply & Flensburg House, MMDA Colony, PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
xxxxxx"""

# With two capture groups, findall returns a list of (group1, group2) tuples.
pattern = r'^SCOPE OF WORK: Supply & ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]\s+installation ([\w, #-]+),\s+x{6}$'
matches = re.findall(pattern, s)
print(matches)
Prints:
[('Flensburg House, MMDA Colony', 'Arumbakkam,Chennai,Tamil Nadu')]
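If the one-line pattern is hard to read, the same expression can be spread over several lines with re.VERBOSE. This is only a sketch of the pattern explained above: the group names address and location are made up for illustration, and re.ASCII is added on the assumption that \w should be limited to [A-Za-z0-9_].

```python
import re

s = """SCOPE OF WORK: Supply & Flensburg House, MMDA Colony, PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
xxxxxx"""

# In verbose mode, whitespace outside a character class is ignored and "#"
# starts a comment, so literal spaces are written as "\ " and "#" as "\#".
# Inside [...] both characters are still taken literally.
PATTERN = re.compile(r"""
    ^SCOPE\ OF\ WORK:\ Supply\ &\         # fixed prefix, ending in a space
    (?P<address>[\w, #-]+)                # 1st variable part (hypothetical name)
    ,\s+PAN\#:\ [A-Z]{5}[0-9]{4}[A-Z]     # comma, whitespace, PAN number
    \s+installation\                      # newline/spaces, then the literal word
    (?P<location>[\w, #-]+)               # 2nd variable part (hypothetical name)
    ,\s+x{6}$                             # comma, whitespace, six x's
""", re.VERBOSE | re.ASCII)

m = PATTERN.search(s)
if m:
    print(m.group("address"))   # Flensburg House, MMDA Colony
    print(m.group("location"))  # Arumbakkam,Chennai,Tamil Nadu
```

Using search with named groups instead of findall also makes the no-match case explicit, since search returns None when the string does not fit the expected layout.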