Here's the string:
SCOPE OF WORK: Supply & Flensburg House, MMDA Colony, PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
xxxxxx
The things that will change in the string are:
Flensburg House, MMDA Colony,
and
Arumbakkam,Chennai,Tamil Nadu,
These parts of the string can contain letters, numbers, spaces, commas, #, - and _.
The remaining parts of the string will stay exactly as they are, including spacing.
Here's the regex I am using:
SCOPE OF WORK: Supply & [A-Za-z,\s]]*PAN#: [A-Z]{5}[0-9]{4}[A-Z]{1}\n installation [A-Za-z]\n xxxxxx
Ultimately what I need to obtain is:
Flensburg House, MMDA Colony,
installation Arumbakkam,Chennai,Tamil Nadu,
I don't think my regex is entirely right and I need help on how to go about this.
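A few things I noticed about your current pattern: the character class [A-Za-z,\s] does not currently account for the # symbol or digits, there is a stray ] right after it, and the [A-Za-z] after "installation" has no quantifier, so it matches only a single letter.
Assuming you just need to get these two substrings in groups (excluding the trailing comma), try: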
^SCOPE OF WORK: Supply & ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]\s+installation ([\w, #-]+),\s+x{6}$
^ - Start-line anchor;
SCOPE OF WORK: Supply & - A literal match of this substring, including the trailing whitespace;
([\w, #-]+) - A 1st capture group to match 1+ characters from the given class, where \w is shorthand for [A-Za-z0-9_] (plus other Unicode word characters in Python 3 unless re.ASCII is set), which covers all the characters you mentioned it needs to include;
,\s+PAN#: - The comma trailing the 1st group, 1+ whitespace characters and a literal match of PAN#: ;
[A-Z]{5}[0-9]{4}[A-Z] - Verification that what follows is 5 uppercase letters, 4 digits and a single uppercase letter (no need to quantify a single character);
\s+installation - 1+ whitespace characters, including the newline, up to the literal word installation and its trailing space;
([\w, #-]+) - A 2nd capture group to match the same pattern as the 1st group;
,\s+x{6} - Match the trailing comma, 1+ whitespace characters and 6 trailing x's;
$ - End-line anchor.
import re
s = """SCOPE OF WORK: Supply & Flensburg House, MMDA Colony, PAN#: AAYCS8310G
installation Arumbakkam,Chennai,Tamil Nadu,
xxxxxx"""
l = re.findall(r'^SCOPE OF WORK: Supply & ([\w, #-]+),\s+PAN#: [A-Z]{5}[0-9]{4}[A-Z]\s+installation ([\w, #-]+),\s+x{6}$', s)
print(l)
Prints:
[('Flensburg House, MMDA Colony', 'Arumbakkam,Chennai,Tamil Nadu')]
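If the pattern needs to be maintained over time, one option is to write it with re.VERBOSE and named groups, so each piece explained above sits next to its own comment. This is only a sketch: the group names address and locality, the helper extract_parts, and the sample text are made up for illustration and do not come from the question.

```python
import re

# The same pattern as above, split over several lines with re.VERBOSE.
# Literal spaces are written as [ ] because VERBOSE ignores bare whitespace;
# whitespace and # inside a character class are still taken literally.
# The group names "address" and "locality" are hypothetical.
PATTERN = re.compile(r"""
    ^SCOPE[ ]OF[ ]WORK:[ ]Supply[ ]&[ ]   # fixed prefix
    (?P<address>[\w, #-]+),\s+            # 1st variable part, trailing comma excluded
    PAN[#]:[ ][A-Z]{5}[0-9]{4}[A-Z]\s+    # PAN number, then the line break
    installation[ ]
    (?P<locality>[\w, #-]+),\s+           # 2nd variable part, trailing comma excluded
    x{6}$                                 # fixed suffix
""", re.VERBOSE)


def extract_parts(text):
    """Return (address, locality), or None if the text does not fit the template."""
    m = PATTERN.search(text)
    return (m["address"], m["locality"]) if m else None


# Made-up sample that exercises digits, '#', '-' and '_' in the variable parts.
sample = (
    "SCOPE OF WORK: Supply & Plot #12-B, Block_7, PAN#: AAYCS8310G\n"
    "installation Anna Nagar,Chennai,Tamil Nadu,\n"
    "xxxxxx"
)
print(extract_parts(sample))
```

Returning None for non-matching text makes it easy to skip or log inputs that do not follow the fixed template instead of raising on them.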