How to get the whole string until the next regex pattern match?

I've got the following code:

import re

pat = re.compile(r'^(\d+\/\d+\/\d+,\s\d+:\d+\s\w+\s-\s)', re.S | re.M)
with open(r'C:\Users\usamahaider\Downloads\mmm.txt', encoding="utf8") as f:
    mylist = [m.group(1) for m in pat.finditer(f.read())]
print(mylist)

The output is:

['12/30/19, 8:57 AM - ', '12/3/19, 14:57 AM - ', '9/20/19, 8:52 AM - ', '12/3/19, 8:57 AM - ', '12/3/19, 9:34 PM - ', '12/3/19, 9:34 PM - ', '12/4/19, 6:45 AM - ', '12/4/19, 6:49 AM - ', '12/4/19, 7:12 AM - ', '12/4/19, 7:19 AM - ', '12/4/19, 7:20 AM - ', '12/4/19, 7:34 AM - ', '12/4/19, 8:00 AM - ', '12/4/19, 9:45 AM - ', '12/4/19, 10:15 AM - ', '12/4/19, 10:55 AM - ']

This returns only the matched timestamp prefixes, but I want all the text that belongs to each one.

Something like this:

['12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more. ', "12/3/19, 14:57 AM - You joined using this group's invite link", '9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"']

The text file looks like this:

12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one 

outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.
12/3/19, 14:57 AM - You joined using this group's invite link
9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"
12/3/19, 8:57 AM - You joined using this group's invite link

12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female
Height: 5’ 8”
Age: 21
Education: 1st Yr Medical School
Profession: Future Doctor
Marital status: Never married
Ethnicity: Pakistani
Religious background: Sunni
Family: Parents, Brothers, Sister
Language: English, Urdu
Hobbies: Travel, Art, Reading

LOOKING FOR: 
Age : 24-29
Height: 5’ 10” or taller
Religion: Sunni Muslim 
Education: MD/DO
Profession: Doctor/ Medical Residency/Medical Student 
Marital Status: Never married 

Contact: Mother
WhatsApp: (647) 879-1400
12/3/19, 9:34 PM - +1 (516) 343-8410: <Media omitted>
12/4/19, 6:45 AM - (347) 599-6911 changed this group's settings to allow all participants to send messages to this group
12/4/19, 6:49 AM - (347) 599-6911: As Salamualikum warahmatullah. Please Post and forward practicing muslims and your profiles in order to remain in the group. You have 1 day to post it until settings changes again. Strictly No chatting and no picture in the group. Please contact interested candidates in private. JazakAllahu Khairn. May Allah make halal easy for all the believers....Ameen

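Instead of matching the timestamps, split the text just before each one. A lookahead `(?=...)` matches without consuming any characters, so every timestamp stays at the start of its own chunk (splitting on a zero-width match requires Python 3.7 or later):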

re.split(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', string, flags=re.M)
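To plug the split back into the file-reading code from the question, a minimal sketch could look like this. The helper name `read_messages`, the constant `SPLIT_PAT` and the temporary demo file are illustrative assumptions, not part of the original answer; point the helper at your own export path instead.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical helper: same zero-width split as the answer, compiled once.
# The lookahead matches without consuming text, so each timestamp stays at the
# start of its chunk (splitting on empty matches needs Python 3.7+).
SPLIT_PAT = re.compile(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', re.M)


def read_messages(path):
    """Return one string per chat message, multi-line bodies included."""
    with open(path, encoding="utf8") as f:
        text = f.read()
    # filter(None, ...) drops the empty piece re.split() yields before the first timestamp
    return [chunk.strip() for chunk in filter(None, SPLIT_PAT.split(text))]


# Demo on a throwaway file in place of the question's real export path.
sample = (
    "12/3/19, 8:57 AM - You joined using this group's invite link\n"
    "\n"
    "12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female\n"
    "Age: 21\n"
)
with tempfile.TemporaryDirectory() as tmp:
    chat_file = Path(tmp) / "chat.txt"
    chat_file.write_text(sample, encoding="utf8")
    mylist = read_messages(chat_file)

print(mylist)
```

Lines that do not start with a timestamp (such as `Age: 21` or blank lines) never trigger a split, so they stay attached to the message above them.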

Python demo:

import re
string = """12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one \n\noutside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.\n12/3/19, 14:57 AM - You joined using this group's invite link\n9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"\n12/3/19, 8:57 AM - You joined using this group's invite link\n\n12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female\nHeight: 5’ 8”\nAge: 21\nEducation: 1st Yr Medical School\nProfession: Future Doctor\nMarital status: Never married\nEthnicity: Pakistani\nReligious background: Sunni\nFamily: Parents, Brothers, Sister\nLanguage: English, Urdu\nHobbies: Travel, Art, Reading\n\nLOOKING FOR: \nAge : 24-29\nHeight: 5’ 10” or taller\nReligion: Sunni Muslim \nEducation: MD/DO\nProfession: Doctor/ Medical Residency/Medical Student \nMarital Status: Never married \n\nContact: Mother\nWhatsApp: (647) 879-1400\n12/3/19, 9:34 PM - +1 (516) 343-8410: <Media omitted>\n12/4/19, 6:45 AM - (347) 599-6911 changed this group's settings to allow all participants to send messages to this group\n12/4/19, 6:49 AM - (347) 599-6911: As Salamualikum warahmatullah. Please Post and forward practicing muslims and your profiles in order to remain in the group. You have 1 day to post it until settings changes again. Strictly No chatting and no picture in the group. Please contact interested candidates in private. JazakAllahu Khairn. May Allah make halal easy for all the believers....Ameen"""
# filter(None, ...) drops the empty string re.split() returns before the first timestamp
results = list(filter(None, re.split(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', string, flags=re.M)))
for line in results:
    print('====', line.strip())

Result:

==== 12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one 

outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.
==== 12/3/19, 14:57 AM - You joined using this group's invite link
==== 9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"
==== 12/3/19, 8:57 AM - You joined using this group's invite link
==== 12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female
Height: 5’ 8”
Age: 21
Education: 1st Yr Medical School
Profession: Future Doctor
Marital status: Never married
Ethnicity: Pakistani
Religious background: Sunni
Family: Parents, Brothers, Sister
Language: English, Urdu
Hobbies: Travel, Art, Reading

LOOKING FOR: 
Age : 24-29
Height: 5’ 10” or taller
Religion: Sunni Muslim 
Education: MD/DO
Profession: Doctor/ Medical Residency/Medical Student 
Marital Status: Never married 

Contact: Mother
WhatsApp: (647) 879-1400
==== 12/3/19, 9:34 PM - +1 (516) 343-8410: <Media omitted>
==== 12/4/19, 6:45 AM - (347) 599-6911 changed this group's settings to allow all participants to send messages to this group
==== 12/4/19, 6:49 AM - (347) 599-6911: As Salamualikum warahmatullah. Please Post and forward practicing muslims and your profiles in order to remain in the group. You have 1 day to post it until settings changes again. Strictly No chatting and no picture in the group. Please contact interested candidates in private. JazakAllahu Khairn. May Allah make halal easy for all the believers....Ameen
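If you would rather keep the question's `finditer` approach, one alternative (a different technique from the split above) is to extend the original pattern so each match runs lazily up to, but not including, the next line that starts with a timestamp, or to the end of the text. The `STAMP` name and the short sample below are assumptions for illustration.

```python
import re

# Timestamp prefix from the question, without the capture group.
STAMP = r'\d+/\d+/\d+,\s\d+:\d+\s\w+\s-\s'
# re.S lets .*? cross newlines; re.M makes ^ match at every line start.
# The lazy body stops just before the next timestamp line, or at \Z (end of text).
pat = re.compile(r'^' + STAMP + r'.*?(?=^' + STAMP + r'|\Z)', re.S | re.M)

text = (
    "12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one \n"
    "\n"
    "outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.\n"
    "12/3/19, 14:57 AM - You joined using this group's invite link\n"
    '9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"'
)

# group(0) is the whole match: timestamp plus the full message body
mylist = [m.group(0).strip() for m in pat.finditer(text)]
for msg in mylist:
    print('====', msg)
```

The lookahead does not consume the next timestamp, so `finditer` can begin its next match right there. The `re.split` version is shorter; this one keeps a single pattern that describes a whole message.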
