
Python regex: get everything until specific strings

I have the following string:

This is the most recent email of this thread

More text

From: a@a.com
Date: 13 August, 2018

More text...

From: a@a.com
Sent: Tuesday 23 July
To: b@b.com, c@c.com
Subject: Test

I need to extract everything up to this combination of strings:

From: *
Sent: *
To: *
Subject: *

The * acts as a wildcard.

So my result should be:

This is the most recent email of this thread

More text

From: a@a.com
Date: 13 August, 2018

More text...

I want to filter this with a regular expression, but I cannot figure it out. Any pointers?

This is the regex pattern I tried on regex101, but for some reason it does not work in my Python script: r"([\w\W\n]+?)\n((?:from:[^\n]+)\n+((?:\s*sent:[^\n]+)\n+(?:\s*to:[^\n]+)\n*(?:\s*cc:[^\n]+)*\n*(?:\s*bcc:[^\n]+)*\n*(?:\s*subject:[^\n]+)*))"

Thanks!

You could try using re.findall with a positive lookahead. The approach here is to match everything from the start of the string up to, but not including, the block of text that should stop the match.

import re

inp = """This is the most recent email of this thread

More text

From: a@a.com
Date: 13 August, 2018

More text...

From: a@a.com
Sent: Tuesday 23 July
To: b@b.com, c@c.com
Subject: Test"""

stop_text = """From: a@a.com
Sent: Tuesday 23 July
To: b@b.com, c@c.com
Subject: Test"""
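# re.escape keeps characters such as "." in the stop text from acting as regex metacharacters
matches = re.findall(r'^.*?(?=' + re.escape(stop_text) + ')', inp, flags=re.DOTALL)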
print(matches)

This prints:

['This is the most recent email of this thread\n\nMore text\n\nFrom: a@a.com\nDate: 13 August, 2018\n\nMore text...\n\n']
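The stop text above is a fixed literal, whereas the question asks for wildcard header values. As a minimal sketch of the same lookahead idea, the header block can be written as a pattern in which `[^\n]*` stands in for each `*`. The names `email`, `HEADER_BLOCK` and `text_before_header` are made up for this illustration, and the sample text is assumed to match the question's input.

```python
import re

# Hypothetical sample mirroring the input from the question.
email = (
    "This is the most recent email of this thread\n\n"
    "More text\n\n"
    "From: a@a.com\n"
    "Date: 13 August, 2018\n\n"
    "More text...\n\n"
    "From: a@a.com\n"
    "Sent: Tuesday 23 July\n"
    "To: b@b.com, c@c.com\n"
    "Subject: Test"
)

# Each header line has a fixed label; [^\n]* is the wildcard for the rest of the line.
HEADER_BLOCK = r"^From:[^\n]*\nSent:[^\n]*\nTo:[^\n]*\nSubject:[^\n]*"

# \A anchors at the start of the text. With DOTALL, .*? lazily consumes
# characters (including newlines) until the lookahead finds the header
# block; MULTILINE lets ^ match at the start of any line.
pattern = re.compile(r"\A.*?(?=" + HEADER_BLOCK + r")", re.DOTALL | re.MULTILINE)


def text_before_header(text):
    """Return the text preceding the first header block, or all of it if none is found."""
    m = pattern.search(text)
    return m.group(0) if m else text


result = text_before_header(email)
print(repr(result))
```

The first `From:` line is skipped because it is followed by `Date:` rather than `Sent:`, so the match only stops at the full four-line block.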

Since the regex101 example you provided uses the options gim, maybe you just need to enable the re.IGNORECASE flag?

import re

text = """
This is the most recent email of this thread

More text

From: a@a.com
Date: 13 August, 2018

More text...

From: a@a.com
Sent: Tuesday 23 July
To: b@b.com, c@c.com
Subject: Test
"""
# Raw string, so that backslash sequences such as \w and \s reach the regex engine unchanged
pattern = r"([\w\W\n]+?)\n((?:from:[^\n]+)\n+((?:\s*sent:[^\n]+)\n+(?:\s*to:[^\n]+)\n*(?:\s*cc:[^\n]+)*\n*(?:\s*bcc:[^\n]+)*\n*(?:\s*subject:[^\n]+)*))"
print(re.findall(pattern, text, re.MULTILINE|re.IGNORECASE))

This prints:

[('\nThis is the most recent email of this thread\n\nMore text\n\nFrom: a@a.com\nDate: 13 August, 2018\n\nMore text...\n', 'From: a@a.com\nSent: Tuesday 23 July\nTo: b@b.com, c@c.com\nSubject: Test', 'Sent: Tuesday 23 July\nTo: b@b.com, c@c.com\nSubject: Test')]
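To see why the flags matter, here is a small sketch with a deliberately simplified, lowercase header pattern (an illustration, not the full pattern above). On regex101, `i` and `m` correspond to re.IGNORECASE and re.MULTILINE in Python, while `g` roughly corresponds to using re.findall or re.finditer. The names `sample` and `header` are assumptions for this example.

```python
import re

# Hypothetical sample text with capitalised header labels.
sample = "Body text\n\nFrom: a@a.com\nSent: Tuesday 23 July\nTo: b@b.com\nSubject: Test\n"

# Simplified lowercase pattern anchored at the start of a line.
header = r"^from:[^\n]+\n\s*sent:[^\n]+"

# Without flags: ^ only matches at the very start of the string and
# "from:" does not match "From:", so nothing is found.
without_flags = re.findall(header, sample)

# With the Python equivalents of regex101's "i" and "m" options.
with_flags = re.findall(header, sample, re.MULTILINE | re.IGNORECASE)

# The same flags can also be set inline at the start of the pattern.
inline_flags = re.findall(r"(?im)" + header, sample)

print(without_flags)  # []
print(with_flags)     # ['From: a@a.com\nSent: Tuesday 23 July']
```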

You can make it simpler with grouping:

import re

# Named "text" rather than "str" to avoid shadowing the built-in str type
text = """This is the most recent email of this thread

More text

From: a@a.com
Date: 13 August, 2018

More text...

From: a@a.com
Sent: Tuesday 23 July
To: b@b.com, c@c.com
Subject: Test"""

# Group 1 captures everything before the From/Sent/To/Subject block
x = re.match(r"""(.+?.+)
From:.+?
Sent:.+?
To: .+?,.+?
Subject:.+?.+""", text, flags=re.DOTALL | re.MULTILINE)
print(x.groups())

x.groups() gives the following result:

('This is the most recent email of this thread\n\nMore text\n\nFrom: a@a.com\nDate: 13 August, 2018\n\nMore text...\n',)
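Note that the pattern above expects a comma after `To:`, so it would likely fail for a message with a single recipient. As an alternative sketch, under the assumption that only the text before the first header block is wanted, the text can be split at that block with `maxsplit=1`. The names `HEADER_BLOCK`, `strip_quoted_header` and `single` are hypothetical.

```python
import re

# Header block with [^\n]* as the wildcard for each line's value.
HEADER_BLOCK = re.compile(
    r"^From:[^\n]*\nSent:[^\n]*\nTo:[^\n]*\nSubject:[^\n]*",
    re.MULTILINE,
)


def strip_quoted_header(text):
    """Return the text before the first header block, without trailing newlines."""
    # Split once at the first header block and keep the part before it.
    # If no block is found, split returns the whole text as its only element.
    head = HEADER_BLOCK.split(text, maxsplit=1)[0]
    return head.rstrip("\n")


# Hypothetical message with a single recipient (no comma in the To: line).
single = (
    "Latest reply\n\n"
    "From: a@a.com\n"
    "Sent: Tuesday 23 July\n"
    "To: b@b.com\n"
    "Subject: Test\n\n"
    "Older text"
)

print(repr(strip_quoted_header(single)))  # 'Latest reply'
```

Splitting avoids a leading lazy group altogether, at the cost of discarding the header block itself; use `HEADER_BLOCK.search` instead if the matched headers are also needed.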

