How to strip everything after the pattern “On x John wrote:” in a string

I have a large string (read from an email). When a user replies, the reply typically contains a line like this:

"On x x x wrote:"

I would like to strip out all the text that comes after this pattern. However, I am not sure how to identify the pattern.

I know how to strip out everything after a fixed word or character:

abc = abc.split('From:', 1)[0]

But what do I do when there is variable text between "On" and "wrote:"?

Sample:

\r\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com<mailto:email@email.com>> wrote:\r\n

A regex will sort this out:

re.match(r"^On.+wrote:", email.strip())[0]

^ anchors the match at the start of the string
On is the literal word "On"
.+ matches one or more of any character (except a newline)
wrote: is the literal text "wrote:"

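The [0] at the end returns the full matched text (group 0), and email.strip() removes leading and trailing whitespace, including the \r\n around the header, so that ^On can match. If nothing matches, re.match returns None, so check the result before indexing it.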
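Since the question asks to drop everything after the header rather than extract it, here is a minimal sketch that cuts the string at the start of the match. It assumes the reply text is in a variable named `email` (the sample body below is made up) and uses re.search instead of re.match, so the header does not have to sit at the very start of the string:

```python
import re

# Made-up reply: new text first, then the quoted-message header and the quoted text.
email = (
    "Thanks, that works for me.\r\n"
    "\r\n"
    'On Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com> wrote:\r\n'
    "> original message\r\n"
)

# re.search scans the whole string; re.match only tries position 0.
header = re.search(r"On.+wrote:", email)

# Keep only what precedes the header; fall back to the full text if there is no header.
reply = email[:header.start()] if header else email
print(reply.strip())
```

Using match.start() means the header itself and all quoted text after it are dropped in one slice.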

Example:

import re

email = '\r\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com<mailto:email@email.com>> wrote:\r\n'
extracted = re.match(r"On.+wrote:", email.replace('\r', '').replace('\n', ''))[0]
print(extracted)

Output:
On Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com<mailto:email@email.com>> wrote:
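Some mail clients wrap a long "On ... wrote:" header onto a second line, and `.` does not match a newline by default. A hedged variant, assuming the header begins at the start of a line, uses re.MULTILINE so ^ matches at every line start, and re.DOTALL with a non-greedy .+? so the match can cross the line break but stops at the first "wrote:":

```python
import re

# Assumed layout: the header starts a line but is wrapped onto the next one.
email = (
    "Sounds good.\r\n"
    "\r\n"
    "On Tue, Feb 12, 2019 at 1:11 PM +0100, \"Name\"\r\n"
    "<email@email.com> wrote:\r\n"
    "> quoted text\r\n"
)

# ^ with MULTILINE: start of any line; DOTALL lets .+? cross the line break;
# the non-greedy quantifier stops at the first "wrote:" after "On".
HEADER = re.compile(r"^On\b.+?wrote:", re.MULTILINE | re.DOTALL)

m = HEADER.search(email)
reply = email[:m.start()].strip() if m else email.strip()
print(reply)
```

The trade-off: a body line that itself starts with "On" (for example "On Monday I will...") followed anywhere later by "wrote:" would be cut too early, so this is looser than the single-line version.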

An alternative to a regex is to find the index of the first occurrence of "On" and the index of "wrote:" after it, and take the slice of text between them:

start = email.find('On')
extracted = email[start:email.find('wrote:', start) + len('wrote:')]
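Slicing with str.find assumes both markers are present, because find returns -1 when a substring is missing. A small sketch (the helper name strip_quoted is hypothetical) keeps only the text before the header and returns the input unchanged if either marker is absent:

```python
def strip_quoted(email: str) -> str:
    """Return the text before the first "On ... wrote:" header (hypothetical helper)."""
    start = email.find("On")
    if start == -1:
        return email  # no "On" at all: nothing to strip
    # Look for "wrote:" only after the "On", so an earlier "wrote:" is ignored.
    if email.find("wrote:", start) == -1:
        return email  # "On" without a following "wrote:" is not treated as a header
    return email[:start]


sample = "Fine by me.\r\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, \"Name\" wrote:\r\n> old text"
print(repr(strip_quoted(sample)))
```

Note that find("On") stops at the first "On" anywhere, including inside words such as "Only", so the regex versions with an explicit pattern are stricter.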

import re
abc = re.split(r"On.*wrote:", abc, maxsplit=1)[0]  # keep the text before the header
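str.split() treats its argument as literal text, so splitting on a pattern needs re.split(). With a made-up reply body, the sketch below shows what each index returns: with maxsplit=1 there are at most two pieces, the text before the header ([0]) and the quoted text after it ([1]):

```python
import re

# Made-up reply body used only for illustration.
abc = 'Yes, see attached.\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" wrote:\n> earlier message'

parts = re.split(r"On.*wrote:", abc, maxsplit=1)
print(parts)
```

If the header is missing, re.split returns a one-element list containing the whole string, so [0] is always safe while [1] would raise an IndexError.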

https://regexr.com is a great site for learning regex!

You can use a regex, and once you have the exact match, split the string on it. You can use /On/regex/From:/, where regex is a regular expression that matches the variable "xxx" part.

More info can be found in the docs.

You can use the following regular expression to find the pattern:

/On .+ wrote:/
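In Python, a pattern like this can delete the header and everything after it in one step with re.sub. The sketch below assumes the variable part between "On" and "wrote:" is arbitrary text (.+?), and uses re.DOTALL so the trailing .* also consumes the newlines of the quoted text; the sample email is made up:

```python
import re

email = 'Got it, thanks!\r\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" wrote:\r\n> quoted line 1\r\n> quoted line 2\r\n'

# "On .+? wrote:" matches the header; ".*" with DOTALL then swallows the rest of the string.
cleaned = re.sub(r"On .+? wrote:.*", "", email, count=1, flags=re.DOTALL)
print(repr(cleaned))
```

Because DOTALL lets .+? span lines, an earlier "On " in the body (e.g. "On second thought") followed later by " wrote:" would also trigger the cut; anchoring with ^ and re.MULTILINE narrows that.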
