How to strip out everything after the string "On x John wrote:"

Question

I have a large string (coming from reading an email). When a user replies, a typical reply looks as follows:

"On x x x wrote:"

I would like to strip out all the text that comes after this pattern. However, I am not sure how to identify this pattern.

I know how to strip out everything after a certain word or character:

abc = abc.split('From:', 1)[0]

But what do you do when there is text in between the patterns On and wrote: ?

Sample:

\r\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com<mailto:email@email.com>> wrote:\r\n

Answer 1

A regex will sort this:

re.search(r"On.+wrote:", email)[0]

On is the literal word "On"
.+ is one or more of any character except a newline
wrote: is the literal text "wrote:"

The [0] at the end returns the text of the first match. re.search scans the whole string; re.match only matches at the very beginning, so it works only when the string starts with "On". In the example below, the replace() calls remove the \r and \n characters so that the string does start with "On".

Example:

import re

email =  '\r\nOn Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com<mailto:email@email.com>> wrote:\r\n'
extracted = re.match(r"On.+wrote:", email.replace('\r', '').replace('\n', ''))[0]
print(extracted)

Output:
On Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com<mailto:email@email.com>> wrote:
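The snippet above extracts the header itself, while the question asks to drop everything from the header onward. A minimal sketch of that step, assuming the header sits on a single line; the helper name strip_quoted_reply and the sample message are made up for illustration:

```python
import re

# Same idea as the pattern above, made non-greedy so it stops at the first "wrote:".
REPLY_HEADER = re.compile(r"On .+? wrote:")

def strip_quoted_reply(text):
    """Return the text that precedes the first 'On ... wrote:' header."""
    match = REPLY_HEADER.search(text)
    if match is None:
        # No header found: leave the message untouched.
        return text
    # Keep only what comes before the header and drop trailing whitespace.
    return text[:match.start()].rstrip()

email = ('Thanks, that works.\r\n\r\n'
         'On Tue, Feb 12, 2019 at 1:11 PM +0100, "Name" <email@email.com> wrote:\r\n'
         '> original message')
print(strip_quoted_reply(email))
```

Slicing at match.start() keeps the new reply and discards the header together with the quoted message that follows it.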

An alternative to a regex is to find the index of the first occurrence of "On" and the index of the "wrote:" that follows it, and slice out the text between them:

start = email.find('On')  # index of the first "On"
extracted = email[start:email.find('wrote:', start) + len('wrote:')]  # slice through the end of "wrote:"
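str.find returns -1 when the text is missing, which silently produces a wrong slice. A small sketch of the same index-based approach with that case handled; the function name extract_header and the sample string are illustrative, not part of the original answer:

```python
def extract_header(text, start_word="On", end_word="wrote:"):
    """Return the 'On ... wrote:' span found with str.find, or None if absent."""
    start = text.find(start_word)
    if start == -1:
        return None
    # Search for the end marker only after the start marker.
    end = text.find(end_word, start)
    if end == -1:
        return None
    return text[start:end + len(end_word)]

email = 'Hi there\r\nOn Tue, Feb 12, 2019, "Name" wrote:\r\nold text'
print(extract_header(email))
```

Note that find('On') also hits words such as "Only" or "Once" in the reply body, so this approach is more fragile than a regex anchored to the header format.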

Answer 2

import re
abc = re.split(r"On.*wrote:", abc, maxsplit=1)[0]  # keep only the text before the header

https://regexr.com is a great site to learn regex!
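One thing to watch for: str.split treats its argument as a literal separator, so a pattern has to go through re.split instead. A short sketch of the difference, using a made-up sample message:

```python
import re

abc = 'My reply.\r\nOn Tue, Feb 12, 2019, "Name" wrote:\r\nquoted text'

# str.split looks for the literal characters "On.*wrote:", which never occur,
# so the result is a one-element list holding the unchanged string.
literal_parts = abc.split("On.*wrote:")

# re.split interprets the pattern; maxsplit=1 splits at the first header only.
before, after = re.split(r"On.*wrote:", abc, maxsplit=1)

print(len(literal_parts))
print(repr(before))
print(repr(after))
```

Index [0] of the re.split result is the new reply; index [1] is the quoted text that follows the header.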

Answer 3

You can use a regex and, once you have the exact match, split on it. You can use /On/regex/From:/, where regex is a regular expression that detects "xxx".

More info can be found in the docs.

Answer 4

You can use the following regex to find the pattern:

 /(?:On\ x\ x\ x\ wrote\:)/
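The pattern above matches the literal placeholders "x x x". As a rough sketch of how it could be generalized in Python, the version below anchors the header to the start of a line and lets it wrap across lines, which some mail clients do; the flags and the sample message are assumptions, not something stated in the answer:

```python
import re

# ^ with re.MULTILINE anchors "On" to the start of a line;
# re.DOTALL lets .*? run across a line break inside a wrapped header.
WRAPPED_HEADER = re.compile(r"^On\b.*?\bwrote:\s*$", re.MULTILINE | re.DOTALL)

text = ('Sure.\r\n\r\n'
        'On Tue, Feb 12, 2019 at 1:11 PM\r\n'
        '"Name" <email@email.com> wrote:\r\n'
        '> hi')

match = WRAPPED_HEADER.search(text)
# Keep only the text before the header when one is found.
reply = text[:match.start()].rstrip() if match else text
print(reply)
```

Because .*? may span lines here, a body line that happens to start with "On" would also be taken as the start of the header, so a stricter pattern (for example one that requires a date after "On") may be needed for real mailboxes.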

4 answers

Answer 1
3 Accepted 2019-02-13 10:53:32

Answer 2
1 2019-02-13 10:56:34

Answer 3
0 2019-02-13 10:52:21

Answer 4
0 2019-02-13 10:56:18
