
Regex pattern to match an email's inline reply heading

I'm having trouble working out a pattern that identifies the beginning of inline replies/forwards in an email body. Some are easy and simply begin with something like "Begin forwarded message", but replies are a little more complicated:

On 12-06-13 10:56 AM, "John Doe" <john.doe@some.tld> wrote:

Obviously the constants will be "On" and "wrote:". I'd like to find only the first match and then either wrap everything after it in a div with display:none applied, or simply cut it off using substr($body, 0, POSITION_OF_MATCH).

I have two problems: the pattern is not catching the FIRST occurrence, and I can't get the greediness to work properly.

My progress so far (having fallen back to a version that at least partially works) is:

preg_match("/On [^>]* wrote:/i",$content,$matches,PREG_OFFSET_CAPTURE);

Any help would be greatly appreciated!

I wonder how your current version works at all, because [^>]* can never match the closing > that sits right before " wrote:". But you could do something like this:

$content = preg_replace('/(On [^>]*> wrote:).*$/s', '$1', $content);

This matches the first On ... wrote: and everything after it up to the end of the string, and replaces all of that with just the On ... wrote: part.
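As a quick check of the replacement above, here is a minimal sketch that runs it against a made-up message body. The variable names and the sample text are assumptions for illustration only.

```php
<?php
// Hypothetical email body; the text is made up for illustration.
$content = "Thanks, that works for me.\n\n"
    . "On 12-06-13 10:56 AM, \"John Doe\" <john.doe@some.tld> wrote:\n"
    . "> Can you confirm the schedule?\n"
    . "> Sent from my phone\n";

// The s modifier lets . match newlines, so .*$ swallows
// everything that follows the heading, quoted lines included.
$trimmed = preg_replace('/(On [^>]*> wrote:).*$/s', '$1', $content);

echo $trimmed;
```

Note that the heading itself is kept, because the capture group is written back through $1. The pattern also assumes that no other > appears between the first "On " and the heading.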

You can probably break this down by elements. You basically have:

On DATE, "NAME" <EMAIL> wrote:

You can then characterize DATE, NAME, and EMAIL.

  • DATE is composed of numbers, dashes, spaces, colons, and letters. However, it ends with a comma, so you can match everything up to that instead.
  • NAME is composed of letters and spaces, but it is delimited by quotes, so you can match everything between them.
  • EMAIL is a bit more complicated, but an address will not normally contain the character >, so you should be able to capture everything up to it.

So you basically get:

On [anything but comma], "[anything but "]" <[anything but >]> wrote:

Which, in regex, is something like:

/^On ([^,]+), \"([^\"]+)\" <([^>]+)> wrote:$/

Then, when using preg_match, you can read the captured pieces from the $matches array, indices 1 through 3.
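A minimal sketch of that preg_match call follows. It makes two assumptions that are not in the answer: the sample body is made up, and the m modifier is added so that ^ and $ match at line boundaries, since the heading normally sits on its own line inside a longer body. The pattern is written in a single-quoted PHP string, so the double quotes need no backslashes.

```php
<?php
// Hypothetical email body; the text is made up for illustration.
$body = "Sounds good.\n"
    . "On 12-06-13 10:56 AM, \"John Doe\" <john.doe@some.tld> wrote:\n"
    . "> See you then.\n";

// The m modifier (an addition here) lets ^ and $ match at line boundaries.
$pattern = '/^On ([^,]+), "([^"]+)" <([^>]+)> wrote:$/m';

if (preg_match($pattern, $body, $matches)) {
    $date  = $matches[1]; // text between "On " and the first comma
    $name  = $matches[2]; // text inside the double quotes
    $email = $matches[3]; // text inside the angle brackets
    echo "$date | $name | $email\n";
}
```

Without the m modifier, the anchors would only let the pattern match when the whole string is the heading line.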

I suggest:

$email = preg_match('/^On [^"]*"[^"]*" <([^>]*)> wrote:$/', $str, $re) ? $re[1] : '';

See this demo.

I appreciate the other answers, but none of them really took into account the many possible variations in the reply strings I was dealing with. That may have been my fault for not explaining properly or providing more examples. I've +1'd everyone for their efforts though.

The final solution, which seems to work best after a day of fiddling with it on and off, is this:

/On (Mon|Tue|Wed|Thu|Fri|Sat|Sun|[[:digit:]]{1,2})(.*?) wrote:/i

The alternation it begins with covers a range of reply headings that start with "On Tue..." or "On 23..." or "On 1...", etc. This keeps the match from starting at a random "on" elsewhere in the text and taking in too much, which was the greediness problem I was complaining about. The lazy (.*?) takes care of the rest of the date and the name/email portion, and "wrote:" finishes it off.
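To tie this back to the substr idea from the question, here is a minimal sketch that uses this pattern with PREG_OFFSET_CAPTURE to cut the body off at the first heading. The function name stripQuotedReply and the sample text are hypothetical, and trimming trailing whitespace with rtrim is an extra step not mentioned in the thread.

```php
<?php
// Hypothetical helper: returns the body up to (not including) the first reply heading.
function stripQuotedReply(string $body): string
{
    $pattern = '/On (Mon|Tue|Wed|Thu|Fri|Sat|Sun|[[:digit:]]{1,2})(.*?) wrote:/i';

    // With PREG_OFFSET_CAPTURE, $matches[0] is [matched text, byte offset].
    if (preg_match($pattern, $body, $matches, PREG_OFFSET_CAPTURE)) {
        // Keep everything before the heading and drop trailing blank lines.
        return rtrim(substr($body, 0, $matches[0][1]));
    }

    return $body; // no heading found, so leave the body untouched
}

// Made-up sample: note the stray "on" in the first line, which must not match.
$body = "I agree, let's move on with the plan.\n\n"
    . "On Wed, Jun 12, 2013 at 10:56 AM, John Doe <john.doe@some.tld> wrote:\n"
    . "> Original message text\n";

echo stripQuotedReply($body);
```

Because the pattern has no s modifier, (.*?) cannot cross a newline, so the heading has to fit on a single line for this to match. The offset returned by preg_match is in bytes, which is what substr expects.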
