
Regular expression not working

I'm trying to get the From and Cc email addresses out of a forwarded email whose body looks like this:

$body = '-------
Begin forwarded message:


From: Sarah Johnson <blabla@gmail.com>

Subject: email subject

Date: February 22, 2013 3:48:12 AM

To: Email Recipient <thatwouldbe@yayyy.com>

Cc: Ralph Johnson <johnson@gmail.com>


Hi,


hello, thank you and goodbye!

 blabla@gmail.com';

Now, when I do the following:

$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
     echo htmlentities($arr_matches[0]);
     die();
}

I correctly get:

from: sarah johnson <blabla@gmail.com>

Now, why doesn't the Cc pattern work? I do something very similar, only changing from to cc:

$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
     echo htmlentities($arr_matches[0]);
     die();
}

and I get:

cc: ralph johnson <johnson@gmail.com> hi, hello, thank you and goodbye! blabla@gmail.com

If I remove the email address from the footer of the original body (removing blabla@gmail.com), then I correctly get:

cc: ralph johnson <johnson@gmail.com>

It looks like that email address is affecting the regular expression. But how, and why doesn't it affect the From pattern? How can I fix this?

The problem is that \D* matches too much: it also matches newline characters, so it can run past the end of the Cc line. I would be more restrictive here. Why do you use \D (not a digit) at all?

With e.g. [^@]* it works:

cc: [^@]*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S

See it here on Regexr.

This way, you can be sure that this first part does not match beyond the email address, because [^@]* has to stop at the first @.

This \D is also the reason it works in the first case, the "From" one. There are digits in the "Date" row, so \D* cannot match past that row.
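The difference between the two quantifiers can be checked directly. The following is only a minimal sketch, not the original poster's code: the shortened $body and the variable names ($address, $from, $greedy, $bounded) are made up for illustration, and the character class is written [\w.-] instead of [\w-\.] on the assumption that some newer PCRE versions reject the latter as an invalid range.

```php
<?php
// Shortened, lowercased copy of the forwarded body from the question.
$body = strtolower("From: Sarah Johnson <blabla@gmail.com>\n\n"
    . "Date: February 22, 2013 3:48:12 AM\n\n"
    . "Cc: Ralph Johnson <johnson@gmail.com>\n\n"
    . "Hi,\n\n blabla@gmail.com");

// Tail shared by all three patterns: one non-space character, local part,
// "@", domain labels, TLD, one more non-space character.
$address = '\S([\w.-]+)@((?:\w+\.)+)([a-zA-Z]{2,4})\S';

// "from" case: \D* is stopped by the digits in the Date line.
preg_match('#from: \D*' . $address . '#', $body, $from);

// "cc" case: no digits follow, so \D* crosses the newlines and the match
// ends at the address in the footer.
preg_match('#cc: \D*' . $address . '#', $body, $greedy);

// [^@]* cannot get past the first "@", so the match stays on the Cc line.
preg_match('#cc: [^@]*' . $address . '#', $body, $bounded);

echo $from[0], "\n";
echo $bounded[0], "\n";
var_dump($greedy[0] === $bounded[0]); // the \D* match is the longer one
```

Both quantifiers are greedy; the only difference is which character forces them to stop.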

Try it like this:

$body = '-------
Begin forwarded message:


From: Sarah Johnson <blabla@gmail.com>

Subject: email subject

Date: February 22, 2013 3:48:12 AM

To: Email Recipient <thatwouldbe@yayyy.com>

Cc: Ralph Johnson <johnson@gmail.com>


Hi,


hello, thank you and goodbye!

 blabla@gmail.com';

$pattern = '#(?:from|Cc):\s+[^<>]+<([^@]+@[^>\s]+)>#is';
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, 1)) . '</pre>';

Output:

Array
(
    [0] => Array
        (
            [0] => From: Sarah Johnson <blabla@gmail.com>
            [1] => Cc: Ralph Johnson <johnson@gmail.com>
        )

    [1] => Array
        (
            [0] => blabla@gmail.com
            [1] => johnson@gmail.com
        )

)

$arr_matches[1][0] - "From" email
$arr_matches[1][1] - "Cc" email
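Looking the results up by position depends on the order in which the headers appear in the body. As a possible variation, the same idea can be wrapped in a small helper that keys each address by its header name. This is only a sketch under stated assumptions: the function name extract_forwarded_addresses is hypothetical, and the pattern is tightened (anchored to line starts with the m flag, display name kept on one line), which goes beyond the pattern shown above.

```php
<?php
/**
 * Hypothetical helper: returns the addresses keyed by lowercased header
 * name, e.g. ['from' => '...', 'cc' => '...'].
 */
function extract_forwarded_addresses(string $body): array
{
    // ^ with the m flag anchors each header to the start of a line;
    // [^<>\r\n]* keeps the display name on that same line.
    $pattern = '#^(from|cc):[^<>\r\n]*<([^<>@\s]+@[^<>\s]+)>#im';

    $addresses = [];
    if (preg_match_all($pattern, $body, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[1] is the header name, $match[2] the address.
            $addresses[strtolower($match[1])] = $match[2];
        }
    }
    return $addresses;
}

$body = "Begin forwarded message:\n\n"
    . "From: Sarah Johnson <blabla@gmail.com>\n\n"
    . "To: Email Recipient <thatwouldbe@yayyy.com>\n\n"
    . "Cc: Ralph Johnson <johnson@gmail.com>\n\n"
    . "Hi,\n\n blabla@gmail.com";

print_r(extract_forwarded_addresses($body));
```

If a header occurs more than once, the later occurrence overwrites the earlier one in this sketch. The bare address in the footer is ignored because it is neither on a From/Cc line nor wrapped in angle brackets.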


 