
Regular expression not working

I'm trying to extract the From and Cc email addresses from a forwarded email whose body looks like this:

$body = '-------
Begin forwarded message:


From: Sarah Johnson <blabla@gmail.com>

Subject: email subject

Date: February 22, 2013 3:48:12 AM

To: Email Recipient <thatwouldbe@yayyy.com>

Cc: Ralph Johnson <johnson@gmail.com>


Hi,


hello, thank you and goodbye!

 blabla@gmail.com'

Now, when I do the following:

$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
     echo htmlentities($arr_matches[0]);
     die();
}

I correctly get:

from: sarah johnson <blabla@gmail.com>

Now, why doesn't the same approach work for Cc? I do something very similar, only changing from to cc:

$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
     echo htmlentities($arr_matches[0]);
     die();
}

and I get:

cc: ralph johnson <johnson@gmail.com> hi, hello, thank you and goodbye! blabla@gmail.com

If I remove the email from the original body footer (removing blabla@gmail.com) then I correctly get:

cc: ralph johnson <johnson@gmail.com>

It looks like that trailing email address is affecting the regular expression. But how, and why doesn't it affect the From match? How can I fix this?

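The problem is that \D* matches too much: \D matches any character that is not a digit, including newline characters, and the * is greedy. No digit appears after the Cc line, so \D* runs to the end of the body and the engine only backtracks as far as the last address, the one in the footer. I would be more restrictive here. Why do you use \D (not a digit) at all?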

With e.g. [^@]* it works:

cc: [^@]*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S

See it here on Regexr.

This way, you can be sure that this first part cannot match past the @ of the first email address.
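As a quick check of this suggestion, here is a minimal sketch that runs the [^@]* variant against a shortened copy of the sample body. The helper name match_header is made up for illustration. The hyphen in the character class is moved to the end ([\w.-]), an adaptation made here because PCRE2-based PHP versions may reject a hyphen placed directly after \w.

```php
<?php
// Shortened, lowercased copy of the forwarded message from the question.
$body = "from: sarah johnson <blabla@gmail.com>\n\n"
      . "date: february 22, 2013 3:48:12 am\n\n"
      . "cc: ralph johnson <johnson@gmail.com>\n\n\n"
      . "hi,\n\nhello, thank you and goodbye!\n\n blabla@gmail.com";

// Hypothetical helper: returns the whole match for one header, or null.
function match_header(string $header, string $body): ?string
{
    // [^@]* cannot cross an "@", so the prefix stops inside the first address.
    $pattern = '#' . preg_quote($header, '#')
             . ': [^@]*\S([\w.-]+)@((?:\w+\.)+)([a-zA-Z]{2,4})\S#';

    return preg_match($pattern, $body, $m) === 1 ? $m[0] : null;
}

echo match_header('cc', $body), "\n";
echo match_header('from', $body), "\n";
```

Both calls should return only the header line itself, regardless of the address in the footer.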

This \D is also the reason it works in the first case, From. There are digits in the Date line, so \D* cannot match past that line.
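To see that effect in isolation, the following sketch runs the question's \D* pattern on two copies of a shortened header block: one with the Date line and one without. The variable names are made up for illustration, and the hyphen in the character class is moved to the end as an adaptation for PCRE2-based PHP versions.

```php
<?php
// Header block with a Date line: digits appear right after the From line.
$with_date = "from: sarah johnson <blabla@gmail.com>\n"
           . "date: february 22, 2013 3:48:12 am\n"
           . "cc: ralph johnson <johnson@gmail.com>\n"
           . "hello!\n blabla@gmail.com";

// Same text without the Date line, so no digit follows "from:".
$without_date = "from: sarah johnson <blabla@gmail.com>\n"
              . "cc: ralph johnson <johnson@gmail.com>\n"
              . "hello!\n blabla@gmail.com";

$pattern = '#from: \D*\S([\w.-]+)@((?:\w+\.)+)([a-zA-Z]{2,4})\S#';

preg_match($pattern, $with_date, $a);
preg_match($pattern, $without_date, $b);

// \D* stops at the first digit of the date and backtracks to the From address.
var_dump($a[0] === 'from: sarah johnson <blabla@gmail.com>');

// With no digit in the way, \D* crosses line breaks, so the match spans several lines.
var_dump(substr_count($b[0], "\n") > 0);
```

In other words, the From pattern has the same weakness as the Cc one; the Date line just happens to hide it.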

Try it like this:

$body = '-------
Begin forwarded message:


From: Sarah Johnson <blabla@gmail.com>

Subject: email subject

Date: February 22, 2013 3:48:12 AM

To: Email Recipient <thatwouldbe@yayyy.com>

Cc: Ralph Johnson <johnson@gmail.com>


Hi,


hello, thank you and goodbye!

 blabla@gmail.com';

$pattern = '#(?:from|Cc):\s+[^<>]+<([^@]+@[^>\s]+)>#is';
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, true)) . '</pre>';

Output

Array
(
    [0] => Array
        (
            [0] => From: Sarah Johnson <blabla@gmail.com>
            [1] => Cc: Ralph Johnson <johnson@gmail.com>
        )

    [1] => Array
        (
            [0] => blabla@gmail.com
            [1] => johnson@gmail.com
        )

)

$arr_matches[1][0] - "From" email
$arr_matches[1][1] - "Cc" email
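The positional indexes above depend on both headers being present and in that order. As a hedged variation, not part of the original answer, the sketch below captures the header name as well and keys the result by it. The $emails array and the tightened pattern (line-anchored with the m flag, display name restricted to a single line) are assumptions made for illustration.

```php
<?php
// Shortened copy of the forwarded message from the question.
$body = "From: Sarah Johnson <blabla@gmail.com>\n\n"
      . "To: Email Recipient <thatwouldbe@yayyy.com>\n\n"
      . "Cc: Ralph Johnson <johnson@gmail.com>\n\n"
      . "Hi,\n\n blabla@gmail.com";

// ^ with the m flag ties each header name to the start of a line;
// [^<>\r\n]+ keeps the display name on that same line.
$pattern = '#^(from|cc):\s+[^<>\r\n]+<([^@\s]+@[^>\s]+)>#im';

$emails = [];
if (preg_match_all($pattern, $body, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Key by lowercased header name, i.e. "from" or "cc".
        $emails[strtolower($m[1])] = $m[2];
    }
}

print_r($emails);
```

With this shape, a missing Cc header simply leaves the 'cc' key unset instead of shifting the indexes.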

