I have strings taken from Linux mail logs that look something like this:
May 20 12:19:28 example-03 amavis[1445]: (01445-15) Passed SPAMMY {RelayedTaggedInbound}, [10.4.3.2]:49488 [10.4.3.2] <offers-john=example.com@example.net> -> <john@example.com>, Queue-ID: C00OZs0w9DB, Message-ID: <5ZCfDBMQyiUjOVD78ZFxg5%3D%3D@example.net>, mail_id: aCUpU0wtUaR, Hits: 15.587, size: 21407, queued_as: dgzikuucQ9i, 438 ms
The element I need to extract is:
<offers-john=example.com@example.net> -> <john@example.com>
I want to keep my regex as simple and clear as possible, so I don't want to get into regexes for email address formats, not least because matching email formats with a regex is a bug-prone process!
I have tried:
$row =~ /(<.*> -> <.*>,)/;
But, despite the presence of the comma delimiter, that pattern matches all the way to the comma after the Message-ID, giving output such as:
<offers-john=example.com@example.net> -> <john@example.com>, Queue-ID: C00OZs0w9DB, Message-ID: <5ZCfDBMQyiUjOVD78ZFxg5%3D%3D@example.net>,
By default the quantifier * is greedy: it matches as much as it can. You need to make it lazy (aka non-greedy) by adding a ? after it.
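As a minimal sketch of the difference, the snippet below runs both the greedy and the lazy form against a shortened copy of the log line from the question. The variable names $greedy and $lazy are only illustrative, and the trailing comma has been moved outside the capture group in the lazy version on the assumption that it is not wanted in the result.

```perl
use strict;
use warnings;

# Shortened copy of the log line from the question
my $row = '(01445-15) Passed SPAMMY {RelayedTaggedInbound}, [10.4.3.2]:49488 [10.4.3.2] '
        . '<offers-john=example.com@example.net> -> <john@example.com>, '
        . 'Queue-ID: C00OZs0w9DB, Message-ID: <5ZCfDBMQyiUjOVD78ZFxg5%3D%3D@example.net>, '
        . 'mail_id: aCUpU0wtUaR';

# Greedy: the second .* runs to the end of the string and backtracks
# only as far as the LAST ">," it can find (the one after Message-ID)
my ($greedy) = $row =~ /(<.*> -> <.*>,)/;

# Lazy: .*? grows one character at a time and stops at the FIRST
# position where the rest of the pattern can match
my ($lazy) = $row =~ /(<.*?> -> <.*?>),/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy capture reproduces the overlong match described in the question, while the lazy capture is just <offers-john=example.com@example.net> -> <john@example.com>.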
This is written much more robustly without the non-greedy quantifier, by using a negated character class instead, and it is clearer if insignificant whitespace is added with the help of the /x modifier, like so:
$row =~ / ( <[^<>]*> \s* -> \s* <[^<>]*> ) /x;
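For illustration, here is a hedged sketch of that pattern applied to a shortened copy of the log line. The nested groups and the names $pair, $from and $to are additions for the example, assuming the two addresses might also be wanted separately; they are not part of the original pattern.

```perl
use strict;
use warnings;

# Shortened copy of the log line from the question
my $row = '(01445-15) Passed SPAMMY {RelayedTaggedInbound}, [10.4.3.2]:49488 [10.4.3.2] '
        . '<offers-john=example.com@example.net> -> <john@example.com>, '
        . 'Queue-ID: C00OZs0w9DB, Message-ID: <5ZCfDBMQyiUjOVD78ZFxg5%3D%3D@example.net>';

# [^<>]* can never cross an angle bracket, so each <...> is matched
# as a single unit and no lazy quantifier is needed.
# Group 1 is the whole pair; groups 2 and 3 are the two addresses.
my ($pair, $from, $to) =
    $row =~ / ( <([^<>]*)> \s* -> \s* <([^<>]*)> ) /x;

if (defined $pair) {
    print "pair: $pair\n";
    print "from: $from\n";
    print "to:   $to\n";
}
else {
    # The captures are undef when the pattern does not match
    warn "no sender -> recipient pair found\n";
}
```

Because the match is performed in list context, all three variables are left undefined when the line contains no such pair, which is why the result is checked before use.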