
Extract email from string using Template Toolkit

I'm guessing this is relatively simple, but I can't find the answer.

From a string such as '"John Doe" <email@example.com>', how can I extract the email portion using Template Toolkit?

An example string to parse is this:

$VAR1 = { 
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => 'fezius@evrostroyserov.ru on behalf of Caroline <fezius@evrostroyserov.ru>',
    'bytes' => 13620,
    'pmail' => 'user@example.com',
    'sender' => 'sender@example.com',
    'subject' => 'Some Email Subject'
};

My code, based on @dave-cross's help below, where $VAR1 is the output of dumper.dump(item.from):

[% text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?+<(.*?)>')) -%]
<td>[% matches.1 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

However, it's still not matching against $VAR1.

There's a very old (and unmaintained) module, Template::Extract, that lets you define a template, then work backward from a string that might have been produced by that template:

use Template::Extract;
use Data::Dumper;

my $obj = Template::Extract->new;
my $template = qq("[% name %]" <[% email %]>);

my $string = '"John Doe" <email@example.com>';

my $extracted = $obj->extract($template, $string);

print Dumper( $extracted );

The output is:

$VAR1 = {
          'email' => 'email@example.com',
          'name' => 'John Doe'
        };

However, there are modules that already do this job for you and will handle many more situations.

This does what you want, but it's pretty fragile and this really isn't the kind of thing that you should be doing in TT code. You should either get the data parsed outside of the template and passed into variables, or you should pass in a parsing subroutine that can be called from inside the template.
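As a rough sketch of the second option, the calling Perl code could pass a parsing subroutine into the template as an ordinary variable. The helper name extract_email and its regex are assumptions made for illustration only; they are not part of Template Toolkit, and a real address parser would be more robust.

```perl
use strict;
use warnings;
use Template;

# Hypothetical helper (not part of TT): return the text inside a trailing
# <...> pair, or the input unchanged when there are no angle brackets.
sub extract_email {
    my ($from) = @_;
    return $from =~ /<([^<>]*)>\s*$/ ? $1 : $from;
}

my $tt = Template->new() or die Template->error();

# The template calls the code reference like any other variable.
my $template = '<td>[% extract_email(item.from) %]</td>';
my $vars = {
    item          => { from => '"John Doe" <email@example.com>' },
    extract_email => \&extract_email,
};

my $output = '';
$tt->process(\$template, $vars, \$output) or die $tt->error();
print "$output\n";
```

The template then contains no regex at all, and the parsing logic can be tested on its own in plain Perl.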

But, having given you the caveats, if you still insist this is what you want to do, then this is how you might do it:

In test.tt:

[% text = '"John Doe" <email@example.com>';
   matches = text.match('"(.*?)"\s+<(.*?)>');
   IF matches -%]
Name: [% matches.0 %]
Email: [% matches.1 %]
[% ELSE -%]
No match found
[% END -%]

Then, testing using tpage:

$ tpage test.tt
Name: John Doe
Email: email@example.com

But I cannot emphasise enough that you should not be doing it like this.

Update: I've used this test template to investigate your further problem.

[% item = { from => '"John Doe" <email@example.com>' };
   text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?+<(.*?)>')) -%]
<td>[% matches.1 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

And running it, I get this:

$ tpage test2.tt
<td> </td>

That's what I'd expect to see for a match. You're printing matches.1. That's the second item from the matches array. And the second match group is (\s). So I'm getting the space between the name and the opening angle bracket.

You probably don't want that whitespace match in your matches array, so I'd remove the parentheses around it, to make the regex (.*?)\s*<(.*?)> (note that \s* is a simpler way to say "zero or more whitespace characters").

You can now use matches.0 to get the name and matches.1 to get the email address.
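To see how the capture groups line up, here is a small plain-Perl check of both regexes against the sample string. It runs outside Template Toolkit, and the helper name groups is made up for this illustration. It relies on a Perl list-context match returning the capture groups in order, and assumes the match vmethod hands back the same groups, which is consistent with the dumper output shown in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the capture groups of a match as a list.
sub groups {
    my ($text, $re) = @_;
    my @captured = $text =~ $re;
    return @captured;
}

my $text = '"John Doe" <email@example.com>';

# Original regex: the whitespace is captured, so it occupies index 1.
my @with_space = groups($text, qr/(.*?)(\s)?+<(.*?)>/);

# Suggested regex: whitespace is matched but not captured.
my @without = groups($text, qr/(.*?)\s*<(.*?)>/);

print scalar(@with_space), " groups: [", join('][', @with_space), "]\n";
print scalar(@without),    " groups: [", join('][', @without),    "]\n";
```

With the first regex the email address sits at index 2; with the second it moves to index 1. Note that the name still includes its double quotes in both cases.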

Oh, and there's no need to copy item.from into text. You can call the match vmethod on any scalar variable, so it's probably simpler to just use:

[% matches = item.from.match(...) -%]

Did I mention that this is all a really terrible idea? :-)

Update 2:

This is all going to be far easier if you give me complete, runnable code examples in the same way that I am doing for you. Any time I have to edit something in order to get an example running, we run the risk that I'm guessing incorrectly how your code works.

But, bearing that in mind, here's my latest test template:

[% item = {
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => 'fezius@evrostroyserov.ru on behalf of Caroline <fezius@evrostroyserov.ru>',
    'bytes' => 13620,
    'pmail' => 'user@example.com',
    'sender' => 'sender@example.com',
    'subject' => 'Some Email Subject'
};
   text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?<(.*?)>')) -%]
<td>[% matches.2 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

I've changed the definition of item to have your full example. I've left the regex as it was before my suggestions. And (because I haven't changed the regex) I've changed the output to print matches.2 instead of matches.1.

And here's what happens:

$ tpage test3.tt
<td>fezius@evrostroyserov.ru</td>

So it works.

If yours doesn't work, then you need to identify the differences between my (working) code and your (non-working) code. I'm happy to help you identify those differences, but you have to give me your non-working example in order for me to do that.

Update 3:

Again I've tried to incorporate the changes that you're talking about. But again, I've had to guess at stuff because you're not sharing complete runnable examples. And again, my code works as expected.

[% USE dumper -%]
[% item = {
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => 'fezius@evrostroyserov.ru on behalf of Caroline <fezius@evrostroyserov.ru>',
    'bytes' => 13620,
    'pmail' => 'user@example.com',
    'sender' => 'sender@example.com',
    'subject' => 'Some Email Subject'
};
 -%]
[% matches = item.from.match('(.*?)(\s)?<(.*?)>') -%]
[% dumper.dump(matches) %]

And testing it:

$ tpage test4.tt
$VAR1 = [
          'fezius@evrostroyserov.ru on behalf of Caroline',
          ' ',
          'fezius@evrostroyserov.ru'
        ];

So that works. If you want any more help, then send a complete runnable example. If you don't do that, I won't be able to help you any more.

I have no idea how Template Toolkit can help you. Use Email::Address or Email::Address::XS to parse an e-mail address.
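A minimal sketch of what that might look like with Email::Address, assuming the module is installed from CPAN. The wrapper first_address is a made-up name for this example. The parsing happens in Perl before the data reaches the template.

```perl
use strict;
use warnings;
use Email::Address;

# Hypothetical wrapper: return the first address object found in the
# string, or undef when nothing parses.
sub first_address {
    my ($from) = @_;
    my ($addr) = Email::Address->parse($from);
    return $addr;
}

my $addr = first_address('"John Doe" <email@example.com>');
if ($addr) {
    print "Name: ",  $addr->name,    "\n";
    print "Email: ", $addr->address, "\n";
}
```

The resulting values can then be passed to the template as plain variables, so the template only has to print them.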
