Regular expression to extract an email address

Question

I want to extract an email address embedded in tags, e.g. <email> test@demo.com </email>, where the source is <email>test@demo.com</email>.

The expression I use is (?<=email>).*(?=<)/i), and it works well. However, if the email is a hyperlink, i.e. <email><a href="mailto:test@demo.com" target="_blank">test@demo.com</a> </email>, I can no longer extract the exact email address: I get <a href="mailto:test@demo.com">test@demo.com</a> instead of test@demo.com. I have tried (?<=a href="mailto:).*(?="target="_blank")/i), but nothing is returned. Any ideas on how to extract the email when the hyperlink is there?

Answer 1

Web dev 101: don't parse HTML with regex; use DOM manipulation instead.

The snippet below logs all the emails, whether they sit directly inside email tags, inside an a tag within email tags, or inside any deeper nesting of tags.

 console.log(
   Array.from(document.getElementsByTagName('email'))
     .map(elt => elt.textContent)
     .map(email => email.trim())
 )

 <email>john@doe.com</email>
 <email><a href="mailto:john@doe.com">john@doe.com</a></email>
 <email><b><a href="mailto:john@doe.com">john@doe.com</a></b></email>
 <email><span><b><a href="mailto:john@doe.com">john@doe.com</a></b></span></email>
 <email>"o'brian"@irish.com</email>

The .trim() call is useful in case the HTML contains whitespace around the email.
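
The same idea can be wrapped in a small reusable function. This is only a sketch and not part of the original answer: the name emailsInTags and its parameters are made up for illustration, and it assumes root is a document or element that exposes getElementsByTagName.

```javascript
// Hypothetical helper: collect the trimmed text of every <tagName> element under root.
function emailsInTags(root, tagName = 'email') {
  return Array.from(root.getElementsByTagName(tagName), elt => elt.textContent.trim())
    .filter(text => text !== ''); // skip tags that contain only whitespace
}

// In a browser console, run it against the current page.
if (typeof document !== 'undefined') {
  console.log(emailsInTags(document));
}
```

Passing an element instead of document limits the search to that subtree.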

Answer 2

You can walk every element of the DOM and match an email regex against each tag's text content, as in the snippet below:

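<script>
// Return every email-like substring in text, or null when there is none.
function getEmailsFromText(text) {
    return text.match(/[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g);
}

// textContent of an element already includes the text of all its descendants,
// so collect matches in a Set to avoid logging the same address once per ancestor.
var found = new Set();
var items = document.getElementsByTagName("*");
for (var i = 0; i < items.length; i++) {
    var emailIds = getEmailsFromText(items.item(i).textContent);
    if (emailIds) {
        emailIds.forEach(function (id) { found.add(id); });
    }
}
console.log("Email IDs: " + Array.from(found).join(", "));
</script>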

To test, open your browser's JavaScript console and paste the code above, without the surrounding script tags; it logs every email address found on the current HTML page.
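
As an aside on the patterns in the question: the second attempt most likely returns nothing because its lookahead (?="target="_blank") expects target to follow the closing quote immediately, whereas the markup has a space there. The following is a minimal sketch of a regex-only variant, assuming the input is a raw HTML string and the address sits in a mailto: link. The function name emailFromMailto is hypothetical, and this stays more fragile than reading textContent from the DOM.

```javascript
// Hypothetical helper: pull the address out of a mailto: link in an HTML string.
function emailFromMailto(html) {
  // [^"?]+ stops at the closing quote of href, or at a ?subject=... query if present.
  const match = html.match(/(?<=href="mailto:)[^"?]+/i);
  return match ? match[0] : null;
}

const sample = '<email><a href="mailto:test@demo.com" target="_blank">test@demo.com</a> </email>';
console.log(emailFromMailto(sample));                         // test@demo.com
console.log(emailFromMailto('<email>test@demo.com</email>')); // null (no mailto: link)
```

Using a negated character class instead of .* keeps the match from running past the end of the attribute.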

2 solutions

Solution 1
1 2018-11-09 12:30:42

Solution 2
0 2018-11-09 12:41:35
