Regex to extract email address

I want to be able to extract an email address embedded in tags, e.g. <email> test@demo.com </email>, where the source is &lt;email&gt;test@demo.com&lt;/email&gt;.

The expression I use is as follows: (?<=email&gt;).*(?=&lt;)/i). This works well. However, if the email is a hyperlink, i.e. &lt;email&gt;<a href="mailto:test@demo.com" target="_blank">test@demo.com</a> &lt;/email&gt;, then I can no longer extract the exact email address. I get <a href="mailto:test@demo.com">test@demo.com</a> instead of test@demo.com. I have tried (?<=a href="mailto:).*(?="target="_blank")/i) but nothing is returned. Any ideas on how to extract the email when the hyperlink is there?
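A note on the second attempt: it most likely returns nothing because its lookahead expects `"target` with no space, while the markup has a space between the closing quote of the href and `target`. The sketch below illustrates this, assuming the source string is exactly the one quoted in the question and a JavaScript engine that supports lookbehind; the variable names are only illustrative.

```javascript
// Source string as described in the question (the <email> tags are entity-escaped).
const src = '&lt;email&gt;<a href="mailto:test@demo.com" target="_blank">test@demo.com</a> &lt;/email&gt;';

// Original attempt: the lookahead requires `"target` with no space, so nothing matches.
const original = /(?<=a href="mailto:).*(?="target="_blank")/i;

// Adjusted attempt: allow whitespace before `target` and make `.*` lazy.
const adjusted = /(?<=a href="mailto:).*?(?="\s+target="_blank")/i;

const originalMatch = src.match(original);
const adjustedMatch = src.match(adjusted);

console.log(originalMatch);    // null
console.log(adjustedMatch[0]); // test@demo.com
```

The adjusted pattern still depends on the exact attribute order and quoting, so it is fragile against any change in the markup.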

Web dev 101: don't parse HTML with regex; use DOM manipulation instead.

The snippet below logs all the emails, whether they sit directly inside email tags, inside an a tag within email tags, or under any deeper nesting of tags.

 console.log(
   Array.from(document.getElementsByTagName('email'))
     .map(elt => elt.textContent)
     .map(email => email.trim())
 )

 <email>john@doe.com</email>
 <email><a href="mailto:john@doe.com">john@doe.com</a></email>
 <email><b><a href="mailto:john@doe.com">john@doe.com</a></b></email>
 <email><span><b><a href="mailto:john@doe.com">john@doe.com</a></b></span></email>
 <email>"o'brian"@irish.com</email>

The .trim() is useful in case the HTML contains whitespace around the email.
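If the markup is only available as an entity-escaped string, as in the question, and no DOM is at hand (outside a browser, for instance), the same idea of "take the text content, then trim" can be approximated on the string. This is a rough sketch for simple input like the question's, not a general HTML parser: the helper name `extractEmailTagText` is hypothetical, only `&lt;` and `&gt;` are decoded, and attribute values containing `>` would confuse it. In a browser, the DOM approach above remains the more robust choice.

```javascript
// Hypothetical helper: approximate `textContent.trim()` for <email> elements
// when the markup is only available as a string.
function extractEmailTagText(source) {
  // Decode the two entities used in the question so escaped and literal tags look alike.
  const decoded = source.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
  const results = [];
  // Capture everything between each <email> ... </email> pair (lazy, case-insensitive).
  const emailTag = /<email>([\s\S]*?)<\/email>/gi;
  for (const match of decoded.matchAll(emailTag)) {
    // Drop nested tags such as <a ...> or <b>, then trim surrounding whitespace.
    results.push(match[1].replace(/<[^>]*>/g, '').trim());
  }
  return results;
}

const plain = '&lt;email&gt;test@demo.com&lt;/email&gt;';
const linked = '&lt;email&gt;<a href="mailto:test@demo.com" target="_blank">test@demo.com</a> &lt;/email&gt;';

console.log(extractEmailTagText(plain));  // [ 'test@demo.com' ]
console.log(extractEmailTagText(linked)); // [ 'test@demo.com' ]
```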

You can walk every element in the DOM and match an email regex against its text content, as in the snippet below:

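<script>
// Return every email-like substring in the given text, or null if there is none.
function getEmailsFromText(text) {
    return text.match(/[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g);
}

// Walk every element on the page and scan its text content.
// Note: an element's textContent includes the text of its descendants,
// so the same address is logged once per enclosing element.
var items = document.getElementsByTagName("*");
for (var i = 0; i < items.length; i++) {
    var text = items.item(i).textContent;
    var emailIds = getEmailsFromText(text);
    if (emailIds) {
        console.log("Email IDs: " + emailIds);
    }
}
</script>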

To test, open your browser's JavaScript console and paste the code from inside the script tag; it logs every email ID found on the current HTML page.
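Because an element's textContent also contains the text of all its descendants, scanning every element tends to report the same address several times. One possible way around that is to collect the matches and de-duplicate them. The sketch below does this on a plain string with the same pattern as the snippet above; `uniqueEmailsFromText` and the sample text are made up for illustration.

```javascript
// Same pattern as in the snippet above; the helper name is illustrative.
const EMAIL_PATTERN = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g;

function uniqueEmailsFromText(text) {
  // String.prototype.match returns null when nothing matches, so fall back to [].
  const found = text.match(EMAIL_PATTERN) || [];
  // A Set keeps the first occurrence of each address and drops repeats.
  return [...new Set(found)];
}

const sample = 'Contact john@doe.com or jane@doe.org; again: john@doe.com';
console.log(uniqueEmailsFromText(sample)); // [ 'john@doe.com', 'jane@doe.org' ]
```

In a page, this could be called once on document.body.textContent instead of once per element, though text from adjacent elements is then joined without separators, which can blur the boundaries between neighbouring addresses.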
