Regex to extract email address
I want to be able to extract an email address embedded in tags, e.g. <email> test@demo.com </email>
where the source is <email>test@demo.com</email>.
The expression I use is as follows: /(?<=email>).*(?=<)/i
This works well. However, if the email is a hyperlink, i.e. <email><a href="mailto:test@demo.com" target="_blank">test@demo.com</a> </email>
then I can no longer extract the exact email address. I get the following: <a href="mailto:test@demo.com">test@demo.com</a>
instead of test@demo.com.
I have tried /(?<=a href="mailto:).*(?="target="_blank")/i
but nothing is returned. Any ideas on how to extract the email when the hyperlink is there?
Web dev 101: don't parse HTML with regex; use DOM manipulation instead.
The snippet below logs all the emails, whether they sit directly inside email tags, inside an a tag nested in email tags, or inside any deeper nesting of tags.
console.log(
  Array.from(document.getElementsByTagName('email'))
    .map(elt => elt.textContent)
    .map(email => email.trim())
)
<email>john@doe.com</email>
<email><a href="mailto:john@doe.com">john@doe.com</a></email>
<email><b><a href="mailto:john@doe.com">john@doe.com</a></b></email>
<email><span><b><a href="mailto:john@doe.com">john@doe.com</a></b></span></email>
<email>"o'brian"@irish.com</email>
The .trim() call is useful in case the HTML contains whitespace around the email, as in <email> test@demo.com </email>.
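For the case where the markup is only available as a string and no DOM is at hand, here is a minimal sketch of a string-only fallback. It also shows why the two patterns from the question behave as described: the greedy .* in the first one runs to the last "<" and so swallows the nested a tag, and the second one looks for "target with no space while the markup has a space before target. The helper name extractEmail and the tag-stripping pattern are assumptions for illustration, not part of the original answer, and the tag stripping will break on attribute values that contain ">".

```javascript
// Sample inputs taken from the question.
const plain = '<email> test@demo.com </email>';
const linked =
  '<email><a href="mailto:test@demo.com" target="_blank">test@demo.com</a> </email>';

// First pattern from the question: the greedy .* backtracks only to the LAST "<",
// so on the linked input it returns the whole <a ...>...</a> markup.
const original = /(?<=email>).*(?=<)/i;

// Second pattern from the question: the lookahead demands `"target` with no space,
// but the markup contains `" target`, so it never matches.
const secondAttempt = /(?<=a href="mailto:).*(?="target="_blank")/i;

// Hypothetical helper: take the inner text of the first <email> element,
// drop any nested tags, then trim the surrounding whitespace.
function extractEmail(html) {
  const match = html.match(/<email>([\s\S]*?)<\/email>/i);
  if (!match) return null;
  return match[1].replace(/<[^>]*>/g, '').trim();
}

console.log(linked.match(original)[0]);
console.log(linked.match(secondAttempt));
console.log(extractEmail(plain), extractEmail(linked));
```

When a real document is available, the textContent approach above remains the more robust choice, since the browser's parser handles nesting and quoting for you.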
You can also walk every element in the DOM and match an email regex against each element's text content, as in the snippet below:
<script>
// Return every email-like substring of the given text, or null if there is none.
function getEmailsFromText(text) {
  return text.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
}

// Visit every element on the page and scan its text content.
var items = document.getElementsByTagName("*");
for (var i = 0; i < items.length; i++) {
  var text = items.item(i).textContent;
  var emailIds = getEmailsFromText(text);
  if (emailIds) {
    console.log("Email IDs: " + emailIds);
  }
}
</script>
To test, open your browser's JavaScript console and paste the code from inside the script tag; it logs the email addresses found on the current HTML page. Because an element's textContent includes the text of all its descendants, the same address is logged once for every element that contains it.
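To avoid scanning the same text once per enclosing element, one option is to scan the page text a single time and de-duplicate the matches. The sketch below reuses the pattern from the snippet above; the function name uniqueEmails is an assumption for illustration. Note that this pattern does not match quoted local parts such as "o'brian"@irish.com, because the quote character is outside the character class.

```javascript
// Same pattern as the snippet above; the "i" flag is dropped because the
// character classes already list both a-z and A-Z.
const EMAIL_PATTERN = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g;

// Hypothetical helper: return each distinct email-like substring once,
// in order of first appearance.
function uniqueEmails(text) {
  const matches = text.match(EMAIL_PATTERN) || [];
  return [...new Set(matches)];
}

// In a browser, scan the whole page once instead of once per element.
if (typeof document !== 'undefined') {
  console.log(uniqueEmails(document.body.textContent));
}
```

Outside a browser the function can be called on any string, for example uniqueEmails('john@doe.com and again john@doe.com').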