
How to parse email address out of WHOIS results

Suppose I have a WHOIS lookup that returns results in the following format (simplified for this question):

Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    ... other fields...
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
Name servers:
    ns1.mydomain.com
    ns2.mydomain.com

I want a regex that will give me the Administrative contact's email address (johnsmith@gmail.com), but NOT the Technical contact's email address. It's not important to verify the format of the email address itself.

I wouldn't try to use a regex for this at all. Here's what I might do:

  1. Split the response into lines.
  2. Scan for the line that reads "Administrative contact:".
  3. Scan for the next line that contains "Email:".
  4. Extract the second word on that line.

You may need to fine-tune this procedure for the exact output you get.
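The four steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested production parser: the function name admin_email is hypothetical, and the check that stops at the next unindented header is one possible "fine-tune" (an assumption on my part) so that a missing admin Email: line never falls through to the Technical contact's address.

```python
from typing import Optional

# Sample WHOIS output from the question.
SAMPLE = """\
Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    ... other fields...
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
Name servers:
    ns1.mydomain.com
    ns2.mydomain.com
"""


def admin_email(whois_text: str) -> Optional[str]:
    """Return the Administrative contact's email address, or None."""
    in_admin = False
    # Step 1: split the response into lines.
    for line in whois_text.splitlines():
        stripped = line.strip()
        # Step 2: find the "Administrative contact:" header.
        if stripped == "Administrative contact:":
            in_admin = True
            continue
        if not in_admin:
            continue
        # Extra guard (assumption, not in the original steps): an unindented,
        # non-empty line starts a new section, so give up there.
        if stripped and not line[0].isspace():
            return None
        # Step 3: the next line that starts with "Email:".
        if stripped.startswith("Email:"):
            # Step 4: the second whitespace-separated word is the address.
            words = stripped.split()
            return words[1] if len(words) > 1 else None
    return None


print(admin_email(SAMPLE))  # johnsmith@gmail.com
```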

The regex would be:

"Administrative contact:.*?Email: *([^ \n]*)"

You need to make the "." special character match any character at all, including a newline. In C# that is the RegexOptions.Singleline option; in Python the matching (I tested it and it works) is done like this:

import re
match = re.search(r"Administrative contact:.*?Email: *([^ \n]*)", text, re.DOTALL)
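For reference, here is the same search as a self-contained script, run against the sample output from the question. The variable text is just that sample pasted into a string, and returning None when nothing matches is an assumption about how you want a miss handled.

```python
import re

# Sample WHOIS output copied from the question.
text = """\
Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    ... other fields...
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
Name servers:
    ns1.mydomain.com
    ns2.mydomain.com
"""

# re.DOTALL lets "." match "\n" too, so ".*?" can run from the header line
# down to a later line. The lazy "?" stops at the FIRST "Email:" after the
# header, which here is the administrative contact's.
match = re.search(r"Administrative contact:.*?Email: *([^ \n]*)", text, re.DOTALL)
email = match.group(1) if match else None
print(email)  # johnsmith@gmail.com
```

Without re.DOTALL the same pattern finds nothing at all, because "Administrative contact:" and "Email:" are on different lines.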

Keep in mind, too, that this can be slow on very large inputs: the lazy ".*?" advances one character at a time across line breaks until it reaches "Email:", so you might consider Jordan's solution as well.
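If you would rather keep a single regex but drop DOTALL, one possible variant (my own sketch, not from either answer) only lets indented lines sit between the header and "Email:". That also covers a case the DOTALL version gets wrong: if the administrative block has no Email: line, the lazy ".*?" keeps going into the Technical contact block and returns janedoe@gmail.com. The name ADMIN_EMAIL_RE is hypothetical, and the pattern assumes, as in the sample, that section headers are unindented and their fields are indented.

```python
import re

# Hypothetical variant: "." keeps its default meaning (no newlines), and only
# indented lines may appear between the header and the "Email:" line, so the
# match cannot run on into the "Technical contact:" block.
ADMIN_EMAIL_RE = re.compile(
    r"^Administrative contact:[ \t]*\n"  # the section header line
    r"(?:[ \t].*\n)*?"                   # lazily skip indented field lines
    r"[ \t]+Email:[ \t]*(\S+)",          # the admin's own Email: line
    re.MULTILINE,                        # "^" matches at every line start
)

SAMPLE = """\
Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    ... other fields...
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
"""

# Same layout, but the administrative block has no Email: line.
NO_ADMIN_EMAIL = """\
Administrative contact:
    Name:              John Smith
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
"""

for label, whois in (("sample", SAMPLE), ("no admin email", NO_ADMIN_EMAIL)):
    m = ADMIN_EMAIL_RE.search(whois)
    print(label, "->", m.group(1) if m else None)
# sample -> johnsmith@gmail.com
# no admin email -> None
```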

Well, you could just search for email strings in general; the regex for that is:

([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})
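Here is a rough sketch of what that general search returns on the question's sample. Note that Python's re module rejects "[\w-+]" as a bad character range, so the hyphen goes last in each class ("[\w+-]") to make it a literal character. As the output shows, a general email pattern finds every address, so it does not by itself pick out the Administrative contact.

```python
import re

# General-purpose email pattern, with "-" placed last in each character class
# so it is read as a literal hyphen rather than a range.
EMAIL_RE = re.compile(r"([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})")

text = """\
Administrative contact:
    Name:              John Smith
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
"""

# findall returns every address in the text, from every contact block.
print(EMAIL_RE.findall(text))  # ['johnsmith@gmail.com', 'janedoe@gmail.com']
```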

As mentioned before, though, registrars can format their pages very differently, list multiple email addresses, and so on, which is going to make this a pain for you.
