
How to parse email address out of WHOIS results

Suppose I have a WHOIS lookup that returns results in the following format (simplified for this question):

Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    ... other fields...
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
Name servers:
    ns1.mydomain.com
    ns2.mydomain.com

I want a regex that will give me the Administrative contact's email address (johnsmith@gmail.com), but NOT the Technical contact's email address. It's not important to verify the format of the email address itself.

I wouldn't try to use a regex for this at all. Here's what I might do:

  1. Split the response into lines
  2. Scan for the line that reads "Administrative contact:"
  3. Scan for the next line that has "Email:"
  4. Extract the second word on that line

You may need to fine-tune this procedure as necessary.
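The steps above can be sketched in Python roughly as follows. The function name `admin_email` and the `SAMPLE` text are made up for illustration, and stopping at the next non-indented line is an extra assumption (that contact fields are always indented under their heading), added so that a missing admin email does not fall through to the technical contact.

```python
def admin_email(whois_text):
    """Return the first 'Email:' value under 'Administrative contact:', or None."""
    in_admin = False
    for line in whois_text.splitlines():          # step 1: split into lines
        stripped = line.strip()
        if stripped == "Administrative contact:":  # step 2: find the heading
            in_admin = True
            continue
        if in_admin:
            # Assumption: a non-indented line starts the next section.
            if line and not line[0].isspace():
                return None
            if stripped.startswith("Email:"):      # step 3: next Email: line
                parts = stripped.split()
                # step 4: the second word on the line is the address
                return parts[1] if len(parts) > 1 else None
    return None


SAMPLE = """Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
Name servers:
    ns1.mydomain.com
    ns2.mydomain.com
"""

print(admin_email(SAMPLE))  # prints johnsmith@gmail.com
```

Real WHOIS output varies between registries, so the heading text and the indentation rule would likely need adjusting.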

The regex would be:

"Administrative contact:.*?Email: *([^ \n]*)"

For this to work, the '.' special character has to match any character at all, including a newline. In C# that is what the RegexOptions.Singleline option does; in Python it is the re.DOTALL flag, and the matching (tested it and it works) is done like this:

match = re.search(r"Administrative contact:.*?Email: *([^ \n]*)", text, re.DOTALL)

Keep in mind that this can be inefficient on large inputs, because the lazy .*? has to step through the text one character at a time (newlines included) until it reaches "Email:", so Jordan's solution is worth considering too.
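For reference, here is a self-contained sketch of that regex in use. The names `ADMIN_EMAIL_RE`, `admin_email_regex`, `SAMPLE` and `NO_ADMIN_EMAIL` are hypothetical. The second call illustrates a caveat of this pattern: if the administrative block has no Email line, the lazy `.*?` keeps going and captures the next email it finds.

```python
import re

# Same pattern as above; re.DOTALL lets '.' match newlines too.
ADMIN_EMAIL_RE = re.compile(
    r"Administrative contact:.*?Email: *([^ \n]*)", re.DOTALL
)


def admin_email_regex(text):
    """Return the first email after 'Administrative contact:', or None."""
    match = ADMIN_EMAIL_RE.search(text)
    return match.group(1) if match else None


SAMPLE = """Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
"""

# Hypothetical record where the administrative contact has no Email line.
NO_ADMIN_EMAIL = """Administrative contact:
    Name:              John Smith
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
"""

print(admin_email_regex(SAMPLE))          # prints johnsmith@gmail.com
print(admin_email_regex(NO_ADMIN_EMAIL))  # prints janedoe@gmail.com
```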

Well, you could just search for email addresses in general. A regex for that is:

([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})

As mentioned before though, registrars can have very different formats on their pages, multiple email addresses, etc., which is going to make this a pain for you.
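A short Python sketch of the general search shows why this alone does not answer the question: it returns every address in the response, not just the administrative one. The names `EMAIL_RE`, `find_emails` and `SAMPLE` are hypothetical, and the hyphen is written last inside each character class so that Python's `re` module treats it as a literal rather than a range.

```python
import re

# Generic email pattern; the hyphen is placed last in each class so it is literal.
EMAIL_RE = re.compile(r"[\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7}")


def find_emails(text):
    """Return every email-looking string in text, in order of appearance."""
    return EMAIL_RE.findall(text)


SAMPLE = """Domain name:           mydomain.ca
Administrative contact:
    Name:              John Smith
    Email:             johnsmith@gmail.com
Technical contact:
    Name:              Jane Doe
    Email:             janedoe@gmail.com
Name servers:
    ns1.mydomain.com
    ns2.mydomain.com
"""

print(find_emails(SAMPLE))  # prints ['johnsmith@gmail.com', 'janedoe@gmail.com']
```

Picking the administrative address out of that list would still need some knowledge of the layout, for example taking the first match after the "Administrative contact:" heading.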


 