grep regex match email address

I have a file test.txt which contains the following content:

BC@ABSC.CA
ABCabc+-._@mcmaster.io.ca
ABCabc+-._@school.image
ABCabc+-._@school3-computer.image
ABCabc+-._@school3-IT.image.tor.chrome.ca
ABCabc+-._@school3-IT.image.tor.chrome.canadannn
ABC123abc+-._@school3-IT.imageal.tor.chrome.canadannn
ABCabc+-._@school3-*IT.image.tor.chrome.ca
ABCabc+-._@school3-IT.image.tor.chrome.caskdlfj
ABCab*c+-._@school3-IT.image.tor.chrome.caABCabc

I then use

grep -E '^[A-Za-z0-9+._-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}' test.txt

to match valid email addresses. The key requirement is that the last domain label must be a sequence of 2 to 6 letters.

So I am hoping to get the following output:

BC@ABSC.CA
ABCabc+-._@mcmaster.io.ca
ABCabc+-._@school.image
ABCabc+-._@school3-computer.image
ABCabc+-._@school3-IT.image.tor.chrome.ca

But I also get the following lines, even though the last domain label is longer than 6 characters:

ABCabc+-._@school3-IT.image.tor.chrome.canadannn
ABC123abc+-._@school3-IT.imageal.tor.chrome.canadannn
ABCabc+-._@school3-IT.image.tor.chrome.caskdlfj

How do I solve this problem?

The problem is that grep prints a line as soon as the pattern matches any part of it. If you want the pattern to match the whole line, add the $ anchor at the end. Let's look at an example:

ABCabc+-._@school3-IT.image.tor.chrome.canadannn
  1. ABCabc+-._ matches ^[A-Za-z0-9+._-]+
  2. @ matches @
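  3. school3-IT.image.tor.chrome. matches ([a-zA-Z0-9-]+\.)+ . Quantifiers in grep are greedy, so this group consumes every dot-terminated label it can.
  4. canada matches [a-zA-Z]{2,6} (the maximum of 6 letters)
  5. nnn is left unmatched, which is allowed because nothing ties the pattern to the end of the line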

Without the $, only some part of the line has to match, not necessarily the whole thing.
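
To see the partial match for yourself, a small sketch like the following may help. It assumes a grep that supports -o (print only the matched part), which GNU and BSD grep do although POSIX does not require it; the variable names are only for illustration.

```shell
line='ABCabc+-._@school3-IT.image.tor.chrome.canadannn'
pattern='^[A-Za-z0-9+._-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}'

# -o prints only the part of the line that the pattern matched
# (assumption: GNU or BSD grep; -o is not in POSIX)
matched=$(printf '%s\n' "$line" | grep -oE "$pattern")

printf 'matched:  %s\n' "$matched"
# Strip the matched prefix to show what the pattern never looked at
printf 'leftover: %s\n' "${line#"$matched"}"
```

The matched part stops after canada, and the leftover nnn is exactly what the unanchored pattern ignores.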

Add an end-of-line anchor, $, to your regex:

grep -E '^[A-Za-z0-9+._-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$' test.txt
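
As a quick check, the sketch below recreates the sample file from the question and runs the anchored pattern against it. It writes to a temporary file created with mktemp (an assumption about your environment) so that an existing test.txt is not overwritten.

```shell
# Recreate the sample input from the question in a temporary file
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
BC@ABSC.CA
ABCabc+-._@mcmaster.io.ca
ABCabc+-._@school.image
ABCabc+-._@school3-computer.image
ABCabc+-._@school3-IT.image.tor.chrome.ca
ABCabc+-._@school3-IT.image.tor.chrome.canadannn
ABC123abc+-._@school3-IT.imageal.tor.chrome.canadannn
ABCabc+-._@school3-*IT.image.tor.chrome.ca
ABCabc+-._@school3-IT.image.tor.chrome.caskdlfj
ABCab*c+-._@school3-IT.image.tor.chrome.caABCabc
EOF

pattern='^[A-Za-z0-9+._-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$'

# Print the lines that match from start to end
grep -E "$pattern" "$tmp"

# -c counts matching lines instead of printing them
count=$(grep -cE "$pattern" "$tmp")
printf 'matching lines: %s\n' "$count"

rm -f "$tmp"
```

Only the five lines listed as the desired output should be printed; the lines ending in canadannn and caskdlfj no longer match.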

More about it: http://www.regular-expressions.info/anchors.html

You can fix your command by adding a $ at the end of your pattern:

grep -E '^[A-Za-z0-9+._-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$' test.txt
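
An alternative to writing the anchors yourself is grep's -x option, which is specified by POSIX and only accepts lines that the pattern matches in full. The following is a minimal sketch using two of the sample lines; the pattern variable is just for illustration.

```shell
# Same pattern as in the question, but without ^ and $
pattern='[A-Za-z0-9+._-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}'

# -x succeeds only when the pattern matches the entire line
result=$(printf '%s\n' \
  'ABCabc+-._@school3-IT.image.tor.chrome.ca' \
  'ABCabc+-._@school3-IT.image.tor.chrome.caskdlfj' |
  grep -xE "$pattern")

printf '%s\n' "$result"
```

Only the line ending in .ca is printed, which is the same result the explicit ^...$ anchors give.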

Here is a live demo: https://regex101.com/r/NtZJQ0/1
