
Multiple regex conditions after a certain character

I would like to write a regex that returns a boolean value when it matches. I want to check the characters after @, and there could be many of them. For example, I want to check whether an email uses the banana or apple domain. Sample:

df.head()

EMAIL
data1@gmail.com
data2@yahoo.com 
data3@banana.com
data4@apple.com
apple@gmail.com

I tried df["sus"] = df["email"].str.match(r'([^@]*banana|apple)'), but it also matches text before the @.

Result I got:

SUS
False
False
True
True
True

Result I want:

SUS
False
False
True
True
False

You can use .str.contains, because .str.match only looks for a match at the start of the string (it is based on re.match). Also, [^@]* matches zero or more characters other than @, so your pattern does not restrict where banana or apple may match: these words may appear at the start, at the end, or anywhere else in the string.
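To illustrate the difference between matching at the start and searching anywhere, here is a minimal sketch using the standard re module directly. It assumes, as the answer states, that .str.match follows re.match, and that .str.contains follows re.search; the simplified pattern banana|apple is chosen only for illustration.

```python
import re

# Simplified pattern used only to show anchoring behaviour.
pattern = re.compile(r'banana|apple')

emails = ["data3@banana.com", "apple@gmail.com"]

# re.match succeeds only if the pattern matches at position 0.
match_results = [bool(pattern.match(e)) for e in emails]

# re.search succeeds if the pattern matches anywhere in the string.
search_results = [bool(pattern.search(e)) for e in emails]

print(match_results)   # [False, True]
print(search_results)  # [True, True]
```

Neither call on its own ties the word to the part after @, which is why the pattern itself has to include the @.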

You can use

df["sus"] = df["email"].str.contains(r'@(?:banana|apple)\b')


Details:

  • @ - the @ char
  • (?:banana|apple) - a non-capturing group matching either banana or apple
  • \b - a word boundary
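The pieces above can be checked against the sample addresses with a short sketch. It uses re.search from the standard library as a stand-in for .str.contains (an assumption made to keep the example free of dependencies), and it adds one hypothetical address, data5@applesauce.com, which is not in the original data, to show what the word boundary is for.

```python
import re

# Pattern from the answer: "@", then banana or apple, then a word boundary.
pattern = re.compile(r'@(?:banana|apple)\b')

emails = [
    "data1@gmail.com",
    "data2@yahoo.com",
    "data3@banana.com",
    "data4@apple.com",
    "apple@gmail.com",
    "data5@applesauce.com",  # hypothetical address, not in the original data
]

# True only when banana or apple appears as a whole word right after the @.
flags = [bool(pattern.search(e)) for e in emails]

print(flags)  # [False, False, True, True, False, False]
```

apple@gmail.com is rejected because apple is not preceded by @, and the hypothetical applesauce domain is rejected because \b requires the word to end right after apple.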
