multiple regex condition after certain character

Question

I would like to write a regex that returns a boolean value when it matches. I want to check the characters after the @, which could be many characters. For example, I want to check whether an email uses the banana or apple domain. Sample:

df.head()

EMAIL
data1@gmail.com
data2@yahoo.com 
data3@banana.com
data4@apple.com
apple@gmail.com

I tried df["sus"] = df["email"].str.match(r'([^@]*banana|apple)'), but it also catches banana or apple when they appear before the @.

result I got

SUS
False
False
True
True
True

result I want

SUS
False
False
True
True
False

Answer 1

You can use .str.contains, because .str.match only looks for a match at the start of the string (it is based on re.match). Also, [^@]* matches zero or more chars other than @, and the alternation splits your pattern into [^@]*banana or apple, so nothing in it requires banana or apple to come after the @.
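To illustrate the two points above, here is a minimal sketch using only the standard re module. The two sample addresses are taken from the question; the variable names are made up for illustration. It contrasts re.match (anchored at the start, which .str.match is based on) with re.search (scans the whole string) for the original pattern.

```python
import re

# Pattern from the question: the alternation is "[^@]*banana" OR "apple".
original = r'([^@]*banana|apple)'

emails = ["data3@banana.com", "apple@gmail.com"]

# re.match only tries to match at position 0 of each string.
anchored = [bool(re.match(original, e)) for e in emails]

# re.search tries every position, so either branch can hit anywhere.
anywhere = [bool(re.search(original, e)) for e in emails]

print(anchored)  # [False, True]
print(anywhere)  # [True, True]
```

In both modes the second address is flagged because the bare apple branch matches the local part, which is the behaviour the question wants to avoid.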

You can use

df["sus"] = df["email"].str.contains(r'@(?:banana|apple)\b')


Details :

@ - the @ char
(?:banana|apple) - a non-capturing group matching either banana or apple
\b - a word boundary
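As a quick check of the pattern described above, the following sketch rebuilds the sample data from the question and applies it. It assumes the column is named email (the question's table shows EMAIL, while its code uses df["email"]), and it drops the trailing space on one sample row.

```python
import pandas as pd

# Sample data from the question; the column name "email" is an assumption.
df = pd.DataFrame({
    "email": [
        "data1@gmail.com",
        "data2@yahoo.com",
        "data3@banana.com",
        "data4@apple.com",
        "apple@gmail.com",
    ]
})

# True only when banana or apple directly follows the @ as a whole word.
df["sus"] = df["email"].str.contains(r'@(?:banana|apple)\b')

print(df["sus"].tolist())  # [False, False, True, True, False]
```

If the column can contain missing values, passing na=False to .str.contains should make those rows come out as False instead of NaN.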

1 answer

solution1
2 ACCEPTED 2020-09-30 07:40:35
