I would like to use a regex that returns a boolean value if it matches. I only want to look at the characters after the @ (there can be many of them). For example, I want to check whether an email uses the banana or apple domain. Sample:
df.head()
EMAIL
data1@gmail.com
data2@yahoo.com
data3@banana.com
data4@apple.com
apple@gmail.com
I tried this:
df["sus"] = df["email"].str.match(r'([^@]*banana|apple)')
but it also catches matches before the @.
Result I got:
SUS
False
False
True
True
True
Result I want:
SUS
False
False
True
True
False
You can use .str.contains, because .str.match only searches for a match at the start of the string (it is based on re.match). Also, [^@]* matches zero or more characters other than @, so with your pattern nothing restricts the banana or apple match to the part after the @: these words may appear at the start, at the end, or anywhere in the string. On top of that, the alternation splits your pattern into [^@]*banana and a bare apple, so [^@]* does not even apply to the apple branch.
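To see the start-of-string versus anywhere difference, here is a small sketch using only Python's built-in re module on the sample emails (the list `emails` is simply the sample column above, copied by hand). re.match only tries position 0, like .str.match, while re.search scans the whole string, like .str.contains with its default regex=True. Searching with the original pattern flags apple@gmail.com because the bare apple branch can match before the @.

```python
import re

# Sample values from the EMAIL column above
emails = [
    "data1@gmail.com",
    "data2@yahoo.com",
    "data3@banana.com",
    "data4@apple.com",
    "apple@gmail.com",
]

# The pattern from the question: the alternation splits it into
# "[^@]*banana" OR "apple", so [^@]* only applies to the banana branch
original = r'([^@]*banana|apple)'

# re.match only tries position 0 (what .str.match does)
anchored = [bool(re.match(original, e)) for e in emails]

# re.search tries every position (what .str.contains does)
anywhere = [bool(re.search(original, e)) for e in emails]

print(anchored)  # [False, False, False, False, True]
print(anywhere)  # [False, False, True, True, True]
```

With this sample data, the search-based result is the one that reproduces the False/False/True/True/True output shown above; with re.match, data3 and data4 fail because [^@]* cannot cross the @ to reach the domain.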
With that in mind, you can use:
df["sus"] = df["email"].str.contains(r'@(?:banana|apple)\b')
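A stdlib-only sketch of the same check, with a compiled pattern's search method standing in for .str.contains (which uses search when regex=True), run over the sample values. The list `emails` is again just the sample column copied by hand; the result lines up with the desired SUS column.

```python
import re

emails = [
    "data1@gmail.com",
    "data2@yahoo.com",
    "data3@banana.com",
    "data4@apple.com",
    "apple@gmail.com",
]

# @ must come right before banana/apple, and \b stops a longer word
# such as "applepie" from counting as apple
pattern = re.compile(r'@(?:banana|apple)\b')

# search looks anywhere in the string, like .str.contains
sus = [pattern.search(e) is not None for e in emails]
print(sus)  # [False, False, True, True, False]
```

With pandas, the .str.contains line above applies the same pattern to every row and returns these booleans as a Series.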
Details:
@ - the @ char
(?:banana|apple) - a non-capturing group matching either banana or apple
\b - a word boundary
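To illustrate the \b part, here are a few made-up addresses (hypothetical, not from the question) checked against the same pattern. The boundary rejects a domain that merely starts with apple or banana, while a dot or hyphen right after the word still counts as a boundary because neither is a word character.

```python
import re

pattern = re.compile(r'@(?:banana|apple)\b')

# Hypothetical addresses, chosen only to exercise the word boundary
cases = {
    "me@apple.com": True,        # "." after apple is a boundary
    "me@applepie.com": False,    # "p" continues the word, no boundary
    "me@apple-store.com": True,  # "-" is not a word char, so \b matches
    "me@bananas.org": False,     # "s" continues the word
    "banana@gmail.com": False,   # banana is not right after the @
}

for address, expected in cases.items():
    found = pattern.search(address) is not None
    print(f"{address:22} {found}")
    assert found == expected
```

If a hyphenated domain such as apple-store.com should not count, the boundary alone is not enough and the pattern would need to be tightened.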