I have a dataframe full of emails. Since Gmail usernames have a 6-character minimum, I want to filter my dataframe by dropping any Gmail address whose username is shorter than six characters. So the dataframe df
>>> print(df)
email
1 a@gmail.com
2 real.email@gmail.com
3 no.email@email.com
4 real@yahoo.com
5 poo@gmail.com
would become:
email
2 real.email@gmail.com
3 no.email@email.com
4 real@yahoo.com
Using
df = df[
(len(df['email'].str.split('@').str[0]) >= 6)
& (df['email'].str.split('@').str[1] == 'gmail.com')
]
will filter out everything that isn't @gmail.com, so I can't use that. What I want is essentially the following (which obviously doesn't work and gives a TypeError: 'method' object is not subscriptable):
if df['email'].str.split['@'].str[1] == 'gmail.com':
len(df['email'].str.split['@'].str[0]) >= 6
How do I accomplish this in a vectorized operation?
You can use:
a = df['email'].str.endswith('@gmail.com') #check if email is a gmail address
b = df['email'].str.split('@').str[0].str.len().ge(6) #check if length before "@" >= 6
out = df[(a & b) | ~a] #keep valid gmail addresses and every non-gmail address
print(out)
email
2 real.email@gmail.com
3 no.email@email.com
4 real@yahoo.com
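As a quick check of the mask logic, here is a minimal sketch that applies the rule "keep the row if it is not a gmail address, or if its username is long enough" to the sample data plus one made-up row whose username is exactly six characters long (the boundary case). The names parts, is_gmail and long_enough are only illustrative, and the threshold of at least 6 characters is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"email": [
    "a@gmail.com", "real.email@gmail.com", "no.email@email.com",
    "real@yahoo.com", "poo@gmail.com",
    "sixsix@gmail.com",  # hypothetical boundary row: exactly 6 characters
]})

# Split once on "@": column 0 is the username, column 1 the domain.
parts = df["email"].str.split("@", n=1, expand=True)
is_gmail = parts[1].eq("gmail.com")
long_enough = parts[0].str.len().ge(6)

# (is_gmail & long_enough) | ~is_gmail simplifies to ~is_gmail | long_enough.
out = df[~is_gmail | long_enough]
print(out)
```

The six-character row is kept, which is why the comparison should be "greater than or equal to 6" rather than "greater than 6".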
See this:
>>> df[(df["email"].str.split("@").str[0].str.len() >= 6) | (df["email"].str.split("@").str[1] != 'gmail.com')]
email
1 real.email@gmail.com
2 no.email@email.com
3 real@yahoo.com
As for your remark that it "will filter everything that isn't @gmail.com": that only happens because of the boolean logic, which you just need to get right (as above). Also, to measure the length of each string in a column, use .str.len() rather than len(): calling len() on the whole Series returns the number of rows, not the length of each string.
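To make the difference between len() and .str.len() concrete, here is a small sketch on the sample data from the question (rebuilt with its index of 1 to 5; the variable names are only for illustration):

```python
import pandas as pd

# Sample data from the question, with its 1-based index.
df = pd.DataFrame(
    {"email": ["a@gmail.com", "real.email@gmail.com", "no.email@email.com",
               "real@yahoo.com", "poo@gmail.com"]},
    index=[1, 2, 3, 4, 5],
)

usernames = df["email"].str.split("@").str[0]

# len() counts the rows of the Series: one number for the whole column.
n_rows = len(usernames)

# .str.len() measures each string: one number per row.
name_lengths = usernames.str.len()

print(n_rows)                 # 5
print(name_lengths.tolist())  # [1, 10, 8, 4, 3]
```

Comparing n_rows to 6 gives a single True or False for the whole column, which is why it cannot serve as a row-by-row filter.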
You can do:
df=df.loc[~df.email.str.contains(r"^.{0,5}@gmail\.com$")]
Outputs:
email
1 real.email@gmail.com
2 no.email@email.com
3 real@yahoo.com
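To see what the pattern matches, the sketch below runs it against a few addresses with Python's re module (str.contains also performs a regex search by default). The address sixsix@gmail.com is made up to test the six-character boundary, and the name too_short_gmail is only illustrative.

```python
import re

# ^              start of the string
# .{0,5}         between 0 and 5 characters of any kind (a too-short username)
# @gmail\.com$   the literal gmail domain, running to the end of the string
too_short_gmail = re.compile(r"^.{0,5}@gmail\.com$")

samples = ["a@gmail.com", "poo@gmail.com", "sixsix@gmail.com",
           "real.email@gmail.com", "real@yahoo.com"]

# Addresses the pattern flags, i.e. the rows that ~ would drop.
flagged = [s for s in samples if too_short_gmail.search(s)]
print(flagged)  # ['a@gmail.com', 'poo@gmail.com']
```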
One way is to store the index in a list and then display just those indices:
ls = []
for i in df.index:
    user, domain = df.loc[i, 'email'].split('@', 1)
    # keep every non-gmail row, and gmail rows whose username has at least 6 characters
    if domain != 'gmail.com' or len(user) >= 6:
        ls.append(i)
df[df.index.isin(ls)]
Output:
email
1 real.email@gmail.com
2 no.email@email.com
3 real@yahoo.com
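The same row-by-row idea can also be written without an explicit index list. This is only a sketch: keep_email is a hypothetical helper, not part of pandas, and Series.map still calls it once per row, so it is tidier than the loop but not truly vectorized.

```python
import pandas as pd

def keep_email(address: str, min_len: int = 6) -> bool:
    """Return True unless address is a gmail address with a too-short username."""
    user, _, domain = address.partition("@")
    return domain != "gmail.com" or len(user) >= min_len

df = pd.DataFrame(
    {"email": ["a@gmail.com", "real.email@gmail.com", "no.email@email.com",
               "real@yahoo.com", "poo@gmail.com"]}
)

# map() builds a boolean Series that is then used as a row mask.
kept = df[df["email"].map(keep_email)]
print(kept)
```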