Remove rows from dataframe if one column matches a value - Python 3.6
I have a csv that looks like this:
screen_name,tweet,following,followers,is_retweet,bot
narutouz16,Grad school is lonely.,59,20,0,0
narutouz16,RT @GetMadz: Sound design in this game is 10/10 game freak lied. ,59,20,1,0
narutouz16,@hbthen3rd I know I don't.,59,20,0,0
narutouz16,"@TonyKelly95 I'm still not satisfied in the ending, even though its longer.",59,20,0,0
narutouz16,I'm currently in second place in my leaderboards in duolongo.,59,20,0,0
I am able to read this into a dataframe using the following:
import pandas as pd

df = pd.read_csv("file.csv")
That works great. When I run
print(df.shape)
I get the following dimensions:
(1223726, 6)
I have a list of usernames, like below:
bad_names = ['BELOZEROVNIKIT', 'ALTMANBELINDA', '666STEVEROGERS', 'ALVA_MC_GHEE', 'CALIFRONIAREP', 'BECCYWILL', 'BOGDANOVAO2', 'ADELE_BROCK', 'ANN1EMCCONNELL', 'ARONHOLDEN8', 'BISHOLORINE', 'BLACKTIVISTSUS', 'ANGELITHSS', 'ANWARJAMIL22', 'BREMENBOTE', 'BEN_SAR_GENT', 'ASSUNCAOWALLAS', 'AHMADRADJAB', 'AN_N_GASTON', 'BLACK_ELEVATION', 'BERT_HENLEY', 'BLACKERTHEBERR5', 'ARTHCLAUDIA', 'ALBERTA_HAYNESS', 'ADRIANAMFTTT']
What I want to do is loop through the dataframe and, if a row's username (the screen_name column) is in this list, remove that row from df and add it to a new DataFrame called bad_names_df.
Pseudocode would look like:
for each row in df:
    if row.username in bad_names:
        bad_names_df.append(row)
        df.remove(row)
    else:
        continue
My attempt:
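bad_rows = []
for idx, row in df.iterrows():  # iterrows yields (index, row) pairs
    if row['screen_name'] in bad_names:  # the column is screen_name, not username
        bad_rows.append(row)
bad_names_df = pd.DataFrame(bad_rows)  # build the new DataFrame once, after the loop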
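The attempt only collects the matching rows; the pseudocode also removes them from df. A minimal sketch of a literal translation, assuming a small hypothetical sample with only the screen_name column and one other column, records the index labels in the loop and moves the rows afterwards. It works, but iterrows still visits all 1.2M rows one at a time in Python, which is the slowness the question is asking about.

```python
import pandas as pd

# Hypothetical sample using the question's screen_name column (other columns omitted).
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BECCYWILL", "narutouz16", "ADELE_BROCK"],
    "is_retweet": [0, 0, 1, 0],
})
bad_names = ["BECCYWILL", "ADELE_BROCK"]

# Pass 1: record the index label of every row whose screen_name is a bad name.
# Rows are not removed inside the loop, because modifying a DataFrame while
# iterating over it is unreliable.
bad_idx = []
for idx, row in df.iterrows():  # iterrows yields (index label, row as Series)
    if row["screen_name"] in bad_names:
        bad_idx.append(idx)

# Pass 2: copy the matching rows out, then drop them from df.
bad_names_df = df.loc[bad_idx]
df = df.drop(index=bad_idx)
```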
How can I (efficiently) loop through df, which has over 1.2M rows, and, if the username is in the bad_names list, remove that row and add it to bad_names_df? I have not found any other SO posts that address this issue.
You can apply a lambda, then filter as follows:
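df['keep'] = df['screen_name'].apply(lambda x: x not in bad_names)
bad_names_df = df[~df['keep']].drop(columns='keep')  # rows whose name is in bad_names
df = df[df['keep']].drop(columns='keep')  # remaining rows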
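Because the lambda runs once per row, each `in bad_names` check scans the whole list. A small sketch of the same apply-then-filter idea, assuming hypothetical sample data, converts the list to a set first so each lookup is a hash lookup instead of a linear scan.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"screen_name": ["narutouz16", "BECCYWILL", "narutouz16"]})
bad_names = ["BECCYWILL", "ADELE_BROCK"]

# Set membership is O(1) on average; list membership checks every element,
# which adds up when the lambda runs once per row.
bad_set = set(bad_names)

# Boolean Series: True for rows to keep.
keep = df["screen_name"].apply(lambda name: name not in bad_set)

bad_names_df = df[~keep]
df = df[keep]
```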
You can also create a mask using isin:
mask = df["screen_name"].isin(bad_names)
print(df[mask])   # df of bad names
print(df[~mask])  # df of good names
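To get both DataFrames the question asks for rather than printing them, the same mask can be used twice. This is a sketch with a hypothetical helper, `split_by_names`, and hypothetical sample data; the only pandas calls it relies on are `Series.isin`, boolean indexing, and `copy`.

```python
import pandas as pd

def split_by_names(df: pd.DataFrame, names, column: str = "screen_name"):
    """Hypothetical helper: return (matching_rows, remaining_rows), where
    matching rows have a value in `column` that appears in `names`."""
    mask = df[column].isin(names)  # one vectorized membership test, no Python loop
    # .copy() so later edits to either part do not trigger SettingWithCopyWarning
    return df[mask].copy(), df[~mask].copy()

# Hypothetical sample data.
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BECCYWILL", "ALVA_MC_GHEE", "narutouz16"],
    "followers": [20, 3, 7, 20],
})
bad_names = ["BECCYWILL", "ALVA_MC_GHEE"]

bad_names_df, df = split_by_names(df, bad_names)
```

Computing the mask once and reusing it with `~` guarantees that every row lands in exactly one of the two results.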
Using isin in one line:
bad_names_df = df[df['screen_name'].isin(bad_names)]
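`isin` compares values exactly, so it is case-sensitive. Every entry in bad_names is upper case, while the sample screen_name values are lower case. If the real data may spell the same account with different casing (an assumption; the question does not say), a sketch that normalizes both sides before matching could look like this, again on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample where the casing differs from the bad_names list.
df = pd.DataFrame({"screen_name": ["narutouz16", "BeccyWill", "adele_brock"]})
bad_names = ["BECCYWILL", "ADELE_BROCK"]

# Assumption: names should match regardless of case, so upper-case both sides.
upper_bad = {name.upper() for name in bad_names}
mask = df["screen_name"].str.upper().isin(upper_bad)

bad_names_df = df[mask]
df = df[~mask]
```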