Remove rows from dataframe if one column matches a value - Python 3.6
I have a CSV that looks like this:
screen_name,tweet,following,followers,is_retweet,bot
narutouz16,Grad school is lonely.,59,20,0,0
narutouz16,RT @GetMadz: Sound design in this game is 10/10 game freak lied. ,59,20,1,0
narutouz16,@hbthen3rd I know I don't.,59,20,0,0
narutouz16,"@TonyKelly95 I'm still not satisfied in the ending, even though its longer.",59,20,0,0
narutouz16,I'm currently in second place in my leaderboards in duolongo.,59,20,0,0
I can read it into a dataframe with the following command:
import pandas as pd
df = pd.read_csv("file.csv")
This works fine. print(df.shape) gives:
(1223726, 6)
I have a list of usernames, as follows:
bad_names = ['BELOZEROVNIKIT', 'ALTMANBELINDA', '666STEVEROGERS', 'ALVA_MC_GHEE', 'CALIFRONIAREP', 'BECCYWILL', 'BOGDANOVAO2', 'ADELE_BROCK', 'ANN1EMCCONNELL', 'ARONHOLDEN8', 'BISHOLORINE', 'BLACKTIVISTSUS', 'ANGELITHSS', 'ANWARJAMIL22', 'BREMENBOTE', 'BEN_SAR_GENT', 'ASSUNCAOWALLAS', 'AHMADRADJAB', 'AN_N_GASTON', 'BLACK_ELEVATION', 'BERT_HENLEY', 'BLACKERTHEBERR5', 'ARTHCLAUDIA', 'ALBERTA_HAYNESS', 'ADRIANAMFTTT']
What I want to do is loop over the dataframe and, if the username is in this list, remove those rows from df and add them to a new dataframe called bad_names_df.
Pseudocode would look like:
for each row in df:
    if row.username in bad_names:
        bad_names_df.append(row)
        df.remove(row)
    else:
        continue
My attempt:
for row, col in df.iterrows():
    if row['username'] in bad_user_names:
        new_df.append(row)
    else:
        continue
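How can I (efficiently) loop over df, which has more than 1.2 million rows, and, if the username is in the bad_names list, remove that row and add it to bad_names_df? I have not found any other SO posts that address this problem.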
You can apply a lambda and then filter as follows:
# The CSV column is screen_name; flag rows whose name is NOT in bad_names
df['keep'] = df['screen_name'].apply(lambda x: x not in bad_names)
df = df[df['keep']]
You can also use isin to create a mask:
mask = df["screen_name"].isin(bad_names)
print (df[mask]) #df of bad names
print (df[~mask]) #df of good names
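To get what the question asks for (the bad rows in bad_names_df and everything else left in df), the same mask can be used twice. The following is a minimal sketch; the tiny dataframe and the shortened bad_names list are made-up stand-ins for file.csv and the full list, not data from the question.

```python
import pandas as pd

# Hypothetical miniature data standing in for file.csv
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BECCYWILL", "narutouz16", "ADELE_BROCK"],
    "tweet": ["a", "b", "c", "d"],
})
bad_names = ["BECCYWILL", "ADELE_BROCK"]  # shortened for illustration

# Boolean Series: True where screen_name is one of the bad names
mask = df["screen_name"].isin(bad_names)

# Rows that matched go to bad_names_df; the rest stay in df
bad_names_df = df[mask].copy()
df = df[~mask].copy()
```

The .copy() calls are optional; they make both results independent dataframes, which avoids chained-assignment warnings if either one is modified later.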
Using isin in one line:
bad_names_df = df[df['screen_name'].isin(bad_names)]
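One thing worth checking: isin matches exactly, and the bad_names list in the question is all uppercase while the sample rows use lowercase screen names. If case-insensitive matching is what is wanted (an assumption, the question does not say), a sketch like the following normalizes both sides before comparing. The sample rows here are invented for illustration.

```python
import pandas as pd

# Hypothetical rows with mixed-case screen names
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BeccyWill", "adele_brock"],
    "bot": [0, 1, 1],
})
bad_names = ["BECCYWILL", "ADELE_BROCK"]  # shortened for illustration

# Normalize the list once into an uppercase set
bad_set = {name.upper() for name in bad_names}

# Compare uppercase screen names against the uppercase set
mask = df["screen_name"].str.upper().isin(bad_set)

bad_names_df = df[mask]
df = df[~mask]
```

The original screen_name values are left untouched; only the comparison is done in uppercase.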