
Remove rows from dataframe if one column matches a value - Python 3.6

I have a csv that looks like this:

screen_name,tweet,following,followers,is_retweet,bot
narutouz16,Grad school is lonely.,59,20,0,0
narutouz16,RT @GetMadz: Sound design in this game is 10/10 game freak lied. ,59,20,1,0
narutouz16,@hbthen3rd I know I don't.,59,20,0,0
narutouz16,"@TonyKelly95 I'm still not satisfied in the ending, even though its longer.",59,20,0,0
narutouz16,I'm currently in second place in my leaderboards in duolongo.,59,20,0,0

I am able to read this into a dataframe using the following:

import pandas as pd
df = pd.read_csv("file.csv")

That works great. When I run print(df.shape) I get the following dimensions: (1223726, 6)

I have a list of usernames, like below:

bad_names = ['BELOZEROVNIKIT', 'ALTMANBELINDA', '666STEVEROGERS', 'ALVA_MC_GHEE', 'CALIFRONIAREP', 'BECCYWILL', 'BOGDANOVAO2', 'ADELE_BROCK', 'ANN1EMCCONNELL', 'ARONHOLDEN8', 'BISHOLORINE', 'BLACKTIVISTSUS', 'ANGELITHSS', 'ANWARJAMIL22', 'BREMENBOTE', 'BEN_SAR_GENT', 'ASSUNCAOWALLAS', 'AHMADRADJAB', 'AN_N_GASTON', 'BLACK_ELEVATION', 'BERT_HENLEY', 'BLACKERTHEBERR5', 'ARTHCLAUDIA', 'ALBERTA_HAYNESS', 'ADRIANAMFTTT']

What I want to do is loop through the dataframe and, if the username is in this list at all, remove those rows from df and add them to a new df called bad_names_df.

Pseudocode would look like:

for each row in df:
    if row.username in bad_names:
        bad_names_df.append(row)
        df.remove(row)
    else:
        continue

My attempt:

for row, col in df.iterrows():
    if row['username'] in bad_user_names:
        new_df.append(row)
    else:
        continue

How can I (efficiently) go through df, which has over 1.2M rows, and, if the username is in the bad_names list, remove that row and add it to bad_names_df? I have not found any other SO posts that address this issue.

You can apply a lambda then filter as follows:

# True for rows to keep; the column in the CSV header is screen_name, not username
df['keep'] = df['screen_name'].apply(lambda x: x not in bad_names)
df = df[df['keep']]
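
A possible refinement of the lambda approach: testing membership against a list scans the list for every row, so with 1.2M rows it may help to convert bad_names to a set first. The sketch below uses a tiny made-up frame and a shortened bad_names list (both are illustrative assumptions, not the original data); only the screen_name column name comes from the CSV header.

```python
import pandas as pd

# Illustrative sample; the real frame would come from pd.read_csv("file.csv")
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BECCYWILL", "narutouz16", "ADELE_BROCK"],
    "tweet": ["a", "b", "c", "d"],
})
bad_names = ["BECCYWILL", "ADELE_BROCK"]  # shortened list for the example

# Set lookups take constant time on average, unlike list lookups
bad_set = set(bad_names)

# Boolean Series: True where the row should be kept
keep = df["screen_name"].apply(lambda name: name not in bad_set)

good_df = df[keep]
print(good_df["screen_name"].tolist())
```

This keeps the filter in a separate Series instead of adding a helper column to df, so there is nothing to drop afterwards.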

You can also create a mask using isin:

mask = df["screen_name"].isin(bad_names)
print(df[mask])   # df of bad names
print(df[~mask])  # df of good names
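
To get what the question asks for (the bad rows moved into bad_names_df and removed from df), the same mask can be used twice. This is a minimal sketch on made-up sample rows, with a shortened bad_names list assumed for illustration:

```python
import pandas as pd

# Illustrative sample; the real frame would come from pd.read_csv("file.csv")
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BECCYWILL", "ARONHOLDEN8", "narutouz16"],
    "tweet": ["w", "x", "y", "z"],
    "bot": [0, 1, 1, 0],
})
bad_names = ["BECCYWILL", "ARONHOLDEN8"]  # shortened list for the example

# Vectorized membership test, no Python-level loop over rows
mask = df["screen_name"].isin(bad_names)

bad_names_df = df[mask].copy()  # rows whose screen_name is in bad_names
df = df[~mask].copy()           # rebinding df drops those rows from it

print(bad_names_df.shape, df.shape)
```

The .copy() calls are optional here; they make the two results independent frames, which avoids SettingWithCopyWarning if either one is modified later.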

Using isin in one line:

bad_names_df = df[df['screen_name'].isin(bad_names)]
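
One thing to watch for: the entries in bad_names are all upper case, while the sample rows in the CSV show lower-case screen names, and isin compares strings exactly. If matching is meant to ignore case (an assumption, the question does not say), one option is to upper-case the column before the comparison. The rows below are invented for illustration:

```python
import pandas as pd

# Illustrative sample with mixed-case names (not from the original data)
df = pd.DataFrame({
    "screen_name": ["narutouz16", "beccywill", "Adele_Brock"],
    "followers": [20, 5, 7],
})
bad_names = ["BECCYWILL", "ADELE_BROCK"]  # shortened list for the example

# Upper-case the column so 'beccywill' matches 'BECCYWILL'
mask = df["screen_name"].str.upper().isin(bad_names)

bad_names_df = df[mask]  # matched rows, original casing preserved
df = df[~mask]           # remaining rows

print(bad_names_df["screen_name"].tolist())
print(df["screen_name"].tolist())
```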


