
How to drop rows from a pandas DataFrame by using a list of indices

Introduction

We have the following DataFrame, which we create from a CSV file.

import pandas as pd

data = pd.read_csv(path + name, usecols=['QTS', 'DSTP', 'RSTP', 'DDATE', 'RDATE', 'DTIME', 'RTIME', 'DCXR', 'RCXR', 'FARE'])

I want to delete specific rows from the DataFrame. For this purpose I built a list and appended the index labels of the rows I want to delete.

indices = []  # index labels of the rows to delete
for index, row in data.iterrows():
    if row['FARE'] >= 2500.00:
        indices.append(index)

From here I am lost. I don't know how to use the labels in the list to delete the rows from the DataFrame.


Question

  • The list containing the row ids must be used to delete the rows from the DataFrame. Is that possible?

Constraints

  • We can't use data.drop(index, inplace=True) because it really slows the process down.
  • We cannot use a filter because I have some special constraints.

If you are trying to remove rows whose 'FARE' values are greater than or equal to 2500, you can use a mask that keeps only the rows with values less than 2500:

df_out = df.loc[df.FARE.values < 2500] # Or df[df.FARE.values < 2500]

For large datasets, we might want to work with the underlying array data and then construct the output DataFrame:

df_out = pd.DataFrame(df.values[df.FARE.values < 2500], columns=df.columns)
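A small sketch of the two variants above on made-up data (the values are invented; only the column names come from the question). It points at one thing worth checking before adopting the array-based version: rebuilding from df.values restarts the index at 0, and with mixed column types the numeric column comes back as object dtype.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DCXR": ["AA", "BA", "LH", "AF"],
    "FARE": [1200.0, 2500.0, 3100.5, 800.0],
})

mask = df.FARE.values < 2500  # boolean NumPy array, one entry per row

# Variant 1: boolean indexing keeps the original index labels and dtypes.
out_loc = df.loc[mask]

# Variant 2: rebuild from the underlying array. The index restarts at 0 and,
# because the frame mixes strings and floats, FARE is no longer float64.
out_arr = pd.DataFrame(df.values[mask], columns=df.columns)

print(out_loc.index.tolist())  # original labels of the kept rows
print(out_arr.index.tolist())  # fresh positional labels
print(out_arr.dtypes)
```

If the original dtypes matter downstream, the first variant avoids a later conversion step.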

To use the indices generated by the loop in the question:

df_out = df.loc[np.setdiff1d(df.index, indices)]

Or with masking again:

df_out = df.loc[~df.index.isin(indices)]  # or df[~df.index.isin(indices)]
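The sketch below runs both index-based variants on made-up data and, for comparison, a single drop call on the whole list. The question rules out data.drop(index, inplace=True), presumably called once per row; passing the full list in one call is a different usage, and whether it is fast enough for the asker's data is an assumption, not something the thread confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a default integer index.
df = pd.DataFrame({"FARE": [1200.0, 2500.0, 3100.5, 800.0, 2600.0]})

# Labels collected as in the question (rows with FARE >= 2500).
indices = [1, 2, 4]

# setdiff1d returns the sorted, unique labels that are NOT in indices.
by_setdiff = df.loc[np.setdiff1d(df.index, indices)]

# isin builds a boolean mask; the original row order is preserved.
by_mask = df.loc[~df.index.isin(indices)]

# drop accepts a list of labels, so one call handles every row.
by_drop = df.drop(indices)

print(by_mask)
```

Note that np.setdiff1d sorts its result, so on an unsorted index the row order of the first variant can differ from the other two.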

How about filtering the data with the DataFrame.query() method:

cols = ['QTS','DSTP','RSTP','DDATE','RDATE','DTIME','RTIME','DCXR','RCXR','FARE']
df = pd.read_csv(path + name, usecols=cols).query("FARE < 2500")
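A self-contained sketch of the same idea, with an in-memory CSV standing in for the file at path + name. The CSV text, the EXTRA column, and the shortened column list are invented for illustration.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real file.
csv_text = """QTS,DCXR,FARE,EXTRA
1,AA,1200.0,x
2,BA,2500.0,y
3,LH,3100.5,z
4,AF,800.0,w
"""

cols = ["QTS", "DCXR", "FARE"]  # shortened column list for the sketch

# usecols drops EXTRA at read time; query keeps only rows with FARE < 2500.
df = pd.read_csv(io.StringIO(csv_text), usecols=cols).query("FARE < 2500")

print(df)
```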
