How to drop rows from a pandas DataFrame by using a list of indices
We have the following dataframe, which we create from a CSV file:
data = pd.read_csv(path + name, usecols=['QTS','DSTP','RSTP','DDATE','RDATE','DTIME','RTIME','DCXR','RCXR','FARE'])
I want to delete specific rows from the dataframe. For this purpose I used a list and appended the IDs of the rows I want to delete:
indices = []  # index labels of the rows to delete
for index, row in data.iterrows():
    if row['FARE'] >= 2500.00:
        indices.append(index)
From here I am lost. I don't know how to use the IDs in the list to delete the rows from the dataframe.
I would rather not call data.drop(index,inplace=True) row by row, because it really slows the process.

If you are trying to remove rows that have 'FARE' values greater than or equal to 2500, you can use a mask that keeps only the rows whose values are less than 2500:
df_out = df.loc[df.FARE.values < 2500] # Or df[df.FARE.values < 2500]
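As a minimal sketch of the mask approach above: the sample rows below are made up for illustration, and only the FARE column name and the 2500 threshold come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the FARE column name is taken from the question.
df = pd.DataFrame({
    "DCXR": ["AA", "BA", "LH", "AF"],
    "FARE": [1200.0, 2500.0, 3100.5, 2499.99],
})

# Boolean mask: True for the rows we want to keep (FARE below 2500).
mask = df["FARE"] < 2500

# Selecting with the mask returns a new dataframe; df itself is unchanged.
df_out = df.loc[mask]
print(df_out)
```

The original index labels are preserved in df_out, so the dropped rows leave gaps in the index unless you call reset_index(drop=True) afterwards.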
For large datasets, we might want to work with the underlying array data and then construct the output dataframe:
df_out = pd.DataFrame(df.values[df.FARE.values < 2500], columns=df.columns)
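A small sketch of the array-based variant, with made-up sample rows. One thing to be aware of: when the columns have mixed types, the underlying 2-D array usually has a generic object dtype, so the rebuilt dataframe may need its column dtypes restored, and it gets a fresh 0..n-1 index. Whether this route is actually faster depends on the data, so it is worth timing on your own file.

```python
import pandas as pd

# Hypothetical sample data with mixed column types (strings and floats).
df = pd.DataFrame({
    "DCXR": ["AA", "BA", "LH"],
    "FARE": [1200.0, 2500.0, 800.0],
})

# Build the mask on the raw NumPy array of the FARE column.
keep = df["FARE"].to_numpy() < 2500

# Slice the underlying 2-D array and wrap it in a new dataframe.
df_out = pd.DataFrame(df.to_numpy()[keep], columns=df.columns)

# Restore the original per-column dtypes, which the round trip
# through a single array may have lost.
df_out = df_out.astype(df.dtypes.to_dict())
print(df_out.dtypes)
```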
To use the indices generated by the loop in the question:
df_out = df.loc[np.setdiff1d(df.index, indices)]
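A short sketch of the set-difference approach, using a made-up dataframe and a hand-written indices list standing in for the one built by the loop. Note that np.setdiff1d returns sorted, unique values, so this assumes the index labels are unique and that you are fine with the result being ordered by label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the default 0..n-1 index.
df = pd.DataFrame({"FARE": [1200.0, 2500.0, 3100.5, 800.0]})

# Stand-in for the list collected by the loop in the question.
indices = [1, 2]

# Labels present in the index but not in `indices` (sorted, unique).
keep_labels = np.setdiff1d(df.index, indices)

# Select the remaining rows by label.
df_out = df.loc[keep_labels]
print(df_out)
```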
Or with masking again:
df_out = df.loc[~df.index.isin(indices)] # or df[~df.index.isin(indices)]
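For comparison, here is a sketch that builds the list of labels without iterrows and then removes the rows in one call. DataFrame.drop accepts a whole list of index labels, so it does not need to be called once per row. The sample data is made up.

```python
import pandas as pd

# Hypothetical sample data with the default 0..n-1 index.
df = pd.DataFrame({"FARE": [1200.0, 2500.0, 3100.5, 800.0]})

# Collect the labels of the rows to delete in one vectorized step
# instead of looping with iterrows().
indices = df.index[df["FARE"] >= 2500]

# Option 1: keep the rows whose label is NOT in `indices`.
via_mask = df[~df.index.isin(indices)]

# Option 2: pass all labels to drop() in a single call.
via_drop = df.drop(indices)

print(via_mask.equals(via_drop))
```

Both options return a new dataframe and leave df untouched; they differ mainly in how they treat labels that are missing from the index (drop raises a KeyError by default, while the isin mask ignores them).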
How about filtering the data using the DataFrame.query() method:
cols = ['QTS','DSTP','RSTP','DDATE','RDATE','DTIME','RTIME','DCXR','RCXR','FARE']
df = pd.read_csv(path + name, usecols=cols).query("FARE < 2500")
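A self-contained sketch of the same read-then-query chain. Because path and name are not given in the question, an in-memory CSV with invented contents stands in for the file here, and the column list is shortened for brevity.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real file (path + name).
csv_text = (
    "QTS,FARE,EXTRA\n"
    "1,1200.0,x\n"
    "2,2500.0,y\n"
    "3,2499.5,z\n"
)

# Read only the columns we need, then filter in the same expression.
cols = ["QTS", "FARE"]
df = pd.read_csv(io.StringIO(csv_text), usecols=cols).query("FARE < 2500")
print(df)
```

The query string is evaluated against the column names, so the filter runs on the freshly loaded dataframe without needing an intermediate variable.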