Python: Filter rows in data frame when multiple rows meet a unique condition
I have a data set similar to:
import numpy as np
import pandas as pd

dt = {'A': [0, 0, 0, 1],
      'B': [0, 2, 0, 3],
      'C': [0, 0, 0, 4],
      'D': [0, 5, 0, 6]}
dt = pd.DataFrame(dt)
I want to filter all rows where the columns ['A', 'B', 'C', 'D'] are all zero for that row. In the real data set I have more than twenty columns instead of 4, so the following solution is not feasible:
dt = dt[(dt['A'] == 0) & (dt['B'] == 0) & (dt['C'] == 0) & (dt['D'] == 0)]
So I came up with the following solution:
dt['new'] = np.nan
lst = [0, 1, 2, 3]
for i in range(len(dt)):
    dt.iloc[i, 4] = all(dt.iloc[i, lst] == 0)
Finally, I can filter based on the 'new' column.
I am looking for a more efficient solution, preferably one without a loop. Any help would be appreciated.
You can try this using DataFrame.eq with DataFrame.all and boolean indexing:
dt[dt.eq(0).all(axis=1)]
   A  B  C  D
0  0  0  0  0
2  0  0  0  0
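Since the real data has more than twenty columns, it may help to restrict the check to a list of column names. The following is a minimal sketch of the DataFrame.eq / DataFrame.all approach above; the names cols, all_zero, zero_rows and nonzero_rows are made up for illustration and are not part of the original answer.

```python
import pandas as pd

dt = pd.DataFrame({'A': [0, 0, 0, 1],
                   'B': [0, 2, 0, 3],
                   'C': [0, 0, 0, 4],
                   'D': [0, 5, 0, 6]})

# Hypothetical list of columns to check; in the real data this
# could hold twenty or more names.
cols = ['A', 'B', 'C', 'D']

# Boolean Series: True for rows where every checked column equals zero.
all_zero = dt[cols].eq(0).all(axis=1)

zero_rows = dt[all_zero]       # keep only the all-zero rows
nonzero_rows = dt[~all_zero]   # or drop the all-zero rows instead
```

Building the mask once and reusing it makes it easy to either keep or discard the matching rows, depending on which meaning of "filter" is intended.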
Another idea is to use np.any or DataFrame.any as a boolean mask:
dt[~dt.any(axis=1)]  # @Sayandip Dutta's answer in the comments
dt[~np.any(dt, axis=1)]
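One difference worth keeping in mind is how missing values are treated. The sketch below uses a small made-up frame (not the data from the question) and assumes the default skipna=True of DataFrame.any: a row containing only zeros and NaN counts as "no truthy value" for the any-based mask, but not as "all zero" for the eq-based mask.

```python
import numpy as np
import pandas as pd

# Made-up example data containing a missing value.
df = pd.DataFrame({'A': [0.0, 0.0, 1.0],
                   'B': [0.0, np.nan, 2.0]})

# NaN is skipped by default, so row 1 has no truthy value.
mask_any = ~df.any(axis=1)

# NaN is not equal to 0, so row 1 is not considered all-zero.
mask_eq = df.eq(0).all(axis=1)

print(mask_any.tolist())
print(mask_eq.tolist())
```

If the data has no missing values, both masks select the same rows.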
Try this, using DataFrame.sum(axis=1):
dt[dt.sum(axis=1).eq(0)]
   A  B  C  D
0  0  0  0  0
2  0  0  0  0
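The sum-based test gives the same result as the eq/all test for the data in the question, but only because all values there are non-negative. A short sketch with made-up data shows what can happen when positive and negative values cancel out; the variant using abs() is one possible workaround, offered as an assumption rather than as part of the original answer.

```python
import pandas as pd

# Made-up data: row 1 is not all-zero, but its values sum to zero.
df = pd.DataFrame({'A': [0, 1, 2],
                   'B': [0, -1, 3]})

by_sum = df.sum(axis=1).eq(0)            # row 1 is wrongly flagged
by_eq = df.eq(0).all(axis=1)             # row 1 is correctly excluded
by_abs_sum = df.abs().sum(axis=1).eq(0)  # absolute values cannot cancel

print(by_sum.tolist())
print(by_eq.tolist())
print(by_abs_sum.tolist())
```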