I have a data set similar to:
import pandas as pd
import numpy as np

dt = {'A': [0, 0, 0, 1],
      'B': [0, 2, 0, 3],
      'C': [0, 0, 0, 4],
      'D': [0, 5, 0, 6]}
dt = pd.DataFrame(dt)
I want to select all rows where columns ['A', 'B', 'C', 'D'] are all zero. In the real data set I have more than twenty columns instead of four, so the following solution is not feasible:
dt = dt[(dt['A'] == 0) & (dt['B'] == 0) & (dt['C'] == 0) & (dt['D'] == 0)]
So I came up with the following solution:
dt['new'] = np.nan
lst = [0, 1, 2, 3]
for i in range(len(dt)):
    dt.iloc[i, 4] = all(dt.iloc[i, lst] == 0)
And finally I can filter based on the 'new' column.
I am looking for a more efficient solution, preferably something without a loop, any help would be appreciated.
You can try this using DataFrame.eq with DataFrame.all and boolean indexing:
dt[dt.eq(0).all(axis=1)]
A B C D
0 0 0 0 0
2 0 0 0 0
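Since the real data set has more than twenty columns, you may want to restrict the zero-check to an explicit subset rather than the whole frame. A minimal sketch, where `cols` is a hypothetical list of the columns to test:

```python
import pandas as pd

dt = pd.DataFrame({'A': [0, 0, 0, 1],
                   'B': [0, 2, 0, 3],
                   'C': [0, 0, 0, 4],
                   'D': [0, 5, 0, 6]})

# Only these columns take part in the all-zero test;
# any other columns in the frame are ignored by the mask.
cols = ['A', 'B', 'C', 'D']
result = dt[dt[cols].eq(0).all(axis=1)]
```

This keeps rows 0 and 2, the two rows whose listed columns are all zero, regardless of what any remaining columns contain.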
Another idea is to use np.any or DataFrame.any as a boolean mask:
dt[~dt.any(axis=1)] # @Sayandip Dutta's answer in the comments
dt[~np.any(dt, axis=1)]
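These work because `any(axis=1)` treats nonzero values as truthy, so negating it marks exactly the all-zero rows; it is the logical complement of the `eq(0).all(axis=1)` mask above. A minimal sketch on the question's example frame, using `to_numpy()` for the NumPy variant to keep the conversion explicit:

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({'A': [0, 0, 0, 1],
                   'B': [0, 2, 0, 3],
                   'C': [0, 0, 0, 4],
                   'D': [0, 5, 0, 6]})

mask_all_zero = dt.eq(0).all(axis=1)       # True where every column equals 0
mask_not_any = ~dt.any(axis=1)             # True where no column is truthy
mask_np = ~np.any(dt.to_numpy(), axis=1)   # same check via NumPy

filtered = dt[mask_not_any]                # rows 0 and 2
```

All three masks are identical on numeric data; the `~any` form skips building the intermediate boolean frame that `eq(0)` creates.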