Remove rows where columns contain only NaN or Zero
I have the following list of DataFrames. From each df I need to remove the rows that contain only NaN or zero values. I cannot change all zeros to NaN, because in other columns zeros have a valid meaning rather than indicating missing/not-a-number data. Ideally, I would like to combine the commands in this sort of format: [x.dropna(axis=0, how='all') for x in dfs].
Thank you!
Data
import numpy as np
import pandas as pd

# Use np.nan (not the string 'NaN') so pandas treats these cells as missing
df1 = pd.DataFrame(data={'id':[1,2,0,4,5,6],
                         'a': [41,41,0,43,40,41],
                         'b': [21,20,0,19,23,23],
                         'c': [0,0,0,0,43,0],
                         'd': [12,11,0,0,0,0]})
df2 = pd.DataFrame(data={'id':[0,2,0,4,5,6],
                         'a': [0,41,0,43,40,41],
                         'b': [np.nan,20,np.nan,19,23,23],
                         'c': [0,0,0,0,43,0],
                         'd': [0,11,0,0,0,0]})
df3 = pd.DataFrame(data={'id':[1,2,np.nan,np.nan,5,0],
                         'a': [41,41,0,43,40,0],
                         'b': [21,20,0,19,23,0],
                         'c': [0,0,0,0,43,0],
                         'd': [12,11,0,0,0,0]})
dfs = [df1,df2,df3]
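For context, the dropna format mentioned above cannot do this by itself: how='all' drops a row only when every cell is NaN, so rows of zeros survive. A minimal sketch with a made-up toy frame df (not the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 0 is all zeros, row 1 is all NaN.
df = pd.DataFrame({"a": [0, np.nan, 1], "b": [0, np.nan, 0]})

# how="all" drops a row only when every cell is NaN; zeros count as data.
kept = df.dropna(axis=0, how="all")
print(kept)  # rows 0 and 2 remain; the all-zero row 0 is not removed
```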
Expected output
[ id a b c d
0 1 41 21 0 12
1 2 41 20 0 11
2 4 43 19 0 0
3 5 40 23 43 0
4 6 41 23 0 0,
id a b c d
0 2 41 20 0 11
1 4 43 19 0 0
2 5 40 23 43 0
3 6 41 23 0 0,
id a b c d
0 1 41 21 0 12
1 2 41 20 0 11
2 NaN 43 19 0 0
3 5 40 23 43 0]
You can replace 0 with missing values, but rather than overwriting the original DataFrames, use the replaced copy only to test each row for at least one non-NaN value, and keep those rows with boolean indexing:
dfs = [x[x.replace(0, np.nan).notna().any(axis=1)] for x in dfs]
print(dfs)
[ id a b c d
0 1 41 21 0 12
1 2 41 20 0 11
3 4 43 19 0 0
4 5 40 23 43 0
5 6 41 23 0 0, id a b c d
1 2 41 20.0 0 11
3 4 43 19.0 0 0
4 5 40 23.0 43 0
5 6 41 23.0 0 0, id a b c d
0 1.0 41 21 0 12
1 2.0 41 20 0 11
3 NaN 43 19 0 0
4 5.0 40 23 43 0]
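The one-liner above can be unpacked into named steps. A minimal sketch; drop_zero_or_nan_rows is a hypothetical helper name and the toy frame is made up for illustration:

```python
import numpy as np
import pandas as pd


def drop_zero_or_nan_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with at least one value that is neither 0 nor NaN.

    Hypothetical helper wrapping the one-liner from the answer.
    """
    # Treat 0 as missing only in this temporary copy; df itself is untouched.
    as_missing = df.replace(0, np.nan)
    # True where a cell holds a "real" value (not 0, not NaN).
    has_value = as_missing.notna()
    # A row survives if any of its cells holds a real value.
    keep = has_value.any(axis=1)
    return df[keep]


df = pd.DataFrame({"x": [0, 1, np.nan, 0], "y": [np.nan, 0, 0, 0]})
result = drop_zero_or_nan_rows(df)
print(result)  # only row 1 remains, and its 0 in column y is preserved
```

Applied to the question's list, this becomes [drop_zero_or_nan_rows(x) for x in dfs]. Because the zeros are replaced only in the copy used to build the mask, zeros in the kept rows stay as 0.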
If all values are non-negative, you can instead test whether each row's sum is not 0 (sum skips NaN by default):
dfs = [x[x.sum(axis=1).ne(0)] for x in dfs]
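The non-negative condition matters because a row such as [-1, 1] also sums to 0. A minimal sketch with a made-up frame comparing the sum mask to the replace/notna/any mask:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 all zeros, row 1 is [-1, 1], row 2 is [NaN, 0].
df = pd.DataFrame({"a": [0, -1, np.nan], "b": [0, 1, 0]})

# Row sums skip NaN by default, so rows 0 and 2 sum to 0; so does row 1.
sum_mask = df.sum(axis=1).ne(0)
# The replace/notna/any mask checks individual cells instead of the total.
any_mask = df.replace(0, np.nan).notna().any(axis=1)

print(df[sum_mask])  # empty: row 1 is dropped along with the others
print(df[any_mask])  # row 1 is kept
```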
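Another option is to concatenate all frames into a single DataFrame and filter on a helper sum column (like the sum test above, this assumes no negative values). Because concat keeps each frame's original index, labels repeat, so filter with a boolean mask rather than drop() by label, which would also remove valid rows from the other frames:
dfs = pd.concat([df1,df2,df3])
dfs["sum"] = dfs.sum(axis=1)
dfs = dfs[dfs["sum"] != 0]
dfs
Output
id a b c d sum
0 1.0 41 21.0 0 12 75.0
1 2.0 41 20.0 0 11 74.0
3 4.0 43 19.0 0 0 66.0
4 5.0 40 23.0 43 0 111.0
5 6.0 41 23.0 0 0 70.0
1 2.0 41 20.0 0 11 74.0
3 4.0 43 19.0 0 0 66.0
4 5.0 40 23.0 43 0 111.0
5 6.0 41 23.0 0 0 70.0
0 1.0 41 21.0 0 12 75.0
1 2.0 41 20.0 0 11 74.0
3 NaN 43 19.0 0 0 62.0
4 5.0 40 23.0 43 0 111.0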
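After pd.concat the index labels repeat, so drop() on an index removes every row that shares one of those labels, not just the matching rows. A minimal sketch with two made-up frames a and b showing the difference from a boolean mask:

```python
import pandas as pd

a = pd.DataFrame({"v": [5, 0]})  # labels 0, 1
b = pd.DataFrame({"v": [0, 7]})  # labels 0, 1
combined = pd.concat([a, b])  # labels 0, 1, 0, 1 (duplicates)

zero_rows = combined[combined["v"] == 0].index  # labels 1 and 0
by_label = combined.drop(zero_rows)  # drops every row labelled 0 or 1
by_mask = combined[combined["v"] != 0]  # drops only the rows where v == 0

print(len(by_label), len(by_mask))  # prints: 0 2
```

Passing ignore_index=True to pd.concat would also make the labels unique, at the cost of losing the original row labels.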