
Remove rows where columns contain only NaN or zero

I have the following list of dataframes. I need to remove from each df the rows that contain only NaN or zero values. I cannot change all zeros to NaN, because in other columns zeros have a valid meaning rather than indicating missing or not-a-number data. Ideally, I would like to combine the commands in this sort of format: [x.dropna(axis=0, how='all') for x in dfs]. Thank you!

Data

import numpy as np
import pandas as pd

# Missing values are real np.nan here, not the string 'NaN',
# which pandas would treat as an ordinary non-missing value.
df1 = pd.DataFrame(data={'id':[1,2,0,4,5,6], 
                         'a': [41,41,0,43,40,41], 
                         'b': [21,20,0,19,23,23],
                         'c': [0,0,0,0,43,0],
                         'd': [12,11,0,0,0,0]})

df2 = pd.DataFrame(data={'id':[0,2,0,4,5,6], 
                         'a': [0,41,0,43,40,41], 
                         'b': [np.nan,20,np.nan,19,23,23],
                         'c': [0,0,0,0,43,0],
                         'd': [0,11,0,0,0,0]})

df3 = pd.DataFrame(data={'id':[1,2,np.nan,np.nan,5,0], 
                         'a': [41,41,0,43,40,0], 
                         'b': [21,20,0,19,23,0],
                         'c': [0,0,0,0,43,0],
                         'd': [12,11,0,0,0,0]})

dfs = [df1,df2,df3]

Expected output

[   id   a   b   c   d
 0   1  41  21   0  12
 1   2  41  20   0  11
 2   4  43  19   0   0
 3   5  40  23  43   0
 4   6  41  23   0   0,
    id   a   b   c   d
 0   2  41  20   0  11
 1   4  43  19   0   0
 2   5  40  23  43   0
 3   6  41  23   0   0,
     id   a   b   c   d
 0    1  41  21   0  12
 1    2  41  20   0  11
 2  NaN  43  19   0   0
 3    5  40  23  43   0
 4    0   0   0   0   0]

You can replace 0 with missing values, but rather than overwriting the original DataFrames with the replaced ones, it is better to use the replaced copy only to test whether each row has at least one non-NaN value, and then filter the original with boolean indexing:

dfs = [x[x.replace(0, np.nan).notna().any(axis=1)] for x in dfs]

print (dfs)
[   id   a   b   c   d
0   1  41  21   0  12
1   2  41  20   0  11
3   4  43  19   0   0
4   5  40  23  43   0
5   6  41  23   0   0,    id   a     b   c   d
1   2  41  20.0   0  11
3   4  43  19.0   0   0
4   5  40  23.0  43   0
5   6  41  23.0   0   0,     id   a   b   c   d
0  1.0  41  21   0  12
1  2.0  41  20   0  11
3  NaN  43  19   0   0
4  5.0  40  23  43   0]
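
To see why this works, the mask can be built one step at a time. The sketch below uses a small hypothetical frame `demo` (not the question's data) and the intermediate names are only for illustration. It also adds `reset_index(drop=True)`, which the one-liner above does not use, as one way to renumber the rows as in the expected output.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame, not taken from the question's data.
demo = pd.DataFrame({"id": [1, 0, np.nan, 4],
                     "a": [41, 0, 0, 0],
                     "b": [0, np.nan, 0, 7]})

as_missing = demo.replace(0, np.nan)  # zeros become NaN in a temporary copy
has_value = as_missing.notna()        # True where a cell is neither 0 nor NaN
keep = has_value.any(axis=1)          # True if the row has at least one such cell

# Filter the ORIGINAL frame, so valid zeros in the kept rows are preserved.
result = demo[keep].reset_index(drop=True)

print(keep.tolist())  # [True, False, False, True]
print(result)
```

Because the replaced copy is used only to compute `keep`, the zero in column `a` of the last row survives in `result`.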

If all values are non-negative, you can instead test whether the row sum is not 0 (NaN is skipped by sum by default):

dfs = [x[x.sum(axis=1).ne(0)] for x in dfs]
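
The sum test depends on the sign assumption: a row such as `[-5, 5]` also sums to 0 even though it holds real data. Below is a minimal sketch of a variant that does not depend on sign, shown on a hypothetical frame `demo`. It treats NaN as 0 and keeps any row with at least one non-zero cell. This is an alternative suggested here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame with a row whose values cancel out.
demo = pd.DataFrame({"a": [-5, 0, 3],
                     "b": [5, np.nan, 0]})

# Row sums are 0, 0 and 3, so the sum test also drops row 0.
by_sum = demo[demo.sum(axis=1).ne(0)]

# Cell-wise test: fill NaN with 0, then keep rows with any non-zero cell.
by_cells = demo[demo.fillna(0).ne(0).any(axis=1)]

print(by_sum.index.tolist())    # [2]
print(by_cells.index.tolist())  # [0, 2]
```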

Other options

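# Stack the frames, then keep rows whose row sum is non-zero.
# A boolean mask is used instead of drop(...index) because the
# concatenated index contains duplicate labels.
combined = pd.concat([df1,df2,df3])
combined["sum"] = combined.sum(axis=1)
combined = combined[combined["sum"] != 0]
combined

Output

    id   a     b   c   d    sum
0  1.0  41  21.0   0  12   75.0
1  2.0  41  20.0   0  11   74.0
3  4.0  43  19.0   0   0   66.0
4  5.0  40  23.0  43   0  111.0
5  6.0  41  23.0   0   0   70.0
1  2.0  41  20.0   0  11   74.0
3  4.0  43  19.0   0   0   66.0
4  5.0  40  23.0  43   0  111.0
5  6.0  41  23.0   0   0   70.0
0  1.0  41  21.0   0  12   75.0
1  2.0  41  20.0   0  11   74.0
3  NaN  43  19.0   0   0   62.0
4  5.0  40  23.0  43   0  111.0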
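
One thing to watch for with the concatenated approach: `pd.concat` keeps each frame's original index, so labels repeat, and `drop` with a list of labels removes every row carrying one of those labels, not just the all-zero ones. The sketch below compares label-based dropping with a boolean mask on two hypothetical frames (`first` and `second` are illustrative names, not from the question).

```python
import pandas as pd

# Two hypothetical frames; only row 0 of `second` is all zeros.
first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [0, 5], "b": [0, 6]})

stacked = pd.concat([first, second])  # index labels are 0, 1, 0, 1
row_sum = stacked.sum(axis=1)

# Dropping by label removes BOTH rows labelled 0, including a valid one.
by_label = stacked.drop(stacked[row_sum == 0].index)

# A boolean mask removes only the row that actually sums to zero.
by_mask = stacked[row_sum != 0]

print(len(by_label), len(by_mask))  # 2 3
```

Passing `ignore_index=True` to `pd.concat` is another way to avoid the duplicate labels, at the cost of losing the original row numbers.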

