
How to replace NaN with the sum of the row in a Pandas DataFrame

I am trying to replace the NaN values in certain columns with the sum of the row in a Pandas DataFrame. See the example data below:

Items | Estimate1 | Estimate2 | Estimate3
Item1 | NaN       | NaN       | 8
Item2 | NaN       | NaN       | 5.5

I would like Estimate1 and Estimate2 to be 8 for Item1 and 5.5 for Item2.

So far I have tried df.fillna(df.sum(), inplace=True), but the DataFrame does not change. Can anyone help me correct my code or recommend the right way to do it?

Passing axis=1 does not seem to work, because filling with a Series only works column by column, not row by row.
A workaround is to 'broadcast' the sum of each row to a DataFrame that has the same index and columns as the original one. With a slightly modified example DataFrame:

In [57]: df = pd.DataFrame([[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]], index=['Item1', 'Item2'], columns=['Estimate1', 'Estimate2', 'Estimate3'])

In [58]: df
Out[58]:
       Estimate1  Estimate2  Estimate3
Item1        NaN        3.3        8.0
Item2        NaN        NaN        5.5

In [59]: fill_value = pd.DataFrame({col: df.sum(axis=1) for col in df.columns})

In [60]: fill_value
Out[60]:
       Estimate1  Estimate2  Estimate3
Item1       11.3       11.3       11.3
Item2        5.5        5.5        5.5

In [61]: df.fillna(fill_value)
Out[61]:
       Estimate1  Estimate2  Estimate3
Item1       11.3        3.3        8.0
Item2        5.5        5.5        5.5
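The question asks about filling only certain columns, so the broadcast workaround above can be narrowed to a subset. The following is a minimal sketch, not a definitive implementation: the names row_sum, cols_to_fill and filled are hypothetical, and it assumes that fillna leaves columns untouched when they are absent from the fill DataFrame.

```python
import numpy as np
import pandas as pd

# Same example frame as in the session above.
df = pd.DataFrame(
    [[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]],
    index=["Item1", "Item2"],
    columns=["Estimate1", "Estimate2", "Estimate3"],
)

# Row sums; NaN entries are skipped by default (skipna=True).
row_sum = df.sum(axis=1)

# Hypothetical choice: only these columns should have their NaN replaced.
cols_to_fill = ["Estimate1", "Estimate2"]

# Broadcast the row sums into a frame covering just the chosen columns.
fill_value = pd.DataFrame({col: row_sum for col in cols_to_fill})

# fillna returns a new DataFrame; assign it to keep the result.
filled = df.fillna(fill_value)
print(filled)
```

Assigning the result to a new name, rather than relying on inplace=True, makes it easy to check that the original frame is unchanged.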

There is an open enhancement issue for this: https://github.com/pydata/pandas/issues/4514

As an alternative, you can also use apply with a lambda expression, like this:

df.apply(lambda row: row.fillna(row.sum()), axis=1)

which yields the desired outcome:

       Estimate1  Estimate2  Estimate3
Item1       11.3        3.3        8.0
Item2        5.5        5.5        5.5

I am not sure about its efficiency, though.
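On the efficiency point: a row-wise apply calls the Python lambda once per row, which may become slow on large frames. One possible alternative, sketched here under the assumption that DataFrame.where with axis=0 aligns a Series along the index, avoids the per-row call. The names row_sum, vectorized and row_wise are hypothetical, and no timing has been measured here.

```python
import numpy as np
import pandas as pd

# Same example frame as in the answers above.
df = pd.DataFrame(
    [[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]],
    index=["Item1", "Item2"],
    columns=["Estimate1", "Estimate2", "Estimate3"],
)

# One sum per row, computed once for the whole frame.
row_sum = df.sum(axis=1)

# Keep values that are present; elsewhere take the row sum,
# matched to rows by index label (axis=0).
vectorized = df.where(df.notna(), row_sum, axis=0)

# The row-wise apply from the answer above, for comparison.
row_wise = df.apply(lambda row: row.fillna(row.sum()), axis=1)

print(vectorized)
print(row_wise)
```

Both frames should hold the same values for this example; whether the where-based version is actually faster on a given dataset is worth measuring rather than assuming.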
