Sum all columns of a pandas DataFrame that contain non-numeric values

Question

I have the following dataset:

import pandas as pd

df = pd.DataFrame({'col1' : [12,3,4,5,'a',5], 'col2' : [1,5,'b',6,10,1]})

If I run df.sum(axis=0, numeric_only=True), I get the following output:

Series([], dtype: float64)

However, if I change the non-numeric values to None, it works fine.

So my question is: how can I find the sum of every column in the dataset when non-numeric values are present?

Answer 1

I think you can use to_numeric together with apply, because to_numeric works only on a single column (a Series), not on a whole DataFrame:

print (df.apply(pd.to_numeric, errors='coerce').sum())
#same as
#print (df.apply(lambda x: pd.to_numeric(x, errors='coerce')).sum())
col1    29.0
col2    23.0
dtype: float64
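To make the intermediate step visible, here is a minimal sketch of what errors='coerce' does before the sum is taken. The variable names converted and totals are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# errors='coerce' replaces every value that cannot be parsed as a number with NaN
converted = df.apply(pd.to_numeric, errors='coerce')

# Both columns are now float64, with one NaN each where 'a' and 'b' used to be
print(converted.dtypes)
print(converted.isna().sum())

# sum() skips NaN by default (skipna=True), so only the numeric values are added
totals = converted.sum()
print(totals)
```

Because the coerced values become NaN rather than 0, the columns are converted to float, which is why the sums print as 29.0 and 23.0 rather than as integers.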

Another solution is concat with a list comprehension:

# store the result under a new name so the original df is not overwritten
totals = pd.concat([pd.to_numeric(df[col], errors='coerce') for col in df], axis=1).sum()
print (totals)
col1    29.0
col2    23.0
dtype: float64

If there are only a few columns, it is faster to simply repeat the code for each one:

df.col1 = pd.to_numeric(df.col1, errors='coerce')
df.col2 = pd.to_numeric(df.col2, errors='coerce')
print (df.sum())
col1    29.0
col2    23.0
dtype: float64

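I think numeric_only=True does not work for columns with mixed content, i.e. numbers together with string values. Such columns have object dtype, so they are excluded from the sum entirely.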

Example, where col1 is numeric and col2 is non-numeric:

df = pd.DataFrame({'col1' : [1,3,4], 'col2' : ['1','5','b']})
print (df)
   col1 col2
0     1    1
1     3    5
2     4    b

print (df.sum(numeric_only=True))
col1    8
dtype: int64
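The same dtype reasoning seems to explain the observation in the question that replacing the strings with None makes the sum work. The sketch below compares the two cases; the names mixed, with_none, mixed_sum and none_sum are illustrative and do not come from the original post.

```python
import pandas as pd

# Data from the question: ints mixed with strings
mixed = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# Same data with the strings replaced by None
with_none = pd.DataFrame({'col1': [12, 3, 4, 5, None, 5], 'col2': [1, 5, None, 6, 10, 1]})

print(mixed.dtypes)      # both columns are object
print(with_none.dtypes)  # both columns are float64; None became NaN

# numeric_only=True drops the object columns, leaving nothing to sum
mixed_sum = mixed.sum(numeric_only=True)
print(mixed_sum)

# The float64 columns are kept, and NaN is skipped by default
none_sum = with_none.sum(numeric_only=True)
print(none_sum)
```

Checking df.dtypes first is a quick way to see which columns numeric_only=True will keep.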

Answer 1 (score 5, accepted, 2016-11-25 12:09:19)
