Sum all columns in a Pandas DataFrame where there are non-numeric values
I have the following dataset:
df = pd.DataFrame({'col1' : [12,3,4,5,'a',5], 'col2' : [1,5,'b',6,10,1]})
If I run df.sum(axis=0, numeric_only=True), I get the following output:
Series([], dtype: float64)
However, if I change the non-numeric values to None, then it works fine.
So, my question is: how can I find the sums of all the columns in my dataset when there are non-numeric values present?
I think you can use to_numeric with apply, because to_numeric works only on a single column (a Series), not on a whole DataFrame:
print (df.apply(pd.to_numeric, errors='coerce').sum())
#same as
#print (df.apply(lambda x: pd.to_numeric(x, errors='coerce')).sum())
col1 29.0
col2 23.0
dtype: float64
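To make the intermediate step visible, here is a minimal sketch using the same data as the question. The names converted and totals are only illustrative. With errors='coerce', values that cannot be parsed become NaN, and sum() skips NaN by default (skipna=True), which is why the totals come out as floats.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# 'a' and 'b' cannot be parsed, so errors='coerce' turns them into NaN
converted = df.apply(pd.to_numeric, errors='coerce')
print(converted)

# sum() ignores NaN by default (skipna=True)
totals = converted.sum()
print(totals)

# With skipna=False a single NaN makes the whole column total NaN
strict_totals = converted.sum(skipna=False)
print(strict_totals)
```

If a NaN total is preferable to a silently reduced one, skipna=False is the switch to use.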
Another solution is concat with a list comprehension:
#assign to a new name so the original df is not overwritten
sums = pd.concat([pd.to_numeric(df[col], errors='coerce') for col in df], axis=1).sum()
print (sums)
col1 29.0
col2 23.0
dtype: float64
If there are only a few columns, it is faster to repeat the code for each one:
df.col1 = pd.to_numeric(df.col1, errors='coerce')
df.col2 = pd.to_numeric(df.col2, errors='coerce')
print (df.sum())
col1 29.0
col2 23.0
dtype: float64
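All of the variants above silently discard whatever cannot be parsed. As a rough sketch (the names converted, dropped and bad_rows are only illustrative), the coerced values can be listed before trusting the totals by comparing the missing-value masks before and after conversion:

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

converted = df.apply(pd.to_numeric, errors='coerce')

# True where a value was present originally but became NaN after coercion
dropped = converted.isna() & df.notna()

# Rows that contain at least one value that could not be converted
bad_rows = df[dropped.any(axis=1)]
print(bad_rows)

# Number of discarded values per column
print(dropped.sum())
```

For this data, the rows at index 2 and 4 are reported, one discarded value in each column.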
I think numeric_only=True doesn't work for columns with mixed content (numbers together with string values), because such columns are stored with object dtype and are excluded as a whole.
Sample, where col1 is numeric and col2 is non-numeric:
df = pd.DataFrame({'col1' : [1,3,4], 'col2' : ['1','5','b']})
print (df)
col1 col2
0 1 1
1 3 5
2 4 b
print (df.sum(numeric_only=True))
col1 8
dtype: int64
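The same check can be applied to the data from the question. This is a small sketch with illustrative names (mixed, with_none): a column holding both numbers and strings gets object dtype, so numeric_only=True drops it entirely, whereas replacing the strings with None lets pandas infer float64 with NaN, which explains why that version works.

```python
import pandas as pd

# Numbers mixed with strings: both columns are stored as object dtype
mixed = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})
print(mixed.dtypes)

# No column counts as numeric, so numeric_only=True has nothing to sum
numeric_cols = mixed.select_dtypes('number').columns
print(list(numeric_cols))

# With None instead of strings, pandas infers float64 and stores NaN
with_none = pd.DataFrame({'col1': [12, 3, 4, 5, None, 5], 'col2': [1, 5, None, 6, 10, 1]})
print(with_none.dtypes)

none_totals = with_none.sum(numeric_only=True)
print(none_totals)
```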