Sum all columns in a Pandas DataFrame where there are non-numeric values

Question

I have the following dataset:

import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

If I run df.sum(axis=0, numeric_only=True), I get the following output:

Series([], dtype: float64)

However, if I change the non-numeric values to None, it works fine.

So my question is: how can I get the sum of each column in my dataset when non-numeric values are present?
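For reference, a minimal sketch of the None case described in the question, assuming pandas is installed (the name df_none is only for illustration). With None instead of the strings, pandas stores each column as float64 and None becomes NaN, so numeric_only=True keeps both columns and the NaN values are skipped when summing.

```python
import pandas as pd

# Same data as in the question, but with None instead of 'a' and 'b'
df_none = pd.DataFrame({'col1': [12, 3, 4, 5, None, 5],
                        'col2': [1, 5, None, 6, 10, 1]})

# None is stored as NaN, so both columns get a numeric (float64) dtype
print(df_none.dtypes)

# numeric_only=True now keeps both columns; NaN is skipped by default
totals = df_none.sum(axis=0, numeric_only=True)
print(totals)
```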

Answer 1

I think you can use to_numeric with apply, because to_numeric works only on one-dimensional data such as a single column (Series), not on a whole DataFrame:

print(df.apply(pd.to_numeric, errors='coerce').sum())
# same as
# print(df.apply(lambda x: pd.to_numeric(x, errors='coerce')).sum())
col1    29.0
col2    23.0
dtype: float64
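To see what errors='coerce' does before the sum is taken: every value that cannot be parsed as a number becomes NaN, and sum() skips NaN by default (skipna=True), which is why the totals are 29.0 and 23.0. A small sketch, assuming the question's data (the name coerced is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# Convert each column separately; 'a' and 'b' cannot be parsed and become NaN
coerced = df.apply(pd.to_numeric, errors='coerce')
print(coerced)

# True where a value was present in the original data but was not numeric
print(coerced.isna() & df.notna())

# sum() skips NaN by default, so only the numeric values are added
print(coerced.sum())
```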

Another solution is concat with a list comprehension:

# store the result in a new variable so that df itself is not overwritten
s = pd.concat([pd.to_numeric(df[col], errors='coerce') for col in df], axis=1).sum()
print(s)
col1    29.0
col2    23.0
dtype: float64

If there are only a few columns, it is faster to simply repeat the conversion for each column:

df.col1 = pd.to_numeric(df.col1, errors='coerce')
df.col2 = pd.to_numeric(df.col2, errors='coerce')
print(df.sum())
col1    29.0
col2    23.0
dtype: float64
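When only some columns need converting, the repeated lines can be written as a loop over a list of column names. This is a sketch; cols_to_convert is a hypothetical name for whichever columns you know should be numeric.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# Hypothetical list of the columns that should be treated as numeric
cols_to_convert = ['col1', 'col2']

# Same idea as the repeated lines: convert one column at a time,
# turning unparseable values into NaN
for col in cols_to_convert:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.sum())
```

Columns not in the list keep their original dtype, so non-numeric text columns are left untouched.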

I think numeric_only=True doesn't work for columns with mixed content (numbers mixed with strings): such columns have object dtype, so they are excluded from the sum.

Sample: col1 is numeric and col2 is non-numeric:

df = pd.DataFrame({'col1': [1, 3, 4], 'col2': ['1', '5', 'b']})
print(df)
   col1 col2
0     1    1
1     3    5
2     4    b

print(df.sum(numeric_only=True))
col1    8
dtype: int64
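Checking the dtypes of the DataFrame from the question shows why the original call returned an empty Series: both mixed columns are stored with object dtype. The sketch below uses select_dtypes(include='number') as an approximation of the columns numeric_only=True keeps (the two are not guaranteed to be identical, e.g. for boolean columns), and it finds none here.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# Mixed int/str columns are stored as object dtype, not as a numeric dtype
print(df.dtypes)

# Numeric columns only: none qualify here, so a numeric-only sum has nothing to add
numeric_cols = df.select_dtypes(include='number').columns
print(list(numeric_cols))  # []
```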

solution1
5 ACCEPTED 2016-11-25 12:09:19
