
Sum all columns in a Pandas DataFrame where there are non-numeric values

I have the following dataset:

import pandas as pd
df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

If I run df.sum(axis=0, numeric_only=True), I get the following output:

Series([], dtype: float64)

However, if I change the non-numeric values to None, it works fine.
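Checking the dtypes shows why the two cases behave differently. This is a minimal sketch assuming a recent pandas version; `df_none` is a hypothetical name for the variant with None. A column that mixes ints with a string is stored as object dtype, which numeric_only=True excludes, while None in an otherwise numeric column becomes NaN in a float64 column.

```python
import pandas as pd

# The question's frame: each column mixes ints with one string
df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})
print(df.dtypes)                   # both columns are object
print(df.sum(numeric_only=True))   # empty: no numeric columns are left to sum

# The same data with None instead of the strings
df_none = pd.DataFrame({'col1': [12, 3, 4, 5, None, 5], 'col2': [1, 5, None, 6, 10, 1]})
print(df_none.dtypes)              # both float64: None is stored as NaN
print(df_none.sum(numeric_only=True))
```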

So my question is: how can I find the sums of all the columns in my dataset when non-numeric values are present?

You can use to_numeric with apply, because to_numeric works on only one column (a Series) at a time, not on a whole DataFrame:

print(df.apply(pd.to_numeric, errors='coerce').sum())
# same as
# print(df.apply(lambda x: pd.to_numeric(x, errors='coerce')).sum())
col1    29.0
col2    23.0
dtype: float64
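To see what errors='coerce' does on a single column, here is a small sketch on the question's data. Every value that cannot be parsed becomes NaN, and Series.sum skips NaN by default (skipna=True), which is why the totals come out as floats (29.0 and 23.0).

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# 'a' cannot be parsed as a number, so errors='coerce' replaces it with NaN
col1 = pd.to_numeric(df['col1'], errors='coerce')
print(col1.dtype)         # float64, because NaN is a float
print(col1.isna().sum())  # one value was coerced

# NaN is skipped by default; with skipna=False it propagates to the result
print(col1.sum())
print(col1.sum(skipna=False))
```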

Another solution is to use concat with a list comprehension:

# assign to a new name so the original df is not overwritten by the Series of sums
s = pd.concat([pd.to_numeric(df[col], errors='coerce') for col in df], axis=1).sum()
print(s)
col1    29.0
col2    23.0
dtype: float64
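A closely related variant, not part of the original answer, builds the converted frame from a dict comprehension instead of concat. It is a sketch on the question's data; `converted` is a hypothetical name, and the original df keeps its object columns.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 3, 4, 5, 'a', 5], 'col2': [1, 5, 'b', 6, 10, 1]})

# Build a new frame column by column; column order follows the dict order
converted = pd.DataFrame({col: pd.to_numeric(df[col], errors='coerce') for col in df})
print(converted.sum())
```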

If there are only a few columns, it can be faster to convert each one explicitly:

df['col1'] = pd.to_numeric(df['col1'], errors='coerce')
df['col2'] = pd.to_numeric(df['col2'], errors='coerce')
print(df.sum())
col1    29.0
col2    23.0
dtype: float64
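The repeated lines generalize to a loop over a chosen list of columns, which matters when the frame also holds text columns that should stay as they are. This is a sketch under the assumption of a hypothetical extra `name` column; only the listed columns are converted, and numeric_only=True then skips the untouched text column.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['u', 'v', 'w', 'x', 'y', 'z'],  # hypothetical text column to leave alone
    'col1': [12, 3, 4, 5, 'a', 5],
    'col2': [1, 5, 'b', 6, 10, 1],
})

# Convert only the listed columns; 'name' keeps its object dtype
for col in ['col1', 'col2']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# numeric_only=True now excludes 'name' but includes the converted columns
print(df.sum(numeric_only=True))
```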

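numeric_only=True does not help with mixed columns (numbers together with strings): pandas stores such a column as object dtype, so numeric_only=True drops the whole column instead of summing its numeric values.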

For example, here col1 is numeric and col2 is non-numeric (it holds strings):

df = pd.DataFrame({'col1' : [1,3,4], 'col2' : ['1','5','b']})
print(df)
   col1 col2
0     1    1
1     3    5
2     4    b

print(df.sum(numeric_only=True))
col1    8
dtype: int64
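Converting the string column first makes numeric_only=True include it. A minimal sketch on the same sample: '1' and '5' parse as numbers, 'b' becomes NaN and is skipped, so col2 sums to 6.0.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 4], 'col2': ['1', '5', 'b']})
print(df.dtypes)  # col1 int64, col2 object

# Coerce the string column; unparseable values become NaN
df['col2'] = pd.to_numeric(df['col2'], errors='coerce')
print(df.sum(numeric_only=True))
```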
