Summing two object columns in pandas

I'm having an issue combining two similar columns that have dtype object. Because the two columns represent the same thing, they never both have a value in the same row. All of the values are integers, except for some nan values and the string "$0", which none of the solutions I have tried can handle. The data looks like this:

Actual    MTD Actual 
nan       3
nan       $0  
nan       nan
3         nan
2         nan
1         nan

I have tried changing the columns to string type and then to integer type. I have also tried filling the nan values with 0, but neither approach works.

What I've tried:
1. df[["Actual", "MTD Actual"]].sum(axis=1)
2. df['Actual'].add(df['MTD Actual'], fill_value=0)
3. pd.to_numeric(df['MTD Actual'])

Corresponding results:
1. Runs, but the whole resulting column is NaN
2. Raises "unsupported operand type(s) for +: 'int' and 'str'"
3. Raises "Unable to parse string "$0" at position 3266"

I would like the output to be:

Actual     
3      
0         
nan       
3         
2         
1         

You have two different issues. First, you need to convert your non-numeric columns to numeric values. Second, you want to sum across the columns, keeping nan in rows where every value is nan but treating nan as 0 otherwise.

Here's a solution which should work:

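cleaned = df.replace(r'[\$,]', '', regex=True).astype(float)
cleaned.loc[cleaned.notna().any(axis=1)] = cleaned.fillna(0)
df = cleaned.sum(axis=1, skipna=False)

The regular expression removes dollar signs and commas, and .astype(float) casts the data to numeric. cleaned.loc[cleaned.notna().any(axis=1)] selects only the rows that have at least one non-nan value, so .fillna(0) replaces the nan values in those rows and leaves rows that are entirely nan untouched. Passing skipna=False to .sum() then keeps nan for those rows; with the default skipna=True they would sum to 0. This assumes df contains only the two columns being combined.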
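For comparison, here is a minimal sketch of the same two steps (clean, then sum) on data rebuilt from the question. It is not the only way to do it. The helper name to_number and the sample frame are made up for illustration, and it leans on min_count=1 in .sum() instead of selectively filling nan values:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; both columns have dtype object.
df = pd.DataFrame(
    {
        "Actual": [np.nan, np.nan, np.nan, 3, 2, 1],
        "MTD Actual": [3, "$0", np.nan, np.nan, np.nan, np.nan],
    },
    dtype=object,
)

cols = ["Actual", "MTD Actual"]


def to_number(column: pd.Series) -> pd.Series:
    """Hypothetical helper: strip "$" and "," and parse the column as numbers."""
    stripped = column.astype(str).str.replace(r"[\$,]", "", regex=True)
    # errors="coerce" turns anything that still cannot be parsed into NaN.
    return pd.to_numeric(stripped, errors="coerce")


cleaned = df[cols].apply(to_number)

# min_count=1 requires at least one valid value per row; a row with none
# sums to NaN instead of the default 0.
total = cleaned.sum(axis=1, min_count=1)
print(total)
```

Because the two columns never both hold a value in the same row, cleaned["Actual"].combine_first(cleaned["MTD Actual"]) should give the same result for this data.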
