Creating new column with sum of other columns

I have a DataFrame in which I create a new column as the sum of other columns:

df['newcol'] = df['col1'] + df['col2'] + df['col3'] 

But this solution doesn't work properly, because the sum in the file is much bigger than the sum in the DataFrame.

Instead of the code above, I used df.assign as below, and it works perfectly:

df = df.assign(newcol=[['col1','col2','col3']].sum(axis=1))

Are there any differences between the two methods? Why does the second one work correctly? Unfortunately, I cannot show you the original file.

No, the second one is not correct as written. You need to select the subset of columns from the DataFrame with a list of column names and then sum:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), 
                  columns=['col1','col2','col3','col4','col5'])


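# Sum across a DataFrame subset, selected with a list of column names
df = df.assign(newcol=df[['col1','col2','col3']].sum(axis=1))

# Sum by adding the Series element-wise
df['newcol1'] = df['col1'] + df['col2'] + df['col3']
print(df)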
   col1  col2  col3  col4  col5  newcol  newcol1
0     8     8     3     7     7      19       19
1     0     4     2     5     2       6        6
2     2     2     1     0     8       5        5
3     4     0     9     6     2      13       13
4     4     1     5     3     4      10       10

# Invalid: the key is a tuple of a list and two Series, not a list of column names
df = df.assign(newcol=df[['col1'], df['col2'], df['col3']].sum(axis=1))
print(df)

TypeError: '(['col1'], 0    8
1    4
2    2
3    0
4    1
Name: col2, dtype: int32, 0    3
1    2
2    1
3    9
4    5
Name: col3, dtype: int32)' is an invalid key

The difference is that the first solution adds the Series element-wise with +, while df[['col1','col2','col3']].sum(axis=1) sums across the rows of a DataFrame.
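
One possible explanation for the mismatch described in the question, assuming some of the columns contain missing values (the original file was not shown, so this is only a guess): adding Series with + gives NaN for any row where one operand is NaN, while DataFrame.sum skips NaN by default (skipna=True). The sketch below uses small made-up data to show how the two totals can then differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: col2 has a missing value in the second row.
df = pd.DataFrame({
    'col1': [1.0, 2.0, 3.0],
    'col2': [10.0, np.nan, 30.0],
    'col3': [100.0, 200.0, 300.0],
})

# Element-wise + on Series: any NaN operand makes the row result NaN.
plus_sum = df['col1'] + df['col2'] + df['col3']

# DataFrame.sum(axis=1): skipna=True by default, so the NaN is ignored.
row_sum = df[['col1', 'col2', 'col3']].sum(axis=1)

print(plus_sum.tolist())  # [111.0, nan, 333.0]
print(row_sum.tolist())   # [111.0, 202.0, 333.0]

# The column totals differ, because Series.sum() also skips the NaN row.
print(plus_sum.sum())  # 444.0
print(row_sum.sum())   # 646.0
```

If the data has no missing values, both approaches give the same result, as in the output shown earlier in this answer.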
