Creating new column with sum of other columns

Question

I have a DataFrame in which I create a new column as the sum of other columns:

df['newcol'] = df['col1'] + df['col2'] + df['col3']

but this doesn't work properly: the sum in the file is much bigger than the sum in the DataFrame.

Instead of the code above, I used df.assign as shown below, and it works perfectly:

df = df.assign(newcol=[['col1','col2','col3']].sum(axis=1))

Is there any difference between the two methods? Why does the second one work correctly when the first does not? Unfortunately, I cannot show you the original file.

Answer 1

No, the second one is not correct as written. You need to select the columns with a list of column names and then call sum:

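import numpy as np
import pandas as pd

# Reproducible sample data: 5 rows, 5 integer columns
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)),
                  columns=['col1','col2','col3','col4','col5'])

# Row-wise sum over a DataFrame subset (note the double brackets)
df = df.assign(newcol=df[['col1','col2','col3']].sum(axis=1))

# Element-wise addition of the individual Series
df['newcol1'] = df['col1'] + df['col2'] + df['col3']
print(df)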
   col1  col2  col3  col4  col5  newcol  newcol1
0     8     8     3     7     7      19       19
1     0     4     2     5     2       6        6
2     2     2     1     0     8       5        5
3     4     0     9     6     2      13       13
4     4     1     5     3     4      10       10

# Wrong: this passes a tuple of a list and two Series as the key, so it raises an error
df = df.assign(newcol=df[['col1'], df['col2'], df['col3']].sum(axis=1))
print(df)


TypeError: '(['col1'], 0    8
1    4
2    2
3    0
4    1
Name: col2, dtype: int32, 0    3
1    2
2    1
3    9
4    5
Name: col3, dtype: int32)' is an invalid key

The difference is that the first solution adds the individual Series element-wise with +, whereas df[['col1','col2','col3']].sum(axis=1) selects a DataFrame and sums across its columns for each row.
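One place where the two approaches can give different totals is missing data. Since the original file was not shown, this is only a guess at the cause, and the small frame below is made up for illustration. With +, a NaN in any of the columns makes the whole row NaN, while DataFrame.sum skips NaN by default (skipna=True). A minimal sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (not from the original question)
df = pd.DataFrame({
    'col1': [1.0, 2.0, np.nan],
    'col2': [10.0, np.nan, 30.0],
    'col3': [100.0, 200.0, 300.0],
})

# Series + Series: NaN propagates, so rows 1 and 2 become NaN
plus_sum = df['col1'] + df['col2'] + df['col3']

# DataFrame.sum(axis=1): NaN is skipped by default
row_sum = df[['col1', 'col2', 'col3']].sum(axis=1)

# With skipna=False the row-wise sum behaves like the + version
strict_sum = df[['col1', 'col2', 'col3']].sum(axis=1, skipna=False)

print(plus_sum.tolist())    # [111.0, nan, nan]
print(row_sum.tolist())     # [111.0, 202.0, 330.0]
print(plus_sum.sum(), row_sum.sum())  # 111.0 643.0
```

If the real data contains NaN, the column total from the + version will come out smaller than the total in the file, because every row with a missing value drops out of the total. Checking df[['col1','col2','col3']].isna().sum() on the real data would confirm or rule this out.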

1 answers

solution1
3 2022-09-21 13:14:49
