I have a dataframe where I create a new column as the sum of other columns:
df['newcol'] = df['col1'] + df['col2'] + df['col3']
but this solution doesn't work properly, because the sum in the file is much bigger than in the dataframe.
Instead of the code above I used df.assign as below, and it works perfectly:
df = df.assign(newcol=[['col1','col2','col3']].sum(axis=1))
Are there any differences between the two methods? Why does the second one work correctly? Do you have any idea why they behave differently? Unfortunately, I cannot share the original file.
No, the second one is not correct as written: you need a list with the subset of column names inside df[...], and then sum:
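import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)),
                  columns=['col1','col2','col3','col4','col5'])
# select the columns with a list of names, then sum row by row
df = df.assign(newcol=df[['col1','col2','col3']].sum(axis=1))
# the same result by adding the Series one by one
df['newcol1'] = df['col1'] + df['col2'] + df['col3']
print(df)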
   col1  col2  col3  col4  col5  newcol  newcol1
0     8     8     3     7     7      19       19
1     0     4     2     5     2       6        6
2     2     2     1     0     8       5        5
3     4     0     9     6     2      13       13
4     4     1     5     3     4      10       10
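The version in the question fails as written because, without the leading df, the brackets build a plain Python list (a list containing another list), and Python lists have no sum method. A minimal check, reusing the same random frame as above:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)),
                  columns=['col1', 'col2', 'col3', 'col4', 'col5'])

# Without `df`, [['col1','col2','col3']] is just a Python list, so .sum fails
# before assign is even called
failed = False
try:
    df = df.assign(newcol=[['col1', 'col2', 'col3']].sum(axis=1))
except AttributeError as exc:
    failed = True
    print(exc)  # 'list' object has no attribute 'sum'

# With `df`, the same brackets select a DataFrame holding three columns
subset = df[['col1', 'col2', 'col3']]
print(type(subset).__name__)  # DataFrame
```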
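Mixing a list of column names with Series objects inside the brackets does not work: pandas receives the whole tuple as one key, which is invalid, and raises an error:
df = df.assign(newcol=df[['col1'], df['col2'], df['col3']].sum(axis=1))
print(df)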
TypeError: '(['col1'], 0 8
1 4
2 2
3 0
4 1
Name: col2, dtype: int32, 0 3
1 2
2 1
3 9
4 5
Name: col3, dtype: int32)' is an invalid key
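If you already have Series objects rather than column names, one way (a sketch, not the only option) is to combine them into a DataFrame with pd.concat first and then call sum(axis=1). Both variants below give the same result on the example frame:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)),
                  columns=['col1', 'col2', 'col3', 'col4', 'col5'])

# Option 1: select the columns by name (a list of labels), then sum per row
by_name = df[['col1', 'col2', 'col3']].sum(axis=1)

# Option 2: put existing Series side by side as columns with pd.concat,
# then sum per row
by_series = pd.concat([df['col1'], df['col2'], df['col3']], axis=1).sum(axis=1)

print(by_name.equals(by_series))  # True
```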
The difference is that the first solution sums Series with +, while df[['col1','col2','col3']].sum(axis=1) uses DataFrame.sum.
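The two only agree when there are no missing values. A likely cause of the mismatch in the question (an assumption, since the original file is not shown) is NaN: Series addition with + returns NaN for any row that contains a NaN, and those rows then drop out of later totals, while DataFrame.sum(axis=1) skips NaN by default (skipna=True). A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in col2 (not from the question)
df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [10, np.nan, 30],
                   'col3': [100, 200, 300]})

# `+` on Series propagates NaN: row 1 becomes NaN
plus = df['col1'] + df['col2'] + df['col3']

# DataFrame.sum skips NaN by default: row 1 is 2 + 200
summed = df[['col1', 'col2', 'col3']].sum(axis=1)

# skipna=False makes DataFrame.sum propagate NaN like `+`
strict = df[['col1', 'col2', 'col3']].sum(axis=1, skipna=False)

print(plus.tolist())             # [111.0, nan, 333.0]
print(summed.tolist())           # [111.0, 202.0, 333.0]
print(plus.sum(), summed.sum())  # 444.0 646.0
```

So the total of the + column is smaller because the NaN rows are effectively ignored when it is summed later.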