Adding a new column in pandas which is the total sum of the values of another column
So I'm using pandas and trying to add a new column called 'Total', which is the sum of all the vehicle numbers for that year.
From this:
type year number
Private cars 2005 401638
Motorcycles 2005 138588
Off peak cars 2005 12947
Motorcycles 2005 846
To something like this:
type year number Total
Private cars 2005 401638 554019
Motorcycles 2005 138588
Off peak cars 2005 12947
Motorcycles 2005 846
Using GroupBy + transform with sum:
df['Year_Total'] = df.groupby('year')['number'].transform('sum')
Note that this gives the yearly total on every row. If you wish to "blank out" totals for certain rows, you should specify the exact logic for doing so.
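As a minimal sketch of the call above, the snippet below rebuilds the sample rows from the question and applies the same groupby + transform. How the real data is loaded is not shown in the question, so the DataFrame construction here is an assumption.

```python
import pandas as pd

# Sample rows copied from the question; the construction itself is assumed.
df = pd.DataFrame({
    "type": ["Private cars", "Motorcycles", "Off peak cars", "Motorcycles"],
    "year": [2005, 2005, 2005, 2005],
    "number": [401638, 138588, 12947, 846],
})

# transform('sum') returns a Series aligned to the original index,
# so every row receives the total of its own year group.
df["Year_Total"] = df.groupby("year")["number"].transform("sum")
print(df)
```

Because the result has the same length as the original frame, it can be assigned directly as a column without any merge step.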
Use GroupBy.transform and then, if necessary, replace the duplicated values:
import numpy as np

df['Total'] = df.groupby('year')['number'].transform('sum')
print (df)
type year number Total
0 Private cars 2005 1 3
1 Motorcycles 2005 2 3
2 Off peak cars 2006 5 20
3 Motorcycles 2006 7 20
4 Motorcycles1 2006 8 20
df.loc[df['year'].duplicated(), 'Total'] = np.nan
print (df)
type year number Total
0 Private cars 2005 1 3.0
1 Motorcycles 2005 2 NaN
2 Off peak cars 2006 5 20.0
3 Motorcycles 2006 7 NaN
4 Motorcycles1 2006 8 NaN
Replacing with empty strings is possible, but not recommended, because the column then mixes numeric values with strings and some functions will fail:
df.loc[df['year'].duplicated(), 'Total'] = ''
print (df)
type year number Total
0 Private cars 2005 1 3
1 Motorcycles 2005 2
2 Off peak cars 2006 5 20
3 Motorcycles 2006 7
4 Motorcycles1 2006 8
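To illustrate the problem with mixed values, here is a small sketch on a standalone Series whose values mirror the 'Total' column above (the variable names `mixed` and `numeric` are illustrative only). The Series falls back to the generic object dtype, and it has to be converted back before any numeric work.

```python
import pandas as pd

# Values mirroring the 'Total' column above: numbers mixed with empty strings.
mixed = pd.Series([3, "", 20, "", ""])
print(mixed.dtype)  # object, not a numeric dtype

# Converting back: errors='coerce' turns non-numeric entries into NaN.
numeric = pd.to_numeric(mixed, errors="coerce")
print(numeric)
```

Keeping NaN in the column from the start avoids this round trip, which is why it is the recommended option.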
This gives a similar DataFrame, with the sum of the whole 'number' column (not the per-year sum) in every row:
import numpy as np

total = df['number'].sum()
df['Total'] = np.ones_like(df['number'].values) * total
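Note that the approach above puts the sum of the entire column into every row, which matches the per-year total only when the data covers a single year. The sketch below contrasts the two on the small two-year sample used earlier; the column names `Grand_Total` and `Year_Total` are hypothetical, chosen only to keep the results apart.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2005, 2005, 2006, 2006, 2006],
    "number": [1, 2, 5, 7, 8],
})

# Assigning a scalar broadcasts it to every row: one grand total.
df["Grand_Total"] = df["number"].sum()

# For comparison, the per-year total from groupby + transform.
df["Year_Total"] = df.groupby("year")["number"].transform("sum")
print(df)
```

Assigning the scalar directly also makes the multiplication by an array of ones unnecessary, since pandas broadcasts a scalar across the column.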