Pandas groupby and subtract rows
I have the following dataframe:
id variable year value
1 a 2020 2
1 a 2021 3
1 a 2022 5
1 b 2020 3
1 b 2021 8
1 b 2022 10
I want to group by id and variable and subtract the 2020 value from every row of each group. So I will get:
id variable year value
1 a 2020 0
1 a 2021 1
1 a 2022 3
1 b 2020 0
1 b 2021 5
1 b 2022 7
How can I do that?
If you are not sure that 2020 is the first row of each group, use DataFrame.merge:
df1 = df[df['year'].eq(2020)]
df['value'] -= df.merge(df1,how='left',on=['id','variable'],suffixes=('_',''))['value'].values
print (df)
id variable year value
0 1 a 2020 0
1 1 a 2021 1
2 1 a 2022 3
3 1 b 2020 0
4 1 b 2021 5
5 1 b 2022 7
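The merge approach above can be sketched as a self-contained script. The rows are deliberately shuffled so that 2020 is not first in either group. The names `base`, `merged`, `result` and the `_base` suffix are illustrative choices, not part of the original answer, and `validate="many_to_one"` is an optional guard added here on the assumption that each group has at most one 2020 row.

```python
import pandas as pd

# Same data as in the question, but with rows shuffled inside each group.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2021, 2020, 2022, 2022, 2021, 2020],
    "value": [3, 2, 5, 10, 8, 3],
})

# Baseline: the 2020 row of every (id, variable) group.
base = df.loc[df["year"].eq(2020), ["id", "variable", "value"]]

# A left merge keeps the rows of df in their original order.
# validate="many_to_one" raises if a group has more than one 2020 row.
merged = df.merge(base, how="left", on=["id", "variable"],
                  suffixes=("", "_base"), validate="many_to_one")

# to_numpy() sidesteps index alignment between df and the merged frame.
result = df.assign(value=df["value"] - merged["value_base"].to_numpy())
print(result)
```

Keeping only the key columns and `value` in `base` means a single suffix is enough to tell the baseline column apart from the original one.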
If 2020 is always the first row of each group, use GroupBy.transform with GroupBy.first:
df['value'] -= df.groupby(['id','variable'])['value'].transform('first')
print (df)
id variable year value
0 1 a 2020 0
1 1 a 2021 1
2 1 a 2022 3
3 1 b 2020 0
4 1 b 2021 5
5 1 b 2022 7
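Because `transform('first')` depends on row order, one way to make the precondition hold is to sort by year first. This is only a sketch and it assumes that 2020 is the earliest year present in every group; if a group starts later, its own earliest year would be subtracted instead.

```python
import pandas as pd

# Rows are shuffled so that 2020 is not first in either group.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2021, 2020, 2022, 2022, 2021, 2020],
    "value": [3, 2, 5, 10, 8, 3],
})

# After sorting, the earliest year (assumed to be 2020) leads each group.
df = df.sort_values(["id", "variable", "year"], ignore_index=True)

# Subtract the first value of each (id, variable) group from all its rows.
df["value"] -= df.groupby(["id", "variable"])["value"].transform("first")
print(df)
```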
EDIT:
If the data contains duplicated 2020 rows within a group, first remove the duplicates so that only the first 2020 value is subtracted:
print (df)
id variable year value
0 1 a 2020 3
1 1 a 2020 2
2 1 a 2022 5
3 1 b 2020 3
4 1 b 2021 8
5 1 b 2022 10
df1 = df[df['year'].eq(2020)]
df['value'] -= df.merge(df1.drop_duplicates(['id','variable']),
                        how='left',
                        on=['id','variable'],
                        suffixes=('_',''))['value'].values
print (df)
id variable year value
0 1 a 2020 0
1 1 a 2020 -1
2 1 a 2022 2
3 1 b 2020 0
4 1 b 2021 5
5 1 b 2022 7
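Before deciding how to handle duplicates, it may help to see which groups actually contain more than one 2020 row. The following is a minimal sketch using `DataFrame.duplicated`; the names `base` and `dupes` are illustrative only.

```python
import pandas as pd

# Data from the EDIT above: group (1, 'a') has two 2020 rows.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2020, 2020, 2022, 2020, 2021, 2022],
    "value": [3, 2, 5, 3, 8, 10],
})

base = df[df["year"].eq(2020)]

# keep=False marks every member of a duplicated (id, variable) pair,
# not just the second and later occurrences.
dupes = base[base.duplicated(["id", "variable"], keep=False)]
print(dupes)
```

An empty `dupes` frame would mean the plain merge can be used without `drop_duplicates`.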
Or aggregate the values, e.g. by sum, to deduplicate the data:
print (df)
id variable year value
0 1 a 2020 3
1 1 a 2020 1
2 1 a 2022 5
3 1 b 2020 3
4 1 b 2021 8
5 1 b 2022 10
df = df.groupby(['id','variable','year'], as_index=False).sum()
print (df)
id variable year value
0 1 a 2020 4
1 1 a 2022 5
2 1 b 2020 3
3 1 b 2021 8
4 1 b 2022 10
df1 = df[df['year'].eq(2020)]
df['value'] -= df.merge(df1, how='left',
                        on=['id','variable'],
                        suffixes=('_',''))['value'].values
print (df)
id variable year value
0 1 a 2020 0
1 1 a 2022 1
2 1 b 2020 0
3 1 b 2021 5
4 1 b 2022 7
Even if 2020 is not the first row of its group, we can use GroupBy.transform with Series.where:
df['value'] = df['value'].sub(df['value'].where(df['year'].eq(2020))
                                         .groupby([df['id'],df['variable']])
                                         .transform('max'))
print(df)
id variable year value
0 1 a 2020 0.0
1 1 a 2021 1.0
2 1 a 2022 3.0
3 1 b 2020 0.0
4 1 b 2021 5.0
5 1 b 2022 7.0
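The one-liner above can be unpacked step by step, which also shows why the result is float: `where` replaces every non-2020 value with NaN, and `max` skips NaN, so each group is left with its 2020 value. The sketch below uses illustrative names (`masked`, `base`, `out`) and casts back to integers only when every group actually has a 2020 row.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "value": [2, 3, 5, 3, 8, 10],
})

# Keep the value only on 2020 rows; all other rows become NaN (float dtype).
masked = df["value"].where(df["year"].eq(2020))

# max ignores NaN, so each group is broadcast its single 2020 value.
base = masked.groupby([df["id"], df["variable"]]).transform("max")

out = df["value"].sub(base)

# Groups without a 2020 row would yield NaN, so cast only if none exist.
if out.notna().all():
    out = out.astype("int64")
print(df.assign(value=out))
```

If a group has several 2020 rows, `max` picks the largest of them, which differs from the `drop_duplicates` variant that keeps the first.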
If year is stored as a string, you may need:
df['year'].eq('2020')
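An alternative to comparing against the string is to convert the column once, so the integer comparison used in the snippets above works unchanged. This is a small sketch that assumes every entry in `year` parses as a number; `pd.to_numeric` raises by default if one does not.

```python
import pandas as pd

# Hypothetical frame where year was read in as text.
df = pd.DataFrame({
    "year": ["2020", "2021", "2022"],
    "value": [2, 3, 5],
})

# Convert the text column to integers (raises on unparseable entries).
df["year"] = pd.to_numeric(df["year"])

# The integer comparison now matches the 2020 row.
mask = df["year"].eq(2020)
print(mask)
```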