
Pandas groupby and subtract rows

I have the following dataframe:

id variable year value
1      a    2020   2
1      a    2021   3
1      a    2022   5
1      b    2020   3
1      b    2021   8
1      b    2022   10

I want to group by id and variable and subtract the 2020 value from every row of the group, so I will get:

id variable year value
1      a    2020   0
1      a    2021   1
1      a    2022   3
1      b    2020   0
1      b    2021   5
1      b    2022   7

How can I do that?

If you are not sure that 2020 is the first row in each group, use DataFrame.merge:

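# Baseline rows: the 2020 row of each (id, variable) group
df1 = df[df['year'].eq(2020)]
# The left merge pairs every row with its group's 2020 value; the right frame keeps the plain 'value' name
df['value'] -= df.merge(df1,how='left',on=['id','variable'],suffixes=('_',''))['value'].values
print (df)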
   id variable  year  value
0   1        a  2020      0
1   1        a  2021      1
2   1        a  2022      3
3   1        b  2020      0
4   1        b  2021      5
5   1        b  2022      7
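
For readers who want to run this end to end, here is a minimal self-contained sketch of the same merge idea. It rebuilds the sample data from the question and renames the 2020 value to `base_2020`, a hypothetical column name chosen here for readability instead of the suffix trick; it assumes each group has exactly one 2020 row.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "value": [2, 3, 5, 3, 8, 10],
})

# One baseline row per group; "base_2020" is a hypothetical name
baseline = (
    df.loc[df["year"].eq(2020), ["id", "variable", "value"]]
      .rename(columns={"value": "base_2020"})
)

# A left merge keeps the row order of df, so the result lines up by position
merged = df.merge(baseline, how="left", on=["id", "variable"])
result = df.assign(value=(merged["value"] - merged["base_2020"]).to_numpy())
print(result)
```

Using a distinct column name avoids having to remember which side of the merge ends up with the unsuffixed `value` column.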

If 2020 is always the first row in each group, use GroupBy.transform with GroupBy.first:

# Subtract each group's first value (correct only when the 2020 row comes first)
df['value'] -= df.groupby(['id','variable'])['value'].transform('first')
print (df)
   id variable  year  value
0   1        a  2020      0
1   1        a  2021      1
2   1        a  2022      3
3   1        b  2020      0
4   1        b  2021      5
5   1        b  2022      7
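
If the row order is not guaranteed, one possible way to still use `transform('first')` is to sort by year beforehand. The sketch below uses deliberately shuffled sample rows and assumes that 2020 is the earliest year in every group; if earlier years can occur, this would subtract the wrong value.

```python
import pandas as pd

# Same values as in the question, but shuffled so 2020 is not first
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "b", "a", "b", "a", "b"],
    "year": [2022, 2021, 2020, 2022, 2021, 2020],
    "value": [5, 8, 2, 10, 3, 3],
})

# Assumption: 2020 is the smallest year in each group, so after sorting
# the first row of every group is its 2020 row
ordered = df.sort_values(["id", "variable", "year"], ignore_index=True)
ordered["value"] -= ordered.groupby(["id", "variable"])["value"].transform("first")
print(ordered)
```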

EDIT:

If the data contains duplicated 2020 rows within a group, first remove the duplicates so that only the first 2020 value per group is subtracted:

print (df)
   id variable  year  value
0   1        a  2020      3
1   1        a  2020      2
2   1        a  2022      5
3   1        b  2020      3
4   1        b  2021      8
5   1        b  2022     10

df1 = df[df['year'].eq(2020)]
df['value'] -= df.merge(df1.drop_duplicates(['id','variable']),
                        how='left',
                        on=['id','variable'],
                        suffixes=('_',''))['value'].values

print (df)
   id variable  year  value
0   1        a  2020      0
1   1        a  2020     -1
2   1        a  2022      2
3   1        b  2020      0
4   1        b  2021      5
5   1        b  2022      7

Or deduplicate the data by aggregating the values first, e.g. with sum:

print (df)
   id variable  year  value
0   1        a  2020      3
1   1        a  2020      1
2   1        a  2022      5
3   1        b  2020      3
4   1        b  2021      8
5   1        b  2022     10

df = df.groupby(['id','variable','year'], as_index=False).sum()
print (df)
   id variable  year  value
0   1        a  2020      4
1   1        a  2022      5
2   1        b  2020      3
3   1        b  2021      8
4   1        b  2022     10

df1 = df[df['year'].eq(2020)]
df['value'] -= df.merge(df1, how='left',
                        on=['id','variable'],
                        suffixes=('_',''))['value'].values

print (df)
   id variable  year  value
0   1        a  2020      0
1   1        a  2022      1
2   1        b  2020      0
3   1        b  2021      5
4   1        b  2022      7
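
The reason duplicates need handling is that a left merge against a frame with repeated keys returns more rows than `df`, so the `.values` assignment no longer lines up. As a hedged sketch, the `validate` argument of `DataFrame.merge` can be used to detect this up front; the flag name `duplicates_found` is made up for this example.

```python
import pandas as pd

# Data with two 2020 rows in group (1, 'a'), as in the EDIT above
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2020, 2020, 2022, 2020, 2021, 2022],
    "value": [3, 2, 5, 3, 8, 10],
})
df1 = df[df["year"].eq(2020)]

try:
    # many_to_one requires the merge keys to be unique in the right frame
    df.merge(df1, how="left", on=["id", "variable"],
             suffixes=("_", ""), validate="many_to_one")
    duplicates_found = False
except ValueError:
    # pandas raises MergeError here, which is a subclass of ValueError
    duplicates_found = True

print(duplicates_found)
```

When the check fails, either of the two approaches above (drop_duplicates or aggregation) can be applied before merging.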

Even if 2020 is not the first row of each group, we can use GroupBy.transform with Series.where:

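# Keep only the 2020 values (other rows become NaN), then broadcast them per group.
# The intermediate NaN values are why the result is float.
df['value']= df['value'].sub(df['value'].where(df['year'].eq(2020))
                                        .groupby([df['id'],df['variable']])
                                        .transform('max'))
print(df)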
   id variable  year  value
0   1        a  2020    0.0
1   1        a  2021    1.0
2   1        a  2022    3.0
3   1        b  2020    0.0
4   1        b  2021    5.0
5   1        b  2022    7.0
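
Note that the output is float because `Series.where` introduces NaN before the group maximum is taken. If integer output is wanted, a possible follow-up is sketched below; it casts back only when every group actually had a 2020 row, since a group without one would leave NaN behind. The names `base` and `diff` are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "variable": ["a", "a", "a", "b", "b", "b"],
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "value": [2, 3, 5, 3, 8, 10],
})

# 2020 value of each group, broadcast to all rows of that group
base = (
    df["value"].where(df["year"].eq(2020))
      .groupby([df["id"], df["variable"]])
      .transform("max")
)
diff = df["value"].sub(base)

# Cast back to integers only if no group is missing its 2020 row
if diff.notna().all():
    diff = diff.astype("int64")

result = df.assign(value=diff)
print(result)
```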

If year is stored as a string, you may need:

df['year'].eq('2020')
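
An alternative, sketched here under the assumption that every year string is a plain number, is to convert the column once with `pd.to_numeric` so that the integer comparison used in the snippets above works unchanged.

```python
import pandas as pd

# Small illustrative frame with years stored as strings
df = pd.DataFrame({
    "id": [1, 1, 1],
    "variable": ["a", "a", "a"],
    "year": ["2020", "2021", "2022"],
    "value": [2, 3, 5],
})

# Convert once; afterwards the integer comparison matches the 2020 row
df["year"] = pd.to_numeric(df["year"])
mask = df["year"].eq(2020)
print(mask)
```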
