Difference between values in groupby

I have two DataFrames, one for exports and one for imports. I concatenated the export and import figures into a single DataFrame using pd.concat():

table3 = pd.concat([table1,table2],keys=['table1','table2'])
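
The keys argument is what produces the table1/table2 labels in the output below. Here is a minimal sketch of that behaviour, using small hypothetical stand-ins for table1 and table2 (the column names country and sumvalue are assumptions, not the asker's real columns). It shows that keys adds an outer index level recording which frame each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the export (table1) and import (table2) frames;
# the index values mirror the row labels in the question's output.
table1 = pd.DataFrame({'country': ['SRI LANKA DSR', 'ST HELENA'],
                       'sumvalue': [139571409, 24]}, index=[194, 195])
table2 = pd.DataFrame({'country': ['SRI LANKA DSR', 'ST HELENA'],
                       'sumvalue': [1803152, 0]}, index=[185, 186])

# keys= adds an outer index level naming the source frame of each row.
table3 = pd.concat([table1, table2], keys=['table1', 'table2'])
print(table3)

# The outer level can be used to pull one source frame back out.
print(table3.loc['table1'])
```

Because the original row labels (194, 185, ...) survive as the inner level, the export and import rows for a country do not share an index label. They have to be matched on the country instead.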

The output is:

SRI LANKA DSR
                  count   sumavlue
table1 194  SRI LANKA DSR  139571409
table2 185  SRI LANKA DSR   1803152


ST HELENA
                count  sumavlue
table1 195  ST HELENA        24
table2 186  ST HELENA         0


ST KITT N A
                  count  sumavlue
table1 196  ST KITT N A         0
table2 187  ST KITT N A         0

Now I need to calculate the difference between the first and second rows for each country and put it in a new column with its own name. How can I do this?

For each country I need (export - import) in a new column named diff.

  • For SRI LANKA DSR it will be 139571409 - 1803152 = XXXXXX
  • For ST HELENA it will be 24 - 0 = 24
  • and so on for the other countries

These values are taken from the example output above.
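
Taken literally, "first row minus second row per country" can be computed directly on the concatenated frame. This is a minimal sketch under stated assumptions: the country and sumvalue column names are hypothetical, and every country appears exactly once in each table. pd.concat places all table1 rows before the table2 rows, and groupby keeps the original row order within each group, so the first row of each group is the export figure.

```python
import pandas as pd

# Hypothetical concatenated frame shaped like table3: the outer index level
# is the source table, and each country appears once per table.
table3 = pd.concat(
    [pd.DataFrame({'country': ['SRI LANKA DSR', 'ST HELENA', 'ST KITT N A'],
                   'sumvalue': [139571409, 24, 0]}, index=[194, 195, 196]),
     pd.DataFrame({'country': ['SRI LANKA DSR', 'ST HELENA', 'ST KITT N A'],
                   'sumvalue': [1803152, 0, 0]}, index=[185, 186, 187])],
    keys=['table1', 'table2'])

# Within each country the table1 (export) row comes first, so
# first row minus second row is export - import.
diff = (table3.groupby('country')['sumvalue']
              .agg(lambda s: s.iloc[0] - s.iloc[1])
              .rename('diff'))
print(diff)
```

If a country can be missing from one of the tables, s.iloc[1] raises an IndexError for that group. In that case, aligning the two tables on the country before subtracting is the safer choice.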

You could try using a combination of groupby, unstack, and stack. I'm not sure what your column names are, so I took some liberties with them. Here is my work:

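import pandas as pd

# Make DataFrame (column names are assumed)
df = pd.DataFrame({'country' : ['Sri Lanka DSR']*2 + ['St Helena']*2 + ['St Kitt']*2,
                   'table' : ['table1', 'table2']*3,
                   'ID' : [194, 185, 195, 186, 196, 187],
                   'sumvalue' : [139571409, 1803152, 24, 0, 0, 0]})

# Group by 'country' and 'table' only: 'ID' differs between a country's
# export and import rows, so unstacking it as well would put the two values
# in non-adjacent columns and the column diff would come out all NaN.
# Unstack 'table' and take the reverse difference on the columns
# (table1 - table2); stack 'table' back, drop the NaN left in the
# table2 slot, and rename 'sumvalue' to 'diff'.
result = (df.groupby(['country', 'table'])
            .agg({'sumvalue': 'sum'})
            .unstack('table')
            .diff(periods=-1, axis=1)
            .stack('table')
            .dropna()
            .rename(columns={'sumvalue': 'diff'}))
print(result)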
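
A possibly simpler variant of the same idea is sketched below on the answer's own sample data (whose column names are already assumptions). It pivots 'table' into columns with pivot_table and subtracts the two columns directly, which avoids the reverse diff and the stack round trip.

```python
import pandas as pd

# Same sample data as the answer above.
df = pd.DataFrame({'country': ['Sri Lanka DSR']*2 + ['St Helena']*2 + ['St Kitt']*2,
                   'table': ['table1', 'table2']*3,
                   'ID': [194, 185, 195, 186, 196, 187],
                   'sumvalue': [139571409, 1803152, 24, 0, 0, 0]})

# One row per country, one column per source table ('ID' is simply not used).
wide = df.pivot_table(index='country', columns='table',
                      values='sumvalue', aggfunc='sum')

# Export (table1) minus import (table2), stored in a new 'diff' column.
wide['diff'] = wide['table1'] - wide['table2']
print(wide)
```

This keeps the export and import totals next to the difference. wide['diff'] on its own gives just the per-country result.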
