Difference between values in a groupby

Question

I have two data frames, one for exports and one for imports. I concatenated the export and import figures into a single data frame using pd.concat():

table3 = pd.concat([table1,table2],keys=['table1','table2'])

The output is:

SRI LANKA DSR
                  count   sumavlue
table1 194  SRI LANKA DSR  139571409
table2 185  SRI LANKA DSR   1803152


ST HELENA
                count  sumavlue
table1 195  ST HELENA        24
table2 186  ST HELENA         0


ST KITT N A
                  count  sumavlue
table1 196  ST KITT N A         0
table2 187  ST KITT N A         0

Now I need to calculate the difference between the first and second row for each country and store it in a new, renamed column. How can I do this?

I need (export - import) as diff (the new column name) for each country.

For Sri Lanka it will be 139571409 - 1803152 = XXXXXX
For ST HELENA it will be 24 - 0 = 24
and so on for the other countries in the above example.

Answer 1

You could try a combination of groupby, unstack, and stack. I'm not sure what your column names are, so I took some liberties. Here is my attempt:

import pandas as pd

# Make DataFrame
df = pd.DataFrame({'country' : ['Sri Lanka DSR']*2 + ['St Helena']*2 + ['St Kitt']*2,
                   'table' : ['table1', 'table2']*3,
                   'ID' : [194, 185, 195, 186, 196, 187],
                   'sumvalue' : [139571409, 1803152, 24, 0, 0, 0]})

# Group by 'country', 'table', 'ID';
# unstack 'table', 'ID' and take the reverse difference across the columns;
# stack 'table', 'ID' and rename 'sumvalue' to 'diff'
(df.groupby(['country', 'table', 'ID'])
   .agg({'sumvalue' : 'sum'})
   .unstack(['table', 'ID'])
   .diff(periods = -1, axis = 1)
   .stack(['table', 'ID'])
   .rename(columns = {'sumvalue' : 'diff'}))
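
If the chained unstack/diff/stack is hard to follow, one possible alternative is to pivot the concatenated frame so that each country becomes a single row with one column per source table, and then subtract. This is only a sketch: the frames table1 and table2 below are hypothetical stand-ins built from the figures in the question, and the column names 'country' and 'sumvalue' are assumed, just as in the code above.

```python
import pandas as pd

# Hypothetical stand-ins for the export (table1) and import (table2) frames.
# Column names 'country' and 'sumvalue' are assumptions, not the asker's real names.
countries = ["SRI LANKA DSR", "ST HELENA", "ST KITT N A"]
table1 = pd.DataFrame(
    {"country": countries, "sumvalue": [139571409, 24, 0]},
    index=[194, 195, 196],
)
table2 = pd.DataFrame(
    {"country": countries, "sumvalue": [1803152, 0, 0]},
    index=[185, 186, 187],
)

# Same concatenation as in the question; the keys become the outer index level.
table3 = pd.concat([table1, table2], keys=["table1", "table2"])

# Name the two index levels, turn them into ordinary columns,
# then pivot to one row per country and one column per table.
wide = (
    table3.rename_axis(["table", "ID"])
    .reset_index()
    .pivot_table(index="country", columns="table", values="sumvalue", aggfunc="sum")
)

# export - import, stored in a new column called 'diff'
wide["diff"] = wide["table1"] - wide["table2"]
print(wide)
```

Because the subtraction is done by column name rather than by column position, the result does not depend on the order in which the rows or columns happen to appear. A country that is present in only one of the two tables would get NaN in 'diff' unless you pass fill_value=0 to pivot_table.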

Answer 1
0 2017-09-25 20:33:55