Difference between values in groupby
I have 2 data frames, one for export and one for import. I concatenated the export and import figures into a single data frame using pd.concat():
table3 = pd.concat([table1,table2],keys=['table1','table2'])
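For context, a minimal sketch of what keys= does: pd.concat adds an outer index level that records which input frame each row came from, which is where the 'table1'/'table2' labels in the output below come from. The one-row frames here are hypothetical stand-ins for the real export and import data.

```python
import pandas as pd

# Hypothetical miniature export (table1) and import (table2) frames
table1 = pd.DataFrame({'count': ['ST HELENA'], 'sumavlue': [24]}, index=[195])
table2 = pd.DataFrame({'count': ['ST HELENA'], 'sumavlue': [0]}, index=[186])

# keys= adds an outer index level naming the source frame of each row
table3 = pd.concat([table1, table2], keys=['table1', 'table2'])
print(table3)

# The outer label can be used to pull one source back out
print(table3.loc['table1'])
```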
The output is:
SRI LANKA DSR
              count          sumavlue
table1  194   SRI LANKA DSR  139571409
table2  185   SRI LANKA DSR    1803152

ST HELENA
              count          sumavlue
table1  195   ST HELENA             24
table2  186   ST HELENA              0

ST KITT N A
              count          sumavlue
table1  196   ST KITT N A            0
table2  187   ST KITT N A            0
Now I need to calculate the difference between the first and second rows of each country and store it in a new, renamed column. How can I do this?
For each country I need (export - import) in a new column named diff.
In the above example...
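A minimal sketch of a more direct route that works on the concatenated table3 itself. It assumes, based on the printed output, that the column labelled count holds the country name and sumavlue holds the value; the frames below are hypothetical rebuilds of the sample data. Each source is selected by its concat key and re-indexed by country, so the subtraction aligns rows by country name rather than by the unrelated numeric index (194 vs 185, and so on). This is label alignment, not the groupby/unstack/stack approach used in the answer below.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames: 'count' is assumed to hold
# the country name and 'sumavlue' the value, matching the printed output.
table1 = pd.DataFrame({'count': ['SRI LANKA DSR', 'ST HELENA', 'ST KITT N A'],
                       'sumavlue': [139571409, 24, 0]}, index=[194, 195, 196])
table2 = pd.DataFrame({'count': ['SRI LANKA DSR', 'ST HELENA', 'ST KITT N A'],
                       'sumavlue': [1803152, 0, 0]}, index=[185, 186, 187])
table3 = pd.concat([table1, table2], keys=['table1', 'table2'])

# Select each source by its concat key and index it by country name
export = table3.xs('table1').set_index('count')['sumavlue']
imports = table3.xs('table2').set_index('count')['sumavlue']

# export - import per country, stored in a column named 'diff'
result = (export - imports).rename('diff').to_frame()
print(result)
# diff: SRI LANKA DSR 137768257, ST HELENA 24, ST KITT N A 0
```

For the sample data this gives 139571409 - 1803152 = 137768257 for SRI LANKA DSR, 24 - 0 = 24 for ST HELENA and 0 for ST KITT N A. A country missing from one of the two tables would come out as NaN.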
You could try using a combination of groupby, unstack, and stack. I'm not sure what your column names are, so I took some liberties with them. Here is my work:
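import pandas as pd

# Make DataFrame (column names are assumed)
df = pd.DataFrame({'country' : ['Sri Lanka DSR']*2 + ['St Helena']*2 + ['St Kitt']*2,
                   'table' : ['table1', 'table2']*3,
                   'ID' : [194, 185, 195, 186, 196, 187],
                   'sumvalue' : [139571409, 1803152, 24, 0, 0, 0]})

# Group by 'country' and 'table' only: the IDs differ between the export and
# import rows, so also grouping on 'ID' would put the two values in different
# columns and the difference would come out as NaN.
# unstack 'table' so each country has a table1 and a table2 column;
# diff(periods=-1, axis=1) gives table1 - table2 (the table2 column becomes NaN);
# stack 'table' back, drop the NaN rows and rename 'sumvalue' to 'diff'.
result = (df.groupby(['country', 'table'])
            .agg({'sumvalue': 'sum'})
            .unstack('table')
            .diff(periods=-1, axis=1)
            .stack('table')
            .dropna()
            .rename(columns={'sumvalue': 'diff'}))
print(result)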
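If the diff/stack round trip feels indirect, a minimal alternative sketch on the same assumed columns uses pivot_table to get one row per country and one column per table, then subtracts the columns explicitly. This swaps in pivot_table for the groupby/unstack/stack chain; it is one option, not the only way.

```python
import pandas as pd

# Same assumed sample data as the answer
df = pd.DataFrame({'country': ['Sri Lanka DSR']*2 + ['St Helena']*2 + ['St Kitt']*2,
                   'table': ['table1', 'table2']*3,
                   'ID': [194, 185, 195, 186, 196, 187],
                   'sumvalue': [139571409, 1803152, 24, 0, 0, 0]})

# One row per country, one column per table (values summed per cell)
wide = df.pivot_table(index='country', columns='table',
                      values='sumvalue', aggfunc='sum')

# Export (table1) minus import (table2) as a new 'diff' column
wide['diff'] = wide['table1'] - wide['table2']
print(wide)
```

This keeps the export and import totals next to the difference, which can make the result easier to check.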