Row-wise operation in Pandas data frame
I have a World Indicator dataset that has this format:
country year indicatorName value
USA 1970 Agricultural Land ...
USA 1970 Crop production ...
...
USA 2000 Agricultural Land ...
USA 2000 Crop production ...
...
Mexico 1970 Agricultural Land ...
Mexico 1970 Crop production ...
...
Mexico 2000 Agricultural Land ...
Mexico 2000 Crop production ...
There are indicators here that I did not include, but these two are the ones I'm interested in. I want to divide the value of Crop production by the value of Agricultural Land, per country and per year. Let's name the result crop_prod_density.
I do not know how to proceed from
df.groupby(['country', 'year'])
How can I get from there to each of the following outputs?
country year indicatorName value
USA 1970 Agricultural Land ...
USA 1970 Crop production ...
USA 1970 crop_prod_density ...

country year indicatorName value crop_prod_density
USA 1970 Agricultural Land ... us_value_1970
USA 1970 Crop production ... us_value_1970
...
Mexico 2000 Agricultural Land ... mx_value_2000
Mexico 2000 Crop production ... mx_value_2000

country year crop_prod_density
USA 1970 us_value_1970
...
USA 2000 us_value_2000
...
Mexico 1970 mx_value_1970
...
Mexico 2000 mx_value_2000
You can first reshape with set_index and unstack, so that each indicator becomes its own column, and then divide with div:
print (df)
country year indicatorName value
0 USA 1970 Agricultural Land 10
1 USA 1970 Crop production 2
2 USA 2000 Agricultural Land 10
3 USA 2000 Crop production 3
4 Mexico 1970 Agricultural Land 10
5 Mexico 1970 Crop production 5
6 Mexico 2000 Agricultural Land 10
7 Mexico 2000 Crop production 4
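The answer prints df without showing how it was built. A minimal sketch that recreates it, assuming the toy values from the printout above (they are illustrative numbers, not real World Indicator figures):

```python
import pandas as pd

# Toy values copied from the printed frame above; not real indicator data.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})
print(df)
```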
df = (df.set_index(['country', 'year', 'indicatorName'])['value']
        .unstack()
        .assign(crop_prod_density=lambda x: x['Crop production'].div(x['Agricultural Land'])))
print (df)
indicatorName  Agricultural Land  Crop production  crop_prod_density
country year
Mexico  1970                  10                5                0.5
        2000                  10                4                0.4
USA     1970                  10                2                0.2
        2000                  10                3                0.3
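The same wide layout can probably also be reached with DataFrame.pivot instead of set_index plus unstack. This is a swapped-in alternative, not the approach used in the answer, and it assumes a pandas version in which pivot accepts a list for index. The name wide and the sample values are only for illustration.

```python
import pandas as pd

# Toy values matching the answer's sample frame.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

# One row per (country, year), one column per indicator.
wide = df.pivot(index=['country', 'year'], columns='indicatorName', values='value')

# Element-wise division of the two indicator columns.
wide['crop_prod_density'] = wide['Crop production'] / wide['Agricultural Land']
print(wide)
```

Like unstack, pivot raises an error when a (country, year, indicatorName) combination appears more than once, so duplicates would have to be resolved first.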
For the first output, reshape back to the long format with stack:
df1 = df.stack().reset_index(name='value')
print (df1)
country year indicatorName value
0 Mexico 1970 Agricultural Land 10.0
1 Mexico 1970 Crop production 5.0
2 Mexico 1970 crop_prod_density 0.5
3 Mexico 2000 Agricultural Land 10.0
4 Mexico 2000 Crop production 4.0
5 Mexico 2000 crop_prod_density 0.4
6 USA 1970 Agricultural Land 10.0
7 USA 1970 Crop production 2.0
8 USA 1970 crop_prod_density 0.2
9 USA 2000 Agricultural Land 10.0
10 USA 2000 Crop production 3.0
11 USA 2000 crop_prod_density 0.3
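As a rough alternative to stack, the wide frame can also be turned back into long form with melt. This sketch is an assumption about an equivalent route, not part of the answer; long_df and wide are hypothetical names, and the explicit sort is only there to reproduce the row order shown above.

```python
import pandas as pd

# Toy values matching the answer's sample frame.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

wide = df.set_index(['country', 'year', 'indicatorName'])['value'].unstack()
wide['crop_prod_density'] = wide['Crop production'] / wide['Agricultural Land']

# melt turns every non-id column into (indicatorName, value) pairs.
long_df = (wide.reset_index()
               .melt(id_vars=['country', 'year'],
                     var_name='indicatorName', value_name='value')
               .sort_values(['country', 'year', 'indicatorName'])
               .reset_index(drop=True))
print(long_df)
```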
For the second output, move crop_prod_density into the index with set_index(..., append=True) before stacking, so that it is kept as a separate column next to the original rows. The column order then has to be restored with reindex:
df2 = (df.set_index(['crop_prod_density'], append=True)
         .stack()
         .reset_index(name='value')
         .reindex(columns=['country', 'year', 'indicatorName', 'value', 'crop_prod_density']))
print (df2)
country year indicatorName value crop_prod_density
0 Mexico 1970 Agricultural Land 10 0.5
1 Mexico 1970 Crop production 5 0.5
2 Mexico 2000 Agricultural Land 10 0.4
3 Mexico 2000 Crop production 4 0.4
4 USA 1970 Agricultural Land 10 0.2
5 USA 1970 Crop production 2 0.2
6 USA 2000 Agricultural Land 10 0.3
7 USA 2000 Crop production 3 0.3
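If the original row order and integer values should be preserved, one possible variant is to compute the ratio once per (country, year) and merge it back onto the untouched long frame. This is a sketch of a different route than the answer's stack approach; density and df2 are illustrative names.

```python
import pandas as pd

# Toy values matching the answer's sample frame.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

wide = df.set_index(['country', 'year', 'indicatorName'])['value'].unstack()

# One ratio per (country, year), as a named Series.
density = (wide['Crop production'] / wide['Agricultural Land']).rename('crop_prod_density')

# A left merge keeps every original row, in the original order.
df2 = df.merge(density.reset_index(), on=['country', 'year'], how='left')
print(df2)
```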
For the third output, drop the columns that are no longer needed and turn the MultiIndex back into regular columns:
df3 = (df.drop(['Crop production', 'Agricultural Land'], axis=1)
         .reset_index()
         .rename_axis(None, axis=1))
print (df3)
country year crop_prod_density
0 Mexico 1970 0.5
1 Mexico 2000 0.4
2 USA 1970 0.2
3 USA 2000 0.3
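Since the question starts from df.groupby(['country', 'year']), the third output can arguably also be produced directly from the groupby, without reshaping the whole frame. The sketch below assumes each group holds exactly one row per indicator; crop_density is a hypothetical helper name, not something from the answer.

```python
import pandas as pd

# Toy values matching the answer's sample frame.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

def crop_density(group):
    # Look up the two indicator values inside one (country, year) group.
    values = group.set_index('indicatorName')['value']
    return values['Crop production'] / values['Agricultural Land']

# Selecting the two needed columns keeps the grouping keys out of each group.
df3 = (df.groupby(['country', 'year'])[['indicatorName', 'value']]
         .apply(crop_density)
         .reset_index(name='crop_prod_density'))
print(df3)
```

This calls a Python function once per group, so on a large dataset the vectorized unstack and div route is likely to be faster.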