
Row-wise operation in Pandas data frame

I have a World Indicator dataset in this format:

country     year    indicatorName       value
USA         1970    Agricultural Land   ...
USA         1970    Crop production     ...
...
USA         2000    Agricultural Land   ...
USA         2000    Crop production     ...
...
Mexico      1970    Agricultural Land   ...
Mexico      1970    Crop production     ...
...
Mexico      2000    Agricultural Land   ...
Mexico      2000    Crop production     ...

There are other indicators that I did not include here, but these two are the ones I'm interested in. I want to divide the value of Crop production by the corresponding value of Agricultural Land, per country and per year. Let's name the result crop_prod_density.

I do not know how to proceed from:

df.groupby(['country', 'year'])

How can I get from here to each of the following outputs?

  1. Add a new indicator row

country     year    indicatorName       value
USA         1970    Agricultural Land   ...
USA         1970    Crop production     ...
USA         1970    crop_prod_density   ...

  2. Add a new column that holds the same value for all rows of each (country, year) group

country     year    indicatorName       value   crop_prod_density
USA         1970    Agricultural Land   ...     us_value_1970
USA         1970    Crop production     ...     us_value_1970
...
Mexico      2000    Agricultural Land   ...     mx_value_2000
Mexico      2000    Crop production     ...     mx_value_2000

  3. A new data frame with only this column of values

country     year    crop_prod_density
USA         1970    us_value_1970
...
USA         2000    us_value_2000
...
Mexico      1970    mx_value_1970
...
Mexico      2000    mx_value_2000

You can first reshape with set_index and unstack, then divide with div:

print (df)
  country  year      indicatorName  value
0     USA  1970  Agricultural Land     10
1     USA  1970    Crop production      2
2     USA  2000  Agricultural Land     10
3     USA  2000    Crop production      3
4  Mexico  1970  Agricultural Land     10
5  Mexico  1970    Crop production      5
6  Mexico  2000  Agricultural Land     10
7  Mexico  2000    Crop production      4  

df = (df.set_index(['country','year','indicatorName'])['value']
       .unstack()
       .assign(crop_prod_density=lambda x: x['Crop production'].div(x['Agricultural Land'])))
print (df)
indicatorName  Agricultural Land  Crop production  crop_prod_density
country year                                                        
Mexico  1970                  10                5                0.5
        2000                  10                4                0.4
USA     1970                  10                2                0.2
        2000                  10                3                0.3
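
To try the reshape step without the original dataset, here is a minimal self-contained sketch. It rebuilds the sample frame from the values printed above and keeps the wide result in a separate variable; the name `wide` is my own choice, not part of the answer. Note that unstack needs each (country, year, indicatorName) combination to appear only once, which I am assuming holds for the real data.

```python
import pandas as pd

# Sample data copied from the printed frame above.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

# Long -> wide: one row per (country, year), one column per indicator.
wide = df.set_index(['country', 'year', 'indicatorName'])['value'].unstack()

# Element-wise division of the two indicator columns.
wide['crop_prod_density'] = wide['Crop production'].div(wide['Agricultural Land'])
print(wide)
```

Keeping the original long frame in `df` (instead of overwriting it) makes it easier to compare the input and the reshaped result side by side.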

Then reshape back with stack:

df1 = df.stack().reset_index(name='value')
print (df1)
   country  year      indicatorName  value
0   Mexico  1970  Agricultural Land   10.0
1   Mexico  1970    Crop production    5.0
2   Mexico  1970  crop_prod_density    0.5
3   Mexico  2000  Agricultural Land   10.0
4   Mexico  2000    Crop production    4.0
5   Mexico  2000  crop_prod_density    0.4
6      USA  1970  Agricultural Land   10.0
7      USA  1970    Crop production    2.0
8      USA  1970  crop_prod_density    0.2
9      USA  2000  Agricultural Land   10.0
10     USA  2000    Crop production    3.0
11     USA  2000  crop_prod_density    0.3
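
As a possible alternative for the first output, the density rows can be appended to the untouched long frame with concat instead of stacking everything back. This is only a sketch on the sample values above; the names `density`, `new_rows` and `out` are hypothetical. The appended rows land at the end of the frame, and the value column becomes float because the ratios are floats.

```python
import pandas as pd

# Sample data copied from the printed frame above.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

# Compute the ratio per (country, year) on a wide view of the data.
wide = df.set_index(['country', 'year', 'indicatorName'])['value'].unstack()
density = wide['Crop production'].div(wide['Agricultural Land'])

# Turn the ratios into rows shaped like the original frame.
new_rows = (density.reset_index(name='value')
                   .assign(indicatorName='crop_prod_density'))

# Append them; select columns so the order matches the original frame.
out = pd.concat([df, new_rows[df.columns]], ignore_index=True)
print(out)
```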

To add crop_prod_density as a new column next to the original rows, append it to the index before calling stack, then restore the column order with reindex:

df2 = (df.set_index(['crop_prod_density'], append=True)
         .stack()
         .reset_index(name='value')
         .reindex(columns=['country','year','indicatorName','value','crop_prod_density']))
print (df2)
  country  year      indicatorName  value  crop_prod_density
0  Mexico  1970  Agricultural Land     10                0.5
1  Mexico  1970    Crop production      5                0.5
2  Mexico  2000  Agricultural Land     10                0.4
3  Mexico  2000    Crop production      4                0.4
4     USA  1970  Agricultural Land     10                0.2
5     USA  1970    Crop production      2                0.2
6     USA  2000  Agricultural Land     10                0.3
7     USA  2000    Crop production      3                0.3
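
Another way to get the second output, sketched here under the assumption that the original long frame is still available, is to compute the ratio once per (country, year) and merge it back. A left merge keeps the original row order, so no reindex is needed. The names `density` and `out` are my own.

```python
import pandas as pd

# Sample data copied from the printed frame above.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

# One ratio per (country, year), as a small lookup frame.
wide = df.set_index(['country', 'year', 'indicatorName'])['value'].unstack()
density = (wide['Crop production'].div(wide['Agricultural Land'])
           .rename('crop_prod_density')
           .reset_index())

# Broadcast the ratio onto every row of its (country, year) group.
out = df.merge(density, on=['country', 'year'], how='left')
print(out)
```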

Finally, drop the columns that are no longer needed and turn the MultiIndex levels back into columns:

df3 = (df.drop(['Crop production','Agricultural Land'], axis=1)
         .reset_index()
         .rename_axis(None, axis=1))
print (df3)
  country  year  crop_prod_density
0  Mexico  1970                0.5
1  Mexico  2000                0.4
2     USA  1970                0.2
3     USA  2000                0.3
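
Since the question starts from df.groupby(['country', 'year']), the third output can also be sketched with groupby and apply. This is not the approach used in the answer above, only an illustration on the sample values; the helper `density` is hypothetical, and it assumes every group contains exactly one row for each of the two indicators.

```python
import pandas as pd

# Sample data copied from the printed frame above.
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

def density(group):
    # Look up both indicators inside one (country, year) group.
    values = group.set_index('indicatorName')['value']
    return values['Crop production'] / values['Agricultural Land']

# Select the needed columns explicitly so apply only sees those.
df3 = (df.groupby(['country', 'year'])[['indicatorName', 'value']]
         .apply(density)
         .reset_index(name='crop_prod_density'))
print(df3)
```

On large frames the reshape with unstack is likely to be faster than calling a Python function once per group, so this version is mainly useful when the per-group logic is more involved than a single division.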
