Cumulative percentage of pandas data frame

I have a data frame like the one below, with an ID (code) and the areas and lengths recorded at specific distances (Dist_km):

     code  Dist_km    Shape_Leng    Shape_Area
0   M0017      5.0  57516.601608  5.076465e+07   
1   M0017     10.0  94037.663673  4.638184e+07   
2   M0017     15.0  39106.310470  1.426327e+07   
3   M0017     20.0    138.038115  6.464380e+02   
4   M0017     30.0  12158.395200  4.102351e+06   
5   M0073      5.0  51922.847698  3.375080e+07   
6   M0073     10.0  75543.660382  5.966612e+07   
7   M0073     15.0  55277.027428  3.423961e+07   
8   M0073     20.0  26945.782055  2.584022e+07   
9   M0073     25.0   4052.670711  6.904536e+05   
10  M0333      5.0  30090.687597  5.468791e+07   
11  M0333     10.0  55946.815385  5.768929e+07   
12  M0333     15.0  65026.329732  4.008600e+07   
13  M0333     20.0  59014.487216  2.994337e+07   
14  M0333     25.0  17423.635441  6.358991e+06  

Using:

# running total of area within each code (built-in groupby cumsum, no apply needed)
mrb['cum_area_sqm'] = mrb.groupby('code')['Shape_Area'].cumsum()
mrb['cum_area_ha'] = mrb['cum_area_sqm'] / 10000
mrb_cumsum = mrb.groupby(['code', 'Dist_km']).agg({'cum_area_ha': 'sum'})

I have managed to convert the data frame to the one below:

               cum_area_ha
code  Dist_km              
M0017 5.0       5076.464548
      10.0      9714.648238
      15.0     11140.974881
      20.0     11141.039525
      30.0     11551.274623
M0073 5.0       3375.080465
      10.0      9341.692680
      15.0     12765.654064
      20.0     15349.676332
      25.0     15418.721691
M0333 5.0       5468.790981
      10.0     11237.720454
      15.0     15246.320869
      20.0     18240.658255
      25.0     18876.557351 

However, I would now like to get the cumulative percentage of these areas for each code by Dist_km, up to 100 percent.

So, for example for M0017, I would like to have something like the below.

               cum_area_ha   cum_area_pc
code  Dist_km              
M0017 5.0       5076.464548    43.95
      10.0      9714.648238    84.10
      15.0     11140.974881    96.45
      20.0     11141.039525    96.45
      30.0     11551.274623   100.00

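Because cum_area_ha is already a running total, the last value in each code group is that group's total. Dividing each element by the last cum_area_ha in the same code group therefore gives the cumulative fraction (multiply by 100 for a percentage).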

mrb_cumsum.div(mrb_cumsum.groupby(level=0).last())
Out[97]: 
               cum_area_ha
code  Dist_km             
M0017 5.0         0.439472
      10.0        0.841002
      15.0        0.964480
      20.0        0.964486
      30.0        1.000000
M0073 5.0         0.218895
      10.0        0.605867
      15.0        0.827932
      20.0        0.995522
      25.0        1.000000
M0333 5.0         0.289713
      10.0        0.595327
      15.0        0.807685
      20.0        0.966313
      25.0        1.000000
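
To get the cum_area_pc column in the shape the question asks for, the same idea can be sketched as below. This is only an illustration: it rebuilds two of the groups from the question by hand, and it uses transform('last') instead of div so that the group total is broadcast back to every row explicitly. The name group_total is introduced here and is not from the original post.

```python
import pandas as pd

# Rebuild part of the question's cumulative table (M0017 and M0073 only).
mrb_cumsum = pd.DataFrame(
    {
        "code": ["M0017"] * 5 + ["M0073"] * 5,
        "Dist_km": [5.0, 10.0, 15.0, 20.0, 30.0, 5.0, 10.0, 15.0, 20.0, 25.0],
        "cum_area_ha": [
            5076.464548, 9714.648238, 11140.974881, 11141.039525, 11551.274623,
            3375.080465, 9341.692680, 12765.654064, 15349.676332, 15418.721691,
        ],
    }
).set_index(["code", "Dist_km"])

# Last cumulative value per code, repeated on every row of that code.
# Assumes rows are already ordered by Dist_km within each code.
group_total = mrb_cumsum.groupby(level="code")["cum_area_ha"].transform("last")

# Cumulative percentage, rounded to two decimals as in the desired output.
mrb_cumsum["cum_area_pc"] = (mrb_cumsum["cum_area_ha"] / group_total * 100).round(2)

print(mrb_cumsum)
```

If the rows might not be sorted by Dist_km, transform('max') should give the same totals here, since a running sum of non-negative areas never decreases.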
