
Calculating percentile values for each column, grouped by another column's values - Pandas dataframe

I have a dataframe that looks like this:

   Year  Salary  Amount
0  2019    1200      53
1  2020    3443     455
2  2021    6777     123
3  2019    5466     313
4  2020    4656     545
5  2021    4565     775
6  2019    4654     567
7  2020    7867     657
8  2021    6766     567

Python script to build the dataframe above:

import pandas as pd
import numpy as np

d = pd.DataFrame({
    'Year': [
        2019,
        2020,
        2021,
    ] * 3,
    'Salary': [
        1200,
        3443,
        6777,
        5466,
        4656,
        4565,
        4654,
        7867,
        6766
    ],
    'Amount': [
        53,
        455,
        123,
        313,
        545,
        775,
        567,
        657,
        567
    ]
})

I want to calculate certain percentile values for all the columns grouped by 'Year'. Desired output should look like -

[Image: desired output]

I am currently running the Python script below to calculate the percentile values:

df_percentile = pd.DataFrame()
p_list = [0.05, 0.10, 0.25, 0.50, 0.75, 0.95, 0.99]
c_list = []
p_values = []
for cols in d.columns[1:]:
    for p in p_list:
        c_list.append(cols + '_' + str(p))
        p_values.append(np.percentile(d[cols], p))
print(len(c_list), len(p_values))
df_percentile['Name'] = pd.Series(c_list)
df_percentile['Value'] = pd.Series(p_values)
print(df_percentile)

Output -

           Name      Value
0   Salary_0.05  1208.9720
1    Salary_0.1  1217.9440
2   Salary_0.25  1244.8600
3    Salary_0.5  1289.7200
4   Salary_0.75  1334.5800
5   Salary_0.95  1370.4680
6   Salary_0.99  1377.6456
7   Amount_0.05    53.2800
8    Amount_0.1    53.5600
9   Amount_0.25    54.4000
10   Amount_0.5    55.8000
11  Amount_0.75    57.2000
12  Amount_0.95    58.3200
13  Amount_0.99    58.5440

How can I get the output in the required format without having to do extra data manipulation/formatting or in fewer lines of code?
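
A side note on the output above: np.percentile expects q on a 0-100 scale, so passing 0.05 asks for the 0.05th percentile rather than the 5th, which is why every value sits close to the column minimum. The sketch below uses the Salary column from the question to compare the two scales; the variable names are only illustrative.

```python
import numpy as np

# Salary column from the question
salary = np.array([1200, 3443, 6777, 5466, 4656, 4565, 4654, 7867, 6766])

# np.percentile takes q in [0, 100]: 0.05 means the 0.05th percentile
as_percent = np.percentile(salary, 0.05)

# np.quantile takes q in [0, 1]: 0.05 means the 5th percentile
as_fraction = np.quantile(salary, 0.05)

# Equivalent call on the percentile scale
scaled = np.percentile(salary, 5)

print(as_percent, as_fraction, scaled)
```

If fractions such as 0.05 are meant as quantiles, either np.quantile or multiplying by 100 before calling np.percentile should give the intended values.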

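You can try pivot followed by quantile (d is the dataframe from the question):

(d.pivot(columns='Year')                       # one column per (value column, Year) pair
   .quantile([0.01, 0.05, 0.75, 0.95, 0.99])   # NaN cells are skipped by default
   .stack('Year')                              # move Year back into the row index
)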

Output:

            Salary  Amount
     Year                 
0.01 2019  1269.08   58.20
     2020  3467.26  456.80
     2021  4609.02  131.88
0.05 2019  1545.40   79.00
     2020  3564.30  464.00
     2021  4785.10  167.40
0.75 2019  5060.00  440.00
     2020  6261.50  601.00
     2021  6771.50  671.00
0.95 2019  5384.80  541.60
     2020  7545.90  645.80
     2021  6775.90  754.20
0.99 2019  5449.76  561.92
     2020  7802.78  654.76
     2021  6776.78  770.84
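
As an alternative sketch (not part of the answer above), groupby followed by quantile should give the same numbers without the intermediate pivot. It assumes the dataframe d and the p_list from the question; the level name 'Quantile' is just an illustrative label.

```python
import pandas as pd

# Dataframe from the question
d = pd.DataFrame({
    'Year': [2019, 2020, 2021] * 3,
    'Salary': [1200, 3443, 6777, 5466, 4656, 4565, 4654, 7867, 6766],
    'Amount': [53, 455, 123, 313, 545, 775, 567, 657, 567],
})

# Quantiles as fractions in [0, 1], as in the question's p_list
p_list = [0.05, 0.10, 0.25, 0.50, 0.75, 0.95, 0.99]

# One row per (Year, quantile) pair, one column per value column
by_year = d.groupby('Year').quantile(p_list).rename_axis(['Year', 'Quantile'])

print(by_year)
```

Here Year is the outer index level; calling by_year.swaplevel().sort_index() would put the quantile first, matching the layout of the output above.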


 