简体   繁体   中英

Calculate formulae based on results of Pandas grouping and add to final resultset

Morning All,

Dataframe:

df = {'Client': ['A', 'A', 'A',
                 'B', 'B', 'B', 'B','B','B','B','B',
                 'C',
                 'D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
           'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ',
           'No RFQ',
           'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)

   Client            Result
0       A           Covered
1       A   Customer Reject
2       A  Customer Timeout
3       B     Dealer Reject
4       B    Dealer Timeout
5       B              Done
6       B      Tied Covered
7       B         Tied Done
8       B  Tied Traded Away
9       B       Traded Away
10      B            No RFQ
11      C            No RFQ
12      D           Covered
13      D   Customer Reject
14      D  Customer Timeout
15      D     Dealer Reject
16      D    Dealer Timeout
17      D              Done
18      D      Tied Covered
19      D         Tied Done
20      D  Tied Traded Away
21      D       Traded Away
22      D            No RFQ

Add zero values and append Totals within each grouping (courtesy @MhDG7)

groups = pd.melt((
    df
    .assign(count=1)
    .groupby(['Client','Result'],as_index=False)
    .sum()
    .pivot(index='Client', columns='Result', values='count')
    .fillna(0)
    .reset_index()
    )
    ,id_vars=['Client']).sort_values(['Client','value'])

# print(groups)
print("")

for i,rows in groups.groupby('Client').sum().iterrows():
    print(groups[lambda x: x['Client']==i])
    print('Total\t\t\t    ',rows['value'])
    print('\n')

   Client            Result  value
12      A     Dealer Reject    0.0
16      A    Dealer Timeout    0.0
20      A              Done    0.0
24      A            No RFQ    0.0
28      A      Tied Covered    0.0
32      A         Tied Done    0.0
36      A  Tied Traded Away    0.0
40      A       Traded Away    0.0
0       A           Covered    1.0
4       A   Customer Reject    1.0
8       A  Customer Timeout    1.0
Total                3.0


   Client            Result  value
1       B           Covered    0.0
5       B   Customer Reject    0.0
9       B  Customer Timeout    0.0
13      B     Dealer Reject    1.0
17      B    Dealer Timeout    1.0
21      B              Done    1.0
25      B            No RFQ    1.0
29      B      Tied Covered    1.0
33      B         Tied Done    1.0
37      B  Tied Traded Away    1.0
41      B       Traded Away    1.0
Total                8.0


   Client            Result  value
2       C           Covered    0.0
6       C   Customer Reject    0.0
10      C  Customer Timeout    0.0
14      C     Dealer Reject    0.0
18      C    Dealer Timeout    0.0
22      C              Done    0.0
30      C      Tied Covered    0.0
34      C         Tied Done    0.0
38      C  Tied Traded Away    0.0
42      C       Traded Away    0.0
26      C            No RFQ    1.0
Total                1.0


   Client            Result  value
3       D           Covered    1.0
7       D   Customer Reject    1.0
11      D  Customer Timeout    1.0
15      D     Dealer Reject    1.0
19      D    Dealer Timeout    1.0
23      D              Done    1.0
27      D            No RFQ    1.0
31      D      Tied Covered    1.0
35      D         Tied Done    1.0
39      D  Tied Traded Away    1.0
43      D       Traded Away    1.0
Total                11.0

Desired output:

I would like to add two formula which use the results from each returned grouping. So there would be a Total row (Sum of all Results per client) but also two formulae ( Pct_Calc_1 and Pct_Calc_2 ) based on the summed grouped values returned from each individual Client:

Pct_Calc_1 = (Total_Result_Per_Client - Total_Dealer_Reject_Per_Client - Total_Dealer_Timeout_Per_Client) / (Total_Per_Client - Total_Customer_Reject_Per_Client)

Pct_Calc_2 = (Total – (Covered + Tied Covered) – Customer Reject – Dealer Reject – Dealer Timeout – (Traded Away + Tied Traded Away)) / (Total – Dealer Reject – Dealer Timeout)

In the case of say Client B, the following output would be displayed:

   Client            Result  value
1       B           Covered    0.0
5       B   Customer Reject    0.0
9       B  Customer Timeout    0.0
13      B     Dealer Reject    1.0
17      B    Dealer Timeout    1.0
21      B              Done    1.0
25      B            No RFQ    1.0
29      B      Tied Covered    1.0
33      B         Tied Done    1.0
37      B  Tied Traded Away    1.0
41      B       Traded Away    1.0
Total                          8.0
Pct_Calc_1                     75.00%
Pct_Calc_2                     50.00%

where Pct_Calc_1 = (8 - 1 - 1) / (8 - 0) = 75.00%

Pct_Calc_2 = (8 - (0 + 1) - 0 - 1 - 1 - (1 + 1))) / (8 - 1 - 1) = 50.00%

In the case of Client D:

   Client            Result  value
3       D           Covered    1.0
7       D   Customer Reject    1.0
11      D  Customer Timeout    1.0
15      D     Dealer Reject    1.0
19      D    Dealer Timeout    1.0
23      D              Done    1.0
27      D            No RFQ    1.0
31      D      Tied Covered    1.0
35      D         Tied Done    1.0
39      D  Tied Traded Away    1.0
43      D       Traded Away    1.0
Total                          11.0
Pct_Calc_1                     90.00%
Pct_Calc_2                     44.44%

Pct_Calc_1 = (11 - 1 - 1) / (11 - 1) = 90.00%

Pct_Calc_2 = (11 - (1 + 1) - 1 - 1 -1 - (1 + 1))) / (11 - 1 - 1) = 44.44%

If the original grouping needs to be changed to cater for this additional output then all good.

Let us do crosstab

s = pd.crosstab(df.Client,df.Result,margins=True).drop('All')
s['Cal1']=(s['All'] - s['Dealer Reject'] - s['Dealer Timeout'])/(s['All'] - s['Customer Reject'])
s = s.stack().reset_index(name='value')
s
   Client            Result  value
0       A           Covered   1.00
1       A   Customer Reject   1.00
2       A  Customer Timeout   1.00
3       A     Dealer Reject   0.00
4       A    Dealer Timeout   0.00
5       A              Done   0.00
6       A            No RFQ   0.00
7       A      Tied Covered   0.00
8       A         Tied Done   0.00
9       A  Tied Traded Away   0.00
10      A       Traded Away   0.00
11      A               All   3.00
12      A              Cal1   1.50
13      B           Covered   0.00
14      B   Customer Reject   0.00
15      B  Customer Timeout   0.00
16      B     Dealer Reject   1.00
17      B    Dealer Timeout   1.00
18      B              Done   1.00
19      B            No RFQ   1.00
20      B      Tied Covered   1.00
21      B         Tied Done   1.00
22      B  Tied Traded Away   1.00
23      B       Traded Away   1.00
24      B               All   8.00
25      B              Cal1   0.75
26      C           Covered   0.00
27      C   Customer Reject   0.00
28      C  Customer Timeout   0.00
29      C     Dealer Reject   0.00
30      C    Dealer Timeout   0.00
31      C              Done   0.00
32      C            No RFQ   1.00
33      C      Tied Covered   0.00
34      C         Tied Done   0.00
35      C  Tied Traded Away   0.00
36      C       Traded Away   0.00
37      C               All   1.00
38      C              Cal1   1.00
39      D           Covered   1.00
40      D   Customer Reject   1.00
41      D  Customer Timeout   1.00
42      D     Dealer Reject   1.00
43      D    Dealer Timeout   1.00
44      D              Done   1.00
45      D            No RFQ   1.00
46      D      Tied Covered   1.00
47      D         Tied Done   1.00
48      D  Tied Traded Away   1.00
49      D       Traded Away   1.00
50      D               All  11.00
51      D              Cal1   0.90

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM