Is there a way to calculate the ratio between a crosstab dataframe and another dataframe in pandas?

Question

Summary - the end goal is to express the output of a pandas crosstab as a percentage of values from another dataframe, matched on a shared index.

What I've tried - I tried to use the original crosstab dataframe as the numerator and divide it by another dataframe with div, but the result was all NaN.

Code

import pandas as pd
import numpy as np 

df1 = pd.DataFrame({"Vntg": ["2020-01","2020-02","2020-03"],"Funded":[1000,2000,4000]}) # This is the df we want to use as denominator
df2 = pd.DataFrame({"Vntg": ["2020-01","2020-01","2020-01","2020-02","2020-02","2020-03"],
                    "Funded":[1000,1000,1000,2000,2000,4000],
                    "Payment":[10,20,20,30,15,30],
                    "Timing":[0,1,2,0,1,0]})
ct_df = pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum", margins=False)
ct_df = ct_df.cumsum(axis=1) # This is the crosstab df we want to use as numerator on a cumulative basis

Starting from the cumulative payments produced by cumsum, is there a way to convert each dollar value into a percentage of the funded amount in df1? Thanks in advance; I appreciate all the help.

I've also looked at the thread below and it doesn't seem to solve my issue: Customized normalization of pd.crosstab()

Edit:

I think some folks are confused about the ask. To clarify, the final result takes 10 from df2 at timing 0 and divides it by the funded amount, which is 1000 in df1 for vintage 2020-01. For the subsequent timing 1, it is (10+20) from df2 divided by the same funded amount from df1 for the same vintage, because the funded amount does not change. The same logic populates the result for the other vintages.

Answer 1

If I understand the question, you want to sum up Payment values within the same Vntg value, and divide that by the Funded field of the other dataframe with a matching Vntg field.

You can do that by grouping on Vntg, summing, and dividing by the other dataframe:

df2.groupby('Vntg')['Payment'].sum() / df1.set_index('Vntg')['Funded'] * 100
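As a rough illustration of what that one-liner produces on the question's data (the name total_pct is mine, not from the question): it returns one total percentage per vintage rather than one value per timing, so it corresponds to the last cumulative value in each row of the crosstab.

```python
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Both operands are Series indexed by Vntg, so the division aligns on Vntg.
# total_pct is an illustrative name, not something from the question.
total_pct = df2.groupby("Vntg")["Payment"].sum() / df1.set_index("Vntg")["Funded"] * 100
print(total_pct)  # roughly 5.0, 2.25 and 0.75 for the three vintages
```

If you need the per-timing breakdown, this is not enough on its own.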

Answer 2

The approach below doesn't use crosstab, but should give the same answer (IIUC):

(
    df2.sort_values(["Vntg", "Timing"])
    # accumulate payments within each vintage, in Timing order
    .assign(cum_paymt=lambda df: df.groupby("Vntg")["Payment"].transform("cumsum"))
    # share of the funded amount as a fraction; multiply by 100 for a percentage
    .assign(cum_share=lambda df: df["cum_paymt"] / df["Funded"])
    .pivot(index="Vntg", columns="Timing", values="cum_share")
)
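A quick sanity check, sketched under the assumption that the cumulative sum should run within each Vntg (so the grouping key here is "Vntg"): it compares a groupby/pivot route with the crosstab route from the question. The names wide, ct_share and same are illustrative only. The groupby route also relies on the Funded column in df2 matching df1 for every vintage, which holds for the sample data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Funded": [1000, 1000, 1000, 2000, 2000, 4000],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Route 1: cumulative sum within each vintage, then pivot to wide form.
wide = (
    df2.sort_values(["Vntg", "Timing"])
    .assign(cum_paymt=lambda df: df.groupby("Vntg")["Payment"].cumsum())
    .assign(cum_share=lambda df: df["cum_paymt"] / df["Funded"])
    .pivot(index="Vntg", columns="Timing", values="cum_share")
)

# Route 2: crosstab, cumulative sum across Timing, divide each row by its Funded value.
ct = pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum")
ct_share = ct.cumsum(axis=1).div(df1.set_index("Vntg")["Funded"], axis=0)

# equal_nan=True treats the missing (Vntg, Timing) cells as matching.
same = np.allclose(wide.to_numpy(), ct_share.to_numpy(), equal_nan=True)
print(same)
```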

Answer 3

It would be easier for everyone if you gave the expected output in the form of a dataframe. Assuming ct_df is the raw crosstab, before the cumsum line in the question is applied:

>>> ct_df.cumsum(axis=1).div(df1.set_index('Vntg')['Funded'], axis=0).mul(100)
Timing      0     1    2
Vntg                    
2020-01  1.00  3.00  5.0
2020-02  1.50  2.25  NaN
2020-03  0.75   NaN  NaN
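One plausible reason the attempt in the question came back all NaN (an assumption, since the exact call was not shown) is label alignment. Dividing a DataFrame by a Series matches the Series index against the DataFrame's columns by default, and the Vntg labels never match the Timing columns 0, 1, 2. Passing axis=0 matches on the row index instead. A minimal sketch, with funded, misaligned and pct as illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Cumulative payments per vintage across Timing.
ct = pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum").cumsum(axis=1)
funded = df1.set_index("Vntg")["Funded"]

# Default alignment: Series index (Vntg labels) vs. DataFrame columns (0, 1, 2).
# No labels overlap, so every cell is NaN (pandas may warn about sorting mixed labels).
misaligned = ct.div(funded)

# axis=0 aligns the Series index with the DataFrame's row index (Vntg).
pct = ct.div(funded, axis=0).mul(100)
print(pct)
```

The same applies to the / operator, which has no axis argument, so .div(..., axis=0) is the form to reach for here.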


solution1
0 2021-12-13 22:05:53

solution2
0 2021-12-15 06:10:02

solution3
0 ACCEPTED 2021-12-15 21:24:24
