Is there a way to calculate ratio between a crosstab dataframe with another dataframe in pandas?

Summary: the end goal is to express the output of a pandas crosstab as a percentage of values from another dataframe that shares the same index (Vntg).

What I've tried: I used the crosstab dataframe as the numerator and divided it by the other dataframe, but that doesn't seem to work, because the result was all NaN.

Code

import pandas as pd
import numpy as np 

df1 = pd.DataFrame({"Vntg": ["2020-01","2020-02","2020-03"],"Funded":[1000,2000,4000]}) # This is the df we want to use as denominator
df2 = pd.DataFrame({"Vntg": ["2020-01","2020-01","2020-01","2020-02","2020-02","2020-03"],
                    "Funded":[1000,1000,1000,2000,2000,4000],
                    "Payment":[10,20,20,30,15,30],
                    "Timing":[0,1,2,0,1,0]})
ct_df = pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum", margins=False)
ct_df = ct_df.cumsum(axis=1) # This is the crosstab df we want to use as numerator on a cumulative basis

Starting from the cumulative payments, is there a way to convert the dollar values into a percentage of the funded amount in df1? Thanks in advance, I appreciate all the help.

I've also looked at the thread below and it doesn't seem to solve my issue: Customized normalization of pd.crosstab()

Edit:

So I think some folks are confused about the ask. To clarify, the final result would take the 10 from df2 at timing 0 and divide it by the funded amount, which is 1000 in df1 for vintage 2020-01. For the next timing, 1, it would be (10+20) from df2 divided by the same funded amount from df1 for the same vintage, because the funded amount does not change over time. The other vintages would be populated by the same logic.

If I understand the question, you want to sum up Payment values within the same Vntg value, and divide that by the Funded field of the other dataframe with a matching Vntg field.

You can do that by grouping on Vntg, summing, and dividing by the other dataframe:

df2.groupby('Vntg')['Payment'].sum() / df1.set_index('Vntg')['Funded'] * 100
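
For reference, here is a minimal sketch of what that one-liner returns with the question's data. It yields a single percentage per vintage (total payments over funded amount), which corresponds to the last populated timing of the cumulative table rather than the full per-timing breakdown. The name total_pct is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Funded": [1000, 1000, 1000, 2000, 2000, 4000],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Total payments per vintage divided by the funded amount of the same vintage.
# Both sides are Series indexed by Vntg, so the division aligns on that label.
total_pct = df2.groupby("Vntg")["Payment"].sum() / df1.set_index("Vntg")["Funded"] * 100
print(total_pct)
```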

The approach below doesn't use crosstab, but should give the same shares (IIUC), as fractions rather than percentages. Note that the cumulative sum has to run within each Vntg, across its timings:

(
    df2.sort_values(["Vntg", "Timing"])
    .assign(cum_paymt=lambda df: df.groupby("Vntg")["Payment"].transform("cumsum"))
    .assign(cum_share=lambda df: df["cum_paymt"] / df["Funded"])
    .pivot(index="Vntg", columns="Timing", values="cum_share")
)

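It would be easier for everyone if you posted the expected output as a dataframe. Here ct_df is the raw crosstab, before the cumsum reassignment in the question, so the cumulative sum is applied only once: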

>>> ct_df.cumsum(axis=1).div(df1.set_index('Vntg')['Funded'], axis=0).mul(100)
Timing      0     1    2
Vntg                    
2020-01  1.00  3.00  5.0
2020-02  1.50  2.25  NaN
2020-03  0.75   NaN  NaN
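
A likely reason the original attempt came back all NaN (an assumption, since the failing line is not shown in the question) is label alignment: pandas matches row labels before dividing, and df1 still carries the default 0, 1, 2 index while the crosstab is indexed by Vntg strings. The sketch below checks the labels on both sides and then divides row-wise once they match. The names ct, funded and pct are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Funded": [1000, 1000, 1000, 2000, 2000, 4000],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Raw crosstab: rows are vintages, columns are timings, cells are summed payments.
ct = pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum")

# df1's default integer row labels never appear among the crosstab's row labels,
# so a row-wise division against df1 as-is has nothing to align on.
print(df1.index.isin(ct.index))

# Re-index the denominator by Vntg so its labels match the crosstab's rows.
funded = df1.set_index("Vntg")["Funded"]
print(funded.index.isin(ct.index))

# axis=0 tells div to match the Series index against the row labels.
pct = ct.cumsum(axis=1).div(funded, axis=0).mul(100)
print(pct)
```

Passing axis=0 matters here: without it, div matches the Series index against the column labels (the timings) instead of the rows.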
