Summary: the end goal is to calculate a percentage by dividing the output of a Pandas crosstab by values from another dataframe that shares the same index.
What I've tried: I used the original crosstab dataframe as the numerator and divided it by the other dataframe, but that doesn't work; the result was all NaN.
Code
import pandas as pd
import numpy as np
# df1 holds the funded amount per vintage; this is the denominator
df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Funded": [1000, 1000, 1000, 2000, 2000, 4000],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})
ct_df = pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum", margins=False)
# Cumulative payments across Timing; this is the numerator
ct_df = ct_df.cumsum(axis=1)
Starting from the cumsum that accumulates the payments, is there a way to convert the dollar values into a percentage of the funded amount in df1? Thanks in advance; I appreciate all the help.
I've also looked at the thread below and it doesn't seem to solve my issue: Customized normalization of pd.crosstab()
Edit:
So I think some folks are confused about the ask. To clarify, the final result for vintage 2020-01 at timing 0 would be 10 from df2 divided by the funded amount, which is 1000 from df1. For the subsequent timing at 1, it would be (10+20) from df2 divided by the same funded amount from df1, because the funded amount does not change within a vintage. The other vintages are populated by the same logic.
If I understand the question, you want to sum up Payment values within the same Vntg value, and divide that by the Funded field of the other dataframe with a matching Vntg field.
You can do that by grouping on Vntg, summing, and dividing by the other dataframe:
df2.groupby('Vntg')['Payment'].sum() / df1.set_index('Vntg')['Funded'] * 100
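As a rough, self-contained sketch of what that one-liner produces on the question's sample data: it yields a single total percentage per vintage, not one value per Timing column. The variable names `paid`, `funded` and `total_pct` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Total payments per vintage, indexed by Vntg.
paid = df2.groupby("Vntg")["Payment"].sum()
# Funded amount per vintage on the same Vntg index, so the division aligns by label.
funded = df1.set_index("Vntg")["Funded"]

total_pct = paid / funded * 100
print(total_pct)
```

Each value here matches the last populated Timing column of the cumulative table for that vintage, so this covers the totals but not the intermediate timings.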
The approach below doesn't use crosstab, but should give the same answer (IIUC):
(
    df2.sort_values(["Vntg", "Timing"])
    # accumulate payments within each vintage, in Timing order
    .assign(cum_paymt=lambda df: df.groupby("Vntg")["Payment"].transform("cumsum"))
    # share of the funded amount (multiply by 100 for a percentage)
    .assign(cum_share=lambda df: df["cum_paymt"] / df["Funded"])
    .pivot(index="Vntg", columns="Timing", values="cum_share")
)
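To check the "same answer" claim, here is a hedged sketch that runs a groupby/pivot route next to the crosstab route on the sample data. It assumes the payments should accumulate within each vintage (grouping on `Vntg`) and scales both results to percent; `via_groupby` and `via_crosstab` are illustrative names.

```python
import pandas as pd

df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Funded": [1000, 1000, 1000, 2000, 2000, 4000],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

# Route 1: cumulate within each vintage, divide by the row's own Funded, then pivot.
via_groupby = (
    df2.sort_values(["Vntg", "Timing"])
    .assign(cum_paymt=lambda d: d.groupby("Vntg")["Payment"].cumsum())
    .assign(cum_pct=lambda d: d["cum_paymt"] / d["Funded"] * 100)
    .pivot(index="Vntg", columns="Timing", values="cum_pct")
)

# Route 2: crosstab, cumulate across Timing, divide each row by its funded amount.
funded = df2.drop_duplicates("Vntg").set_index("Vntg")["Funded"]
via_crosstab = (
    pd.crosstab(df2["Vntg"], df2["Timing"], values=df2["Payment"], aggfunc="sum")
    .cumsum(axis=1)
    .div(funded, axis=0)
    .mul(100)
)

print(via_groupby)
print(via_crosstab)
```

Both frames keep NaN where a vintage has no payment at that timing yet.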
It would be easier for everyone if you gave the expected output as a dataframe. Starting from the raw crosstab (before the cumsum reassignment in the question):
>>> ct_df.cumsum(axis=1).div(df1.set_index('Vntg')['Funded'], axis=0).mul(100)
Timing      0     1    2
Vntg
2020-01  1.00  3.00  5.0
2020-02  1.50  2.25  NaN
2020-03  0.75   NaN  NaN
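The question does not show the exact division that produced all NaN, so the following is only a guess at the cause: dividing one DataFrame by another aligns on both index and columns, and the crosstab's Timing columns never match a "Funded" column. The sketch contrasts that with `DataFrame.div(Series, axis=0)`, which aligns on the row index only.

```python
import pandas as pd

df1 = pd.DataFrame({"Vntg": ["2020-01", "2020-02", "2020-03"],
                    "Funded": [1000, 2000, 4000]})
df2 = pd.DataFrame({"Vntg": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-03"],
                    "Payment": [10, 20, 20, 30, 15, 30],
                    "Timing": [0, 1, 2, 0, 1, 0]})

cum = pd.crosstab(df2["Vntg"], df2["Timing"],
                  values=df2["Payment"], aggfunc="sum").cumsum(axis=1)
funded_df = df1.set_index("Vntg")  # DataFrame with a single "Funded" column

# DataFrame / DataFrame: column labels 0, 1, 2 and "Funded" do not overlap,
# so every cell of the aligned result is NaN.
all_nan = cum / funded_df

# DataFrame.div(Series, axis=0): the Series is matched against the row index
# (Vntg) and broadcast across the Timing columns.
pct = cum.div(funded_df["Funded"], axis=0).mul(100)

print(all_nan)
print(pct)
```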