Below is my sample data:
      Customer  Document Date  Clearing Date  Invoice_Amount
0     A         09/13/2016     11/04/2016     2,007,324
1     A         04/18/2016     07/11/2016     631,714
2     A         09/13/2016     09/16/2016     4,000,000
3     A         07/11/2017     09/23/2017     5,000,000
4     A         05/03/2016     06/17/2016     2,000,000
...   ...       ...            ...            ...
1158  H         04/21/2017     06/28/2017     3,000,000
1159  H         04/25/2017     05/19/2017     1,000,000
1160  H         11/03/2017     12/11/2017     4,500,000
1161  H         03/15/2018     05/27/2018     3,500,000
1162  H         02/21/2018     05/03/2018     1,500,000
I want to create a new variable, No_Paid (a new column after Invoice_Amount), which calculates the "number of paid invoices prior to the Document Date of a new invoice of a customer."
The expected output is as follows...
      Customer  Document Date  Clearing Date  Invoice_Amount  No_Paid
0     A         09/13/2016     11/04/2016     2,007,324       8
1     A         04/18/2016     07/11/2016     631,714         1
2     A         09/13/2016     09/16/2016     4,000,000       8
3     A         07/11/2017     09/23/2017     5,000,000       6
4     A         05/03/2016     06/17/2016     2,000,000       1
...   ...       ...            ...            ...             ...
1158  H         04/21/2017     06/28/2017     3,000,000       5
1159  H         04/25/2017     05/19/2017     1,000,000       3
1160  H         11/03/2017     12/11/2017     4,500,000       7
1161  H         03/15/2018     05/27/2018     3,500,000       37
1162  H         02/21/2018     05/03/2018     1,500,000       37
Currently, I use a for loop to achieve the expected output:
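import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape
df = pd.read_csv(r'E:\data.csv')
df['Document Date'] = pd.to_datetime(df['Document Date'], format="%m/%d/%Y")
df['Clearing Date'] = pd.to_datetime(df['Clearing Date'], format="%m/%d/%Y")
df["No_Paid"] = 0
for i in df.index:
    # The sample data names this column "Customer"
    Customer = df.loc[i, "Customer"]
    Doc_Date = df.loc[i, "Document Date"]
    # Only invoices issued within 180 days before this document date are counted
    Six_Month = Doc_Date - pd.Timedelta(days=180)
    # Same customer, cleared before this invoice was issued, issued inside the window
    df.loc[i, "No_Paid"] = df.loc[(df["Customer"] == Customer) & (df["Clearing Date"] < Doc_Date) & (df["Document Date"] >= Six_Month), "Invoice_Amount"].count()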
In the real case I have over 100,000 invoices, so the loop takes a long time. I tried to use df.apply, but I could not get the same output.
Going by your example:
import pandas as pd
# read in csv (save as csv or read in using pd.read_excel)
df = pd.read_csv('file.csv')
# to datetime just in case
df['Doc_Date'] = pd.to_datetime(df['Doc_Date'])
df['Exp_Date'] = pd.to_datetime(df['Exp_Date'])
df['Overdue'] = df['Doc_Date'] - df['Exp_Date']
# 180 days for 6 months
df['6M_Age'] = df['Doc_Date'] - pd.Timedelta(days=180)
# Hard to tell what the line in the middle of the data means
# you can group by two columns if you need to
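# cumsum over the whole frame cannot be assigned to one column, so pick the
# amount column explicitly ('Invoice_Amount' is assumed from the question)
df['Sum_of_paid'] = df.groupby('ID')['Invoice_Amount'].cumsum()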
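For the No_Paid count itself, one option is to replace the row-by-row filter with a NumPy comparison done once per customer. This is only a sketch, assuming the column names from the question's sample data (Customer, Document Date, Clearing Date, Invoice_Amount) and the 180-day window used in the loop. The function name add_no_paid and the demo rows are made up for illustration and are not from the question's data.

```python
import numpy as np
import pandas as pd


def add_no_paid(df, window_days=180):
    """Return a copy of df with a No_Paid column (hypothetical helper).

    For each invoice, counts invoices of the same customer that were
    cleared before its Document Date and were themselves issued no more
    than window_days before that Document Date.
    """
    doc = df["Document Date"].to_numpy()
    clr = df["Clearing Date"].to_numpy()
    # The loop in the question uses .count(), which skips missing amounts
    has_amount = df["Invoice_Amount"].notna().to_numpy()
    window = np.timedelta64(window_days, "D")

    counts = np.zeros(len(df), dtype=np.int64)
    # .indices maps each customer to the positional row numbers of its invoices
    for pos in df.groupby("Customer").indices.values():
        d, c, ok = doc[pos], clr[pos], has_amount[pos]
        # Rows: the invoice being scored. Columns: candidate invoices.
        cleared_before = c[None, :] < d[:, None]
        inside_window = d[None, :] >= (d - window)[:, None]
        counts[pos] = (cleared_before & inside_window & ok[None, :]).sum(axis=1)
    return df.assign(No_Paid=counts)


# Made-up demo rows, not the data from the question
demo = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "B", "B"],
    "Document Date": ["01/10/2016", "03/01/2016", "04/01/2016",
                      "01/01/2017", "03/01/2016", "04/01/2016"],
    "Clearing Date": ["02/01/2016", "03/15/2016", "05/01/2016",
                      "02/01/2017", "03/10/2016", "04/02/2016"],
    "Invoice_Amount": ["1,000", "2,000", "3,000", "4,000", "5,000", "6,000"],
})
for col in ["Document Date", "Clearing Date"]:
    demo[col] = pd.to_datetime(demo[col], format="%m/%d/%Y")

result = add_no_paid(demo)
print(result)
```

The comparison builds an n-by-n boolean array per customer, so memory grows with the square of the largest customer's invoice count. If a single customer has tens of thousands of invoices, it may be necessary to process that customer in chunks of rows instead.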