I have a DataFrame (df) as follows:
Customer Date_Begin Date_End Product
0 1 2017-01-01 2020-01-01 UsagePeriod
1 1 2017-01-02 2018-01-02 Token
2 1 2017-01-06 2018-01-06 Token
3 1 2018-12-01 2019-12-01 Token
4 1 2019-06-01 2020-06-01 Token
5 1 2019-12-21 2022-12-21 UsagePeriod
6 1 2020-01-31 2021-01-31 Token
7 1 2021-06-30 2022-06-30 Token
8 1 2021-09-30 2022-09-30 Token
9 2 2019-06-01 2022-06-01 UsagePeriod
10 2 2019-06-01 2020-06-01 Token
11 3 2019-06-01 2022-06-01 UsagePeriod
12 3 2019-06-01 2020-06-01 Token
13 3 2020-06-01 2021-06-01 Token
14 4 2017-01-01 2020-01-01 UsagePeriod
15 4 2017-02-15 2018-02-15 Token
16 4 2019-12-15 2020-12-15 Token
There are some rules to be applied to reach the desired output dataframe:
Customer Date_Begin Date_End Product Expiry
0 1 2017-01-01 2020-01-01 UsagePeriod 2020-01-01
1 1 2017-01-02 2018-01-02 Token 2017-01-06
2 1 2017-01-06 2018-01-06 Token 2018-01-06
3 1 2018-12-01 2019-12-01 Token 2019-06-01
4 1 2019-06-01 2020-06-01 Token 2019-12-21
5 1 2019-12-21 2022-12-21 UsagePeriod 2022-12-21
6 1 2020-01-31 2021-01-31 Token 2021-01-31
7 1 2021-06-30 2022-06-30 Token 2021-09-30
8 1 2021-09-30 2022-09-30 Token 2022-09-30
9 2 2019-06-01 2022-06-01 UsagePeriod 2022-06-01
10 2 2019-06-01 2020-06-01 Token 2020-06-01
11 3 2019-06-01 2022-06-01 UsagePeriod 2022-06-01
12 3 2019-06-01 2020-06-01 Token 2020-06-01
13 3 2020-06-01 2021-06-01 Token 2021-06-01
14 4 2017-01-01 2020-01-01 UsagePeriod 2020-01-01
15 4 2017-02-15 2018-02-15 Token 2018-02-15
16 4 2019-12-15 2020-12-15 Token 2020-01-01
The code that I currently have is:
import pandas as pd

def myFunc(df):
    for i in range(0, len(df) - 1):  # loop through every row except the last
        if df.Product.loc[i] == "UsagePeriod":  # a usage period lasts three years
            df.loc[i, 'expiricy'] = df.loc[i, 'Date_Begin'] + pd.offsets.DateOffset(years=3)
        elif df.Product.loc[i] == "Token":
            if df.Customer.loc[i] == df.Customer.loc[i+1]:  # is the next row the same customer?
                if abs(df.Date_Begin.loc[i+1] - df.Date_Begin.loc[i]) >= pd.Timedelta(364, 'd'):  # next row begins a year or more later
                    df.loc[i, 'expiricy'] = df.loc[i, 'Date_Begin'] + pd.offsets.DateOffset(years=1)
                elif abs(df.Date_Begin.loc[i+1] - df.Date_Begin.loc[i]) < pd.Timedelta(364, 'd'):  # next row begins within a year
                    df.loc[i, 'expiricy'] = df.loc[i+1, 'Date_Begin']
I am struggling to define when a token ends within its UsagePeriod. Any help would be appreciated.
First convert the date columns to datetime objects. You will need this for valid comparisons.
df["Date_Begin"] = pd.to_datetime(df["Date_Begin"])
df["Date_End"] = pd.to_datetime(df["Date_End"])
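As a small illustration of what the conversion buys you, here is a sketch on a made-up two-row frame (the name demo is only for this example, it is not the question's df). Once the column holds datetimes, subtracting two dates gives a Timedelta that can be compared against a duration, which plain strings cannot do.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the conversion.
demo = pd.DataFrame({"Date_Begin": ["2017-01-02", "2017-01-06"]})

# Strings become datetime values, so date arithmetic is possible.
demo["Date_Begin"] = pd.to_datetime(demo["Date_Begin"])

# Difference between the two begin dates as a Timedelta.
gap = demo["Date_Begin"].iloc[1] - demo["Date_Begin"].iloc[0]
print(gap)                            # 4 days 00:00:00
print(gap < pd.Timedelta(364, "d"))   # True
```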
Next, for the rows whose Product is Token, take the Date_Begin of the next Token row of the same customer (a shift by -1 within each customer group):
df["Expiry"] = df.loc[df["Product"].eq("Token"), "Date_Begin"].groupby(df["Customer"]).shift(-1)
The results:
Customer Date_Begin Date_End Product Expiry
0 1 2017-01-01 2020-01-01 UsagePeriod NaT
1 1 2017-01-02 2018-01-02 Token 2017-01-06
2 1 2017-01-06 2018-01-06 Token 2018-12-01
3 1 2018-12-01 2019-12-01 Token 2019-06-01
4 1 2019-06-01 2020-06-01 Token 2020-01-31
5 1 2019-12-21 2022-12-21 UsagePeriod NaT
6 1 2020-01-31 2021-01-31 Token 2021-06-30
7 1 2021-06-30 2022-06-30 Token 2021-09-30
8 1 2021-09-30 2022-09-30 Token NaT
9 2 2019-06-01 2022-06-01 UsagePeriod NaT
10 2 2019-06-01 2020-06-01 Token NaT
11 3 2019-06-01 2022-06-01 UsagePeriod NaT
12 3 2019-06-01 2020-06-01 Token 2020-06-01
13 3 2020-06-01 2021-06-01 Token NaT
14 4 2017-01-01 2020-01-01 UsagePeriod NaT
15 4 2017-02-15 2018-02-15 Token 2019-12-15
16 4 2019-12-15 2020-12-15 Token NaT
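To see why the shift is done per customer, the following sketch repeats the step on a reduced, made-up frame (demo and is_token are names chosen for this example only). The last Token of each customer gets NaT instead of picking up a date from the next customer, and the UsagePeriod rows get NaT because they were filtered out before the shift.

```python
import pandas as pd

# Hypothetical reduced data: two customers, five rows.
demo = pd.DataFrame({
    "Customer": [1, 1, 1, 2, 2],
    "Product": ["UsagePeriod", "Token", "Token", "UsagePeriod", "Token"],
    "Date_Begin": pd.to_datetime(
        ["2017-01-01", "2017-01-02", "2017-01-06", "2019-06-01", "2019-06-01"]
    ),
})

is_token = demo["Product"].eq("Token")

# Shift only the Token rows, grouped by customer; the grouping key is
# aligned on the index, so passing the full Customer column works.
demo["Expiry"] = (
    demo.loc[is_token, "Date_Begin"].groupby(demo["Customer"]).shift(-1)
)
print(demo)
```

Only row 1 receives a date here (2017-01-06, the begin date of customer 1's second token); every other row is NaT.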
Then compare Date_End with the current Expiry: keep Expiry where it is earlier than Date_End, otherwise use Date_End.
df["Expiry"] = df["Expiry"].where(df["Expiry"].lt(df["Date_End"]), df["Date_End"])
Because a comparison with NaT is always False, this also fills the UsagePeriod rows, and the last Token row of each customer, with their Date_End values:
Customer Date_Begin Date_End Product Expiry
0 1 2017-01-01 2020-01-01 UsagePeriod 2020-01-01
1 1 2017-01-02 2018-01-02 Token 2017-01-06
2 1 2017-01-06 2018-01-06 Token 2018-01-06
3 1 2018-12-01 2019-12-01 Token 2019-06-01
4 1 2019-06-01 2020-06-01 Token 2020-01-31
5 1 2019-12-21 2022-12-21 UsagePeriod 2022-12-21
6 1 2020-01-31 2021-01-31 Token 2021-01-31
7 1 2021-06-30 2022-06-30 Token 2021-09-30
8 1 2021-09-30 2022-09-30 Token 2022-09-30
9 2 2019-06-01 2022-06-01 UsagePeriod 2022-06-01
10 2 2019-06-01 2020-06-01 Token 2020-06-01
11 3 2019-06-01 2022-06-01 UsagePeriod 2022-06-01
12 3 2019-06-01 2020-06-01 Token 2020-06-01
13 3 2020-06-01 2021-06-01 Token 2021-06-01
14 4 2017-01-01 2020-01-01 UsagePeriod 2020-01-01
15 4 2017-02-15 2018-02-15 Token 2018-02-15
16 4 2019-12-15 2020-12-15 Token 2020-12-15
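Note that this result still differs from the desired output in rows 4 and 16 (2020-01-31 instead of 2019-12-21, and 2020-12-15 instead of 2020-01-01), because the tokens are not yet limited by the usage periods. The question does not spell out the rules, so the following is only a sketch based on a rule inferred from the desired output: a token expires at the earliest of its own Date_End, the Date_Begin of the next row of the same customer (Token or UsagePeriod), and the Date_End of the most recent UsagePeriod row above it. The function name add_expiry and the helper variables are hypothetical, and the rows are assumed to be sorted as in the question (by customer, then by begin date, with each UsagePeriod before its tokens).

```python
import pandas as pd

def add_expiry(df):
    """Sketch only: the rule is inferred from the desired output."""
    out = df.copy()
    out["Date_Begin"] = pd.to_datetime(out["Date_Begin"])
    out["Date_End"] = pd.to_datetime(out["Date_End"])
    is_token = out["Product"].eq("Token")

    # Begin date of the next row (any product) of the same customer.
    next_begin = out.groupby("Customer")["Date_Begin"].shift(-1)

    # End date of the most recent UsagePeriod row of the same customer,
    # carried forward onto the Token rows below it.
    usage_end = out["Date_End"].where(~is_token).groupby(out["Customer"]).ffill()

    # Running minimum of the candidates; comparisons with NaT are False,
    # so a missing candidate never replaces the current value.
    expiry = out["Date_End"]
    for candidate in (next_begin, usage_end):
        expiry = expiry.mask(candidate < expiry, candidate)

    # UsagePeriod rows keep their own Date_End.
    out["Expiry"] = out["Date_End"].where(~is_token, expiry)
    return out

# Usage on customer 4 from the question.
sample = pd.DataFrame({
    "Customer": [4, 4, 4],
    "Date_Begin": ["2017-01-01", "2017-02-15", "2019-12-15"],
    "Date_End": ["2020-01-01", "2018-02-15", "2020-12-15"],
    "Product": ["UsagePeriod", "Token", "Token"],
})
print(add_expiry(sample)["Expiry"].dt.strftime("%Y-%m-%d").tolist())
# ['2020-01-01', '2018-02-15', '2020-01-01']
```

On the sample data from the question this rule reproduces the desired Expiry column, including rows 4 and 16, but it should be checked against the real business rules before being relied on.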