
Pandas Dataframe Creating a New Column Based on Other Columns

I have a dataframe (df) as follows:

    Customer    Date_Begin  Date_End    Product
0   1   2017-01-01  2020-01-01  UsagePeriod
1   1   2017-01-02  2018-01-02  Token
2   1   2017-01-06  2018-01-06  Token
3   1   2018-12-01  2019-12-01  Token
4   1   2019-06-01  2020-06-01  Token
5   1   2019-12-21  2022-12-21  UsagePeriod
6   1   2020-01-31  2021-01-31  Token
7   1   2021-06-30  2022-06-30  Token
8   1   2021-09-30  2022-09-30  Token
9   2   2019-06-01  2022-06-01  UsagePeriod
10  2   2019-06-01  2020-06-01  Token
11  3   2019-06-01  2022-06-01  UsagePeriod
12  3   2019-06-01  2020-06-01  Token
13  3   2020-06-01  2021-06-01  Token
14  4   2017-01-01  2020-01-01  UsagePeriod
15  4   2017-02-15  2018-02-15  Token
16  4   2019-12-15  2020-12-15  Token

There are some rules to be applied to reach the desired output dataframe:

  1. Each customer has a UsagePeriod during which (s)he can buy tokens. Tokens normally expire after one year at most, and a UsagePeriod always lasts 3 years from its Date_Begin. The Date_End column gives the expected expiry dates.
  2. A customer does not buy a new token until the current one has expired, so buying a new token means the previous one has run out. Assuming the customer bought the new token on the day the previous one expired, I want to add a column containing these expiry dates.
  3. Tokens can only be used within the customer's defined UsagePeriod, unless the customer buys a new UsagePeriod.

The output dataframe that I want to reach is:

    Customer    Date_Begin  Date_End    Product Expiry
0   1   2017-01-01  2020-01-01  UsagePeriod 2020-01-01
1   1   2017-01-02  2018-01-02  Token   2017-01-06
2   1   2017-01-06  2018-01-06  Token   2018-01-06
3   1   2018-12-01  2019-12-01  Token   2019-06-01
4   1   2019-06-01  2020-06-01  Token   2019-12-21
5   1   2019-12-21  2022-12-21  UsagePeriod 2022-12-21
6   1   2020-01-31  2021-01-31  Token   2021-01-31
7   1   2021-06-30  2022-06-30  Token   2021-09-30
8   1   2021-09-30  2022-09-30  Token   2022-09-30
9   2   2019-06-01  2022-06-01  UsagePeriod 2022-06-01
10  2   2019-06-01  2020-06-01  Token   2020-06-01
11  3   2019-06-01  2022-06-01  UsagePeriod 2022-06-01
12  3   2019-06-01  2020-06-01  Token   2020-06-01
13  3   2020-06-01  2021-06-01  Token   2021-06-01
14  4   2017-01-01  2020-01-01  UsagePeriod 2020-01-01
15  4   2017-02-15  2018-02-15  Token   2018-02-15
16  4   2019-12-15  2020-12-15  Token   2020-01-01
            

The code that I currently have is:

import pandas as pd

def myFunc(df):
    for i in range(0, len(df) - 1):  # loop through every row except the last
        if df.loc[i, "Product"] == "UsagePeriod":
            # a usage period always lasts 3 years
            df.loc[i, "Expiry"] = df.loc[i, "Date_Begin"] + pd.offsets.DateOffset(years=3)
        elif df.loc[i, "Product"] == "Token":
            if df.loc[i, "Customer"] == df.loc[i + 1, "Customer"]:  # next row is the same customer
                gap = abs(df.loc[i + 1, "Date_Begin"] - df.loc[i, "Date_Begin"])
                if gap >= pd.Timedelta(364, "d"):  # next purchase is about a year or more away
                    df.loc[i, "Expiry"] = df.loc[i, "Date_Begin"] + pd.offsets.DateOffset(years=1)
                else:  # next purchase is less than a year away
                    df.loc[i, "Expiry"] = df.loc[i + 1, "Date_Begin"]
    return df

I am struggling to define when a token ends within the UsagePeriod. Any help would be appreciated.

First convert the date columns to datetime objects. You will need this for valid comparisons.

df["Date_Begin"] = pd.to_datetime(df["Date_Begin"])
df["Date_End"] = pd.to_datetime(df["Date_End"])

Next, select only the Token rows and, within each customer group, shift Date_Begin back by one row, so that each token gets the start date of that customer's next token:

df["Expiry"] = df.loc[df["Product"].eq("Token"), "Date_Begin"].groupby(df["Customer"]).shift(-1)

The results:

    Customer Date_Begin   Date_End      Product     Expiry
0          1 2017-01-01 2020-01-01  UsagePeriod        NaT
1          1 2017-01-02 2018-01-02        Token 2017-01-06
2          1 2017-01-06 2018-01-06        Token 2018-12-01
3          1 2018-12-01 2019-12-01        Token 2019-06-01
4          1 2019-06-01 2020-06-01        Token 2020-01-31
5          1 2019-12-21 2022-12-21  UsagePeriod        NaT
6          1 2020-01-31 2021-01-31        Token 2021-06-30
7          1 2021-06-30 2022-06-30        Token 2021-09-30
8          1 2021-09-30 2022-09-30        Token        NaT
9          2 2019-06-01 2022-06-01  UsagePeriod        NaT
10         2 2019-06-01 2020-06-01        Token        NaT
11         3 2019-06-01 2022-06-01  UsagePeriod        NaT
12         3 2019-06-01 2020-06-01        Token 2020-06-01
13         3 2020-06-01 2021-06-01        Token        NaT
14         4 2017-01-01 2020-01-01  UsagePeriod        NaT
15         4 2017-02-15 2018-02-15        Token 2019-12-15
16         4 2019-12-15 2020-12-15        Token        NaT
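
The NaT values on the UsagePeriod rows and on each customer's last token come from index alignment when the shifted Series is assigned back. A minimal sketch on a made-up three-row frame (the `demo` data and the names `token_begin` and `shifted` are illustrative, not part of the original data) shows the effect:

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the alignment behaviour.
demo = pd.DataFrame({
    "Customer": [1, 1, 1],
    "Product": ["UsagePeriod", "Token", "Token"],
    "Date_Begin": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-06"]),
})

# Only the Token rows survive the selection, so this Series has index 1 and 2.
token_begin = demo.loc[demo["Product"].eq("Token"), "Date_Begin"]

# Within each customer, pull the next token's start date up by one row.
shifted = token_begin.groupby(demo["Customer"]).shift(-1)

# Assignment aligns on the index: row 0 is absent from `shifted` and becomes NaT,
# and the last token (row 2) has no successor, so it is NaT as well.
demo["Expiry"] = shifted
print(demo)
```

Only row 1 receives a date here; the remaining gaps are filled in the next step.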

Then compare Date_End with the current Expiry: where Expiry is earlier than Date_End, keep Expiry; otherwise use Date_End.

df["Expiry"] = df["Expiry"].where(df["Expiry"].lt(df["Date_End"]), df["Date_End"])

This also fills every row where Expiry was NaT (the UsagePeriod rows and each customer's last token) with its Date_End value, because a comparison against NaT evaluates to False.

    Customer Date_Begin   Date_End      Product     Expiry
0          1 2017-01-01 2020-01-01  UsagePeriod 2020-01-01
1          1 2017-01-02 2018-01-02        Token 2017-01-06
2          1 2017-01-06 2018-01-06        Token 2018-01-06
3          1 2018-12-01 2019-12-01        Token 2019-06-01
4          1 2019-06-01 2020-06-01        Token 2020-01-31
5          1 2019-12-21 2022-12-21  UsagePeriod 2022-12-21
6          1 2020-01-31 2021-01-31        Token 2021-01-31
7          1 2021-06-30 2022-06-30        Token 2021-09-30
8          1 2021-09-30 2022-09-30        Token 2022-09-30
9          2 2019-06-01 2022-06-01  UsagePeriod 2022-06-01
10         2 2019-06-01 2020-06-01        Token 2020-06-01
11         3 2019-06-01 2022-06-01  UsagePeriod 2022-06-01
12         3 2019-06-01 2020-06-01        Token 2020-06-01
13         3 2020-06-01 2021-06-01        Token 2021-06-01
14         4 2017-01-01 2020-01-01  UsagePeriod 2020-01-01
15         4 2017-02-15 2018-02-15        Token 2018-02-15
16         4 2019-12-15 2020-12-15        Token 2020-12-15
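
Compared with the desired output in the question, this result differs on rows 4 and 16 (2020-01-31 instead of 2019-12-21, and 2020-12-15 instead of 2020-01-01), because rule 3 about the UsagePeriod is not applied. The following is a sketch of one possible reading of rule 3, not a definitive implementation: it assumes a token ends at the earliest of its own Date_End, the start of the customer's next purchase of any product, and the end of the UsagePeriod in force. It also assumes the rows are sorted by customer and date, with each UsagePeriod row preceding its tokens, as in the sample. The names `next_begin`, `usage_end` and `is_usage` are illustrative.

```python
import pandas as pd

rows = [
    (1, "2017-01-01", "2020-01-01", "UsagePeriod"), (1, "2017-01-02", "2018-01-02", "Token"),
    (1, "2017-01-06", "2018-01-06", "Token"),       (1, "2018-12-01", "2019-12-01", "Token"),
    (1, "2019-06-01", "2020-06-01", "Token"),       (1, "2019-12-21", "2022-12-21", "UsagePeriod"),
    (1, "2020-01-31", "2021-01-31", "Token"),       (1, "2021-06-30", "2022-06-30", "Token"),
    (1, "2021-09-30", "2022-09-30", "Token"),       (2, "2019-06-01", "2022-06-01", "UsagePeriod"),
    (2, "2019-06-01", "2020-06-01", "Token"),       (3, "2019-06-01", "2022-06-01", "UsagePeriod"),
    (3, "2019-06-01", "2020-06-01", "Token"),       (3, "2020-06-01", "2021-06-01", "Token"),
    (4, "2017-01-01", "2020-01-01", "UsagePeriod"), (4, "2017-02-15", "2018-02-15", "Token"),
    (4, "2019-12-15", "2020-12-15", "Token"),
]
df = pd.DataFrame(rows, columns=["Customer", "Date_Begin", "Date_End", "Product"])
df["Date_Begin"] = pd.to_datetime(df["Date_Begin"])
df["Date_End"] = pd.to_datetime(df["Date_End"])

is_usage = df["Product"].eq("UsagePeriod")

# Start of the customer's next purchase of any product; NaT on the customer's last row.
next_begin = df.groupby("Customer")["Date_Begin"].shift(-1)

# End of the usage period in force on each row, forward-filled within each customer.
usage_end = df["Date_End"].where(is_usage).groupby(df["Customer"]).ffill()

# Start from Date_End and lower it whenever a candidate is strictly earlier.
# Comparisons against NaT are False, so missing candidates are ignored.
expiry = df["Date_End"].copy()
expiry = expiry.mask(next_begin.lt(expiry), next_begin)
expiry = expiry.mask(usage_end.lt(expiry), usage_end)

# UsagePeriod rows always keep their own Date_End.
df["Expiry"] = expiry.where(~is_usage, df["Date_End"])
print(df)
```

On the sample data this yields 2019-12-21 for row 4 and 2020-01-01 for row 16, and leaves the other rows as in the table above. Whether a later purchase should really cut a token short is an assumption about the business rule and should be checked against real data.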
