
Pandas: How to create a True/False column that checks whether a row with certain values exists in the previous weeks?

Could you please help?

This is my dataframe.

Branch name  WeekStart            Product
Apple Store  01/11/2021 00:00:00  Apple Iphone XR 64 gb
Apple Store  01/11/2021 00:00:00  Apple Iphone 11 128 gb
T mobile     01/11/2021 00:00:00  Apple Iphone 13 Pro 256 gb
T mobile     01/11/2021 00:00:00  Apple Iphone 12 256 gb
Apple Store  08/11/2021 00:00:00  Apple Iphone XR 64 gb
Apple Store  08/11/2021 00:00:00  Apple Iphone 11 128 gb
T mobile     15/11/2021 00:00:00  Apple Iphone 13 Pro 256 gb
T mobile     15/11/2021 00:00:00  Apple Iphone 12 256 gb
Apple Store  15/11/2021 00:00:00  Apple Iphone XR 64 gb
Apple Store  15/11/2021 00:00:00  Apple Iphone 11 128 gb
T mobile     22/11/2021 00:00:00  Apple Iphone 13 Pro 256 gb
T mobile     22/11/2021 00:00:00  Apple Iphone 12 256 gb
Apple Store  22/11/2021 00:00:00  Apple Iphone XR 64 gb
Apple Store  22/11/2021 00:00:00  Apple Iphone 11 128 gb

I want to create a new column that tells me whether a specific product existed in a given outlet in each of the past 3 weeks, inclusive. If the product was missing from that outlet in any of the last 3 weeks, the value should be False.

Branch name  WeekStart            Product                     Exists prev 3week
Apple Store  22/11/2021 00:00:00  Apple Iphone XR 64 gb       True
Apple Store  22/11/2021 00:00:00  Apple Iphone 11 128 gb      True
T mobile     22/11/2021 00:00:00  Apple Iphone 13 Pro 256 gb  False
T mobile     22/11/2021 00:00:00  Apple Iphone 12 256 gb      False

How can I do this?

I tried to no avail:

def prev_3week(x):
    if (x - pd.DateOffset(weeks=3) in x.values) & (x - pd.DateOffset(weeks=2) in x.values) & (x - pd.DateOffset(weeks=1) in x.values):
        return True #considering day greater than 14 as third week 
    else:
        return False
df['Exists prev 3week'] = df.groupby(['Branch name'])['WeekStart'].apply(lambda x: prev_3week(x)).reset_index(drop=True)

I assume your dataframe is already sorted by the WeekStart column and that this column has the dtype datetime64.

Try:

import pandas as pd

# If that is not already the case, uncomment these lines:
# df['WeekStart'] = pd.to_datetime(df['WeekStart'], dayfirst=True)
# df = df.sort_values('WeekStart')

def consecutive_week(x, n):
    # True if the last n week-to-week gaps in x are all exactly 7 days
    return x.sub(x.shift()).eq(pd.Timedelta(days=7))[-n:].all()

out = df.groupby(['Branch name', 'Product']).agg(**{
    'WeekStart': ('WeekStart', 'last'),
    'Exists prev 3week': ('WeekStart', lambda x: consecutive_week(x, n=3)),
    'Exists prev 2week': ('WeekStart', lambda x: consecutive_week(x, n=2))
}).reset_index()

Output:

>>> out
   Branch name                     Product  WeekStart  Exists prev 3week  Exists prev 2week
0  Apple Store      Apple Iphone 11 128 gb 2021-11-22               True               True
1  Apple Store       Apple Iphone XR 64 gb 2021-11-22               True               True
2     T mobile      Apple Iphone 12 256 gb 2021-11-22              False              False
3     T mobile  Apple Iphone 13 Pro 256 gb 2021-11-22              False              False
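
The output above is one row per branch/product pair. If you would rather have the flag as a column on the rows of the latest week, as in the expected result in the question, here is a minimal sketch of one way to do it. It rebuilds the sample data from the question so it runs on its own; the names has_consecutive_weeks, keys, flags, latest and result are only illustrative and do not come from the original posts. It uses the same 7-day-gap check, so it assumes one row per branch, product and week, and it measures consecutive weeks relative to each group's own last week.

```python
import pandas as pd

# Sample data copied from the question (dates are day-first).
rows = [
    ("Apple Store", "01/11/2021", "Apple Iphone XR 64 gb"),
    ("Apple Store", "01/11/2021", "Apple Iphone 11 128 gb"),
    ("T mobile", "01/11/2021", "Apple Iphone 13 Pro 256 gb"),
    ("T mobile", "01/11/2021", "Apple Iphone 12 256 gb"),
    ("Apple Store", "08/11/2021", "Apple Iphone XR 64 gb"),
    ("Apple Store", "08/11/2021", "Apple Iphone 11 128 gb"),
    ("T mobile", "15/11/2021", "Apple Iphone 13 Pro 256 gb"),
    ("T mobile", "15/11/2021", "Apple Iphone 12 256 gb"),
    ("Apple Store", "15/11/2021", "Apple Iphone XR 64 gb"),
    ("Apple Store", "15/11/2021", "Apple Iphone 11 128 gb"),
    ("T mobile", "22/11/2021", "Apple Iphone 13 Pro 256 gb"),
    ("T mobile", "22/11/2021", "Apple Iphone 12 256 gb"),
    ("Apple Store", "22/11/2021", "Apple Iphone XR 64 gb"),
    ("Apple Store", "22/11/2021", "Apple Iphone 11 128 gb"),
]
df = pd.DataFrame(rows, columns=["Branch name", "WeekStart", "Product"])
df["WeekStart"] = pd.to_datetime(df["WeekStart"], dayfirst=True)
df = df.sort_values("WeekStart", kind="stable").reset_index(drop=True)

keys = ["Branch name", "Product"]


def has_consecutive_weeks(weeks, n):
    # Gap between each week and the previous one within the group;
    # the first gap is NaT, which never equals 7 days.
    gaps = weeks.diff()
    return bool(gaps.eq(pd.Timedelta(days=7)).iloc[-n:].all())


# One boolean per (branch, product) pair.
flags = (
    df.groupby(keys)["WeekStart"]
    .apply(lambda s: has_consecutive_weeks(s, n=3))
    .rename("Exists prev 3week")
)

# Keep only the rows of the most recent week and attach the flag to them.
latest = df[df["WeekStart"] == df["WeekStart"].max()]
result = latest.merge(flags.reset_index(), on=keys, how="left")
print(result)
```

Dropping the latest filter and merging flags into df directly would put the same per-pair flag on every row instead.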
