How to create a new boolean column in a dataframe based on multiple conditions from another dataframe in pandas

Question

I have a dataframe

entity  response    date
p   a1  1-Feb-14
p   a2  2-Feb-14
p   a3  3-Feb-14
p   a4  4-Feb-14
p   a5  5-Feb-14
p   a6  6-Feb-14
p   a7  7-Feb-14
p   a8  8-Feb-14
p   a9  9-Feb-14
p   a10 10-Feb-14
p   a11 11-Feb-14
p   a12 12-Feb-14
p   a13 13-Feb-14
p   a14 14-Feb-14
p   a15 15-Feb-14

and another dataframe:

entity  start_date  end_date
p   2-Feb-14    4-Feb-14
p   6-Feb-14    7-Feb-14
p   9-Feb-14    12-Feb-14
q   1-Feb-14    7-Feb-14

Based on the second dataframe, I need to add a True/False column to the first dataframe: for each row, the value should be True if the date falls within any of the start_date/end_date windows of that row's entity (here p), and False otherwise.

What is the fastest, and ideally the shortest, way to do this? I tried iterating over the whole dataframe, but that is slow and makes the code long.

Answer 1

Maybe I'm overthinking this, but you can group by entity and test each date against that entity's windows:

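def f(s):
    # s holds the dates of one entity; s.name is that entity's group key
    windows = df2[df2.entity == s.name]
    # True if the date d falls inside any [start_date, end_date] window (both ends inclusive)
    in_window = lambda d: ((d >= windows.start_date) & (d <= windows.end_date)).any()
    return s.transform(in_window)

# assumes date, start_date and end_date are already datetimes, not strings
df.groupby('entity').date.transform(f)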

0     False
1      True
2      True
3      True
4     False
5      True
6      True
7     False
8      True
9      True
10     True
11     True
12    False
13    False
14    False
15    False
Name: date, dtype: bool
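
The snippet above assumes that date, start_date and end_date already hold datetimes rather than strings such as 1-Feb-14. Here is a minimal, self-contained sketch under that assumption, which rebuilds the question's frames and stores the result as a column. The format string %d-%b-%y, the helper name in_any_window and the column name in_window are illustrative choices, not part of the original answer.

```python
import pandas as pd

FMT = "%d-%b-%y"  # assumed format for strings like "1-Feb-14"

# Rebuild the two frames from the question with parsed dates.
df = pd.DataFrame({
    "entity": ["p"] * 15,
    "response": [f"a{i}" for i in range(1, 16)],
    "date": pd.to_datetime([f"{i}-Feb-14" for i in range(1, 16)], format=FMT),
})
df2 = pd.DataFrame({
    "entity": ["p", "p", "p", "q"],
    "start_date": pd.to_datetime(["2-Feb-14", "6-Feb-14", "9-Feb-14", "1-Feb-14"], format=FMT),
    "end_date": pd.to_datetime(["4-Feb-14", "7-Feb-14", "12-Feb-14", "7-Feb-14"], format=FMT),
})


def in_any_window(dates):
    """Check one entity's dates against that entity's windows (inclusive)."""
    # Inside groupby(...).transform, dates.name is the group key (the entity).
    windows = df2[df2["entity"] == dates.name]
    return dates.apply(
        lambda d: bool(((windows["start_date"] <= d) & (d <= windows["end_date"])).any())
    )


# in_window is a hypothetical column name for the new boolean flag.
df["in_window"] = df.groupby("entity")["date"].transform(in_any_window)
```

With the question's data, the flagged rows should be the dates inside 2-4 Feb, 6-7 Feb and 9-12 Feb.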

You can also preprocess the windows first to speed things up:

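# one pd.Interval per window, then a dict mapping each entity to the list of its intervals
df2['j'] = df2.apply(lambda k: pd.Interval(k.start_date, k.end_date), axis=1)
dic = df2.groupby('entity')['j'].agg(list).to_dict()
# apply rather than transform: the function reduces each row to a single bool
df[['entity', 'date']].apply(lambda x: any(x['date'] in z for z in dic[x['entity']]), axis=1)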

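Note that pd.Interval is closed only on the right by default, so a date equal to start_date is not counted as inside the window (pass closed='both' to include both endpoints). This version should be around 20x faster than the chained transforms.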
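
To make the closed-interval point concrete, here is a small sketch of the same lookup idea with closed='both', so that start_date and end_date both count as inside the window. It is an illustration under assumptions, not the answer's exact code: the names windows and in_window are hypothetical, the dates are written in ISO form for brevity, and entities without any window are treated as False.

```python
import pandas as pd

df = pd.DataFrame({
    "entity": ["p"] * 15,
    "date": pd.date_range("2014-02-01", periods=15, freq="D"),
})
df2 = pd.DataFrame({
    "entity": ["p", "p", "p", "q"],
    "start_date": pd.to_datetime(["2014-02-02", "2014-02-06", "2014-02-09", "2014-02-01"]),
    "end_date": pd.to_datetime(["2014-02-04", "2014-02-07", "2014-02-12", "2014-02-07"]),
})

# Map each entity to its list of intervals, closed on both ends.
windows = {
    entity: [
        pd.Interval(start, end, closed="both")
        for start, end in zip(group["start_date"], group["end_date"])
    ]
    for entity, group in df2.groupby("entity")
}

# One membership test per row; .get(..., []) yields False for unknown entities.
df["in_window"] = [
    any(date in interval for interval in windows.get(entity, []))
    for entity, date in zip(df["entity"], df["date"])
]

# The default (right-closed) interval excludes its own start date.
start = pd.Timestamp("2014-02-02")
default_interval = pd.Interval(start, pd.Timestamp("2014-02-04"))
start_is_inside_default = start in default_interval
```

The last three lines show why the default matters here: with a right-closed interval the window's first day is not matched.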

Answer 2

IMHO, depending on your data, it is sometimes acceptable to expand each date range into individual days first and then merge:

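# expand every window into one row per calendar day (df2 must have exactly these three columns)
expanded = pd.concat([
    pd.DataFrame(pd.date_range(start_date, end_date), columns=['date']).assign(entity=entity)
    for _, (entity, start_date, end_date) in df2.iterrows()
]).drop_duplicates()
# a left-merge row flagged 'both' means its (entity, date) pair lies inside some window
df.merge(expanded, on=['entity', 'date'], how='left', indicator=True)['_merge'] == 'both'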
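
For reference, a self-contained sketch of this expand-then-merge idea that writes the result back as a column. It assumes the dates are whole days (pd.date_range steps by one day by default, so timestamps with a time component would not match), and the names expanded, merged and in_window are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "entity": ["p"] * 15,
    "date": pd.date_range("2014-02-01", periods=15, freq="D"),
})
df2 = pd.DataFrame({
    "entity": ["p", "p", "p", "q"],
    "start_date": pd.to_datetime(["2014-02-02", "2014-02-06", "2014-02-09", "2014-02-01"]),
    "end_date": pd.to_datetime(["2014-02-04", "2014-02-07", "2014-02-12", "2014-02-07"]),
})

# One row per (entity, day) covered by any window; overlapping windows are deduplicated.
expanded = pd.concat(
    [
        pd.DataFrame({"entity": row.entity, "date": pd.date_range(row.start_date, row.end_date)})
        for row in df2.itertuples(index=False)
    ],
    ignore_index=True,
).drop_duplicates()

# A left merge keeps every row of df in order; '_merge' records where each row matched.
merged = df.merge(expanded, on=["entity", "date"], how="left", indicator=True)

# to_numpy() sidesteps index alignment in case df does not have a default RangeIndex.
df["in_window"] = (merged["_merge"] == "both").to_numpy()
```

The cost is memory: the expanded frame has one row per day per window, which is why this suits short windows better than ranges spanning many years.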

solution1
0 2018-08-09 00:44:35

solution2
0 ACCEPTED 2018-08-09 01:22:11
