How to check if dates in one pandas dataframe fall within ranges of dates in another?

Question

I have two pandas dataframes. One is my main file with lots of rows containing "events" such as in the example below, lets call it main_df. While only 2 rows are given, there are around 200.000 in the real file.

event_nr	date	detail_1	detail_2
e_0001	2010-01-02 07:30:00	1	100
e_0002	2010-06-05 07:30:00	0	132

I then have another df_ranges with date ranges, here also there are a multitude of ranges, albeit not overlapping.

start	end	name
2010-02-25	2010-03-06	range_1
2010-06-02	2010-06-08	range_2

I now want to check if each date in main_df occurs during any of the time ranges in df_ranges, and if so, create a new column in main_df with the name of the range as a value in the same row.

Desired result:

event_nr	date	detail_1	detail_2	in_range
e_0001	2010-01-02 07:30:00	1	100	None
e_0002	2010-06-05 07:30:00	0	132	range_2

Most solutions I have encountered only wanted to check if certain dates were between start and end of one single range. I am sure I have to iterate over all ranges and all dates, but don't know how to proceed in practice.

Thank you!

Answer 1

Here is a solution.

Given the following DataFrames:

events = pd.DataFrame(
    {
        "event_nr":["e_0001", "e_0002"],
        "date":["2010-01-02 07:30:00", "2010-06-05 07:30:00"]
    }
)


ranges = pd.DataFrame(
    {
        "start":["2010-02-25", "2010-06-02"],
        "end":["2010-03-06", "2010-06-08"],
        "range_name":["range_1","range_2"] 
    }
)

First, let's make sure all dates are actually datetime objects:

events["date"] = pd.to_datetime(events["date"])
ranges["start"] = pd.to_datetime(ranges["start"])
ranges["end"] = pd.to_datetime(ranges["end"])

Then, let's make a function that implement the desired selection criteria.

def f(row):
    global ranges
    for _i, _row in ranges.iterrows():
        if _row.start <= row.date <= _row.end:
            return _row.range_name
    return None

Finally, let's apply the function to the dataframe to generate the desired new column.

events["in_range"] = events.apply(f, axis=1)

And here is the result:

print(events)

  event_nr                date in_range
0   e_0001 2010-01-02 07:30:00     None
1   e_0002 2010-06-05 07:30:00  range_2

How to check if dates in one pandas dataframe fall within ranges of dates in another?

Question

1 answers

solution1
0 2022-12-14 14:52:39

How to check if dates in one pandas dataframe fall within ranges of dates in another?

Question

1 answers

solution1 0 2022-12-14 14:52:39

solution1
0 2022-12-14 14:52:39