python pandas - Analyse period between two dates for other dates

I have a DataFrame (df_with_start_end) with a start date and an end date for each ID, and I am trying to find out whether any dates with the same ID in another DataFrame (df_dates) fall between them. The result should be written to a new column.

My idea was to iterate over the unique IDs in df_with_start_end and, for each ID, check whether any dates from df_dates lie between that ID's start date and end date.

My implementation looks like this, but it doesn't work:

for k in df_with_start_end['ID']:
    df_with_start_end[k]['FREE_PERIOD'] = df_with_start_end[k]['START_DATE'] <= df_dates[k]['DATE'] < df_with_start_end[k]['END_DATE']

I get this error:

Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 3685509

Here is an example of the DataFrames:

df_with_start_end
ID  START_DATE  END_DATE    FREE_PERIOD
1   2015-02-13  2016-02-13  False
2   2014-08-27  2015-08-27  True

df_dates
ID  DATE
1   2014-04-23
1   2015-08-02
1   2015-09-15
2   2014-06-19
2   2017-01-07

I have heard that loops are slow in Python. Is there a way to avoid them in my case?

It looks like you want to iterate over rows, but you are actually indexing columns.

for k in df_with_start_end['ID']: means that k is an ID value.

However, df_with_start_end[k] looks up the column whose label is k. Since your only columns are ID, START_DATE, END_DATE and FREE_PERIOD, you get a KeyError saying that the key you asked for does not exist.
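To make that concrete, here is a minimal sketch that reproduces the same kind of KeyError on a small frame. The sample values are taken from the question; the frame is rebuilt by hand, so the exact dtypes are an assumption.

```python
import pandas as pd

# Small stand-in for df_with_start_end, built from the sample in the question.
df_with_start_end = pd.DataFrame({
    "ID": [1, 2],
    "START_DATE": pd.to_datetime(["2015-02-13", "2014-08-27"]),
    "END_DATE": pd.to_datetime(["2016-02-13", "2015-08-27"]),
})

raised = False
for k in df_with_start_end["ID"]:
    try:
        # df[k] is a column lookup: pandas searches the column labels
        # ('ID', 'START_DATE', 'END_DATE') for k and finds nothing.
        df_with_start_end[k]
    except KeyError:
        raised = True
        break

print(raised)
```

The loop never gets as far as the comparison: the very first column lookup with an ID value already fails.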

One fix is to select the column first and then the row, by switching the order of the lookups (this assumes the IDs are the row index, for example after set_index('ID')):

df_with_start_end['FREE_PERIOD'][k]

A cleaner way is to use the loc indexer, which selects the row and the column in a single step:

df_with_start_end.loc[k, 'FREE_PERIOD']
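As a rough sketch of how the original loop could look with loc: this assumes ID is unique in df_with_start_end and is moved into the index with set_index('ID'), and that FREE_PERIOD should be True when no date falls inside the period (which is how the sample data in the question reads). The shorter names periods and dates are only used here for illustration.

```python
import pandas as pd

# Sample data from the question; ID becomes the row index so that
# .loc[k, column] can address a row by its ID (assumes IDs are unique).
periods = pd.DataFrame({
    "ID": [1, 2],
    "START_DATE": pd.to_datetime(["2015-02-13", "2014-08-27"]),
    "END_DATE": pd.to_datetime(["2016-02-13", "2015-08-27"]),
}).set_index("ID")

dates = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "DATE": pd.to_datetime(
        ["2014-04-23", "2015-08-02", "2015-09-15", "2014-06-19", "2017-01-07"]
    ),
})

# Create the column up front so it keeps a boolean dtype.
periods["FREE_PERIOD"] = True

for k in periods.index:
    start = periods.loc[k, "START_DATE"]
    end = periods.loc[k, "END_DATE"]
    # All dates recorded for this ID.
    d = dates.loc[dates["ID"] == k, "DATE"]
    # Chained comparisons (a <= x < b) do not work on Series,
    # so the two conditions are combined with &.
    in_period = (d >= start) & (d < end)
    periods.loc[k, "FREE_PERIOD"] = not in_period.any()

print(periods)
```

This still loops in Python, so it fixes the error but does not address the speed concern.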

For me, the easiest way was to join the two DataFrames with merge(). Once the start date, the end date and the date to check are in the same row, they are much easier to compare. I had tried to avoid joining them, but sometimes it is the better approach.
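The answer does not show its code, so the following is only a sketch of what such a merge-based version might look like, not the poster's actual solution. It assumes ID is unique in df_with_start_end and that FREE_PERIOD means "no date from df_dates lies in [START_DATE, END_DATE)", matching the sample data in the question.

```python
import pandas as pd

df_with_start_end = pd.DataFrame({
    "ID": [1, 2],
    "START_DATE": pd.to_datetime(["2015-02-13", "2014-08-27"]),
    "END_DATE": pd.to_datetime(["2016-02-13", "2015-08-27"]),
})

df_dates = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "DATE": pd.to_datetime(
        ["2014-04-23", "2015-08-02", "2015-09-15", "2014-06-19", "2017-01-07"]
    ),
})

# One row per (period, date) pair with the same ID. A left join keeps IDs
# that have no dates at all; their DATE is NaT and compares as False.
merged = df_with_start_end.merge(df_dates, on="ID", how="left")

# Vectorised comparison over all rows at once, no Python loop.
in_period = (merged["START_DATE"] <= merged["DATE"]) & (
    merged["DATE"] < merged["END_DATE"]
)

# For each ID: does at least one date fall inside the period?
has_date_in_period = in_period.groupby(merged["ID"]).any()

# The period is "free" when no date falls inside it.
df_with_start_end["FREE_PERIOD"] = ~df_with_start_end["ID"].map(has_date_in_period)

print(df_with_start_end)
```

If an ID can have several periods, grouping by ID alone would mix them together; in that case the grouping key would need to identify the individual period instead.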
