Pandas-Python date range

I have the following data set. I am trying to keep only the entries that fall within a date range I supply. The problem is that when the start or end date is not among the dates in my data set, I get a KeyError exception.

Duration    Film    Deadline
1777         a      02/04/2018
1777         b      02/04/2018
1777         b      02/04/2018
942          b      03/04/2018
941          c      03/04/2018


  start_date = sys.argv[1]
  end_date = sys.argv[2]
  df_filtered = df_filtered.set_index([5])
  df_filtered = df_filtered.dropna(axis=0, how='all')
  df_range = df_filtered[start_date:end_date]
  df_groupby = df_range.groupby([4])[3].sum()
  film = df_groupby.index.values.tolist()
  footage = df_groupby.values.astype(int).tolist()

The code is shown above. Any ideas?

I think you need to convert the Deadline column to datetimes and set it as a DatetimeIndex:

print (df)
   Duration Film    Deadline
0      1777    a  01/04/2018
1      1777    b  02/04/2018
2      1777    b  03/04/2018
3       942    b  04/04/2018
4       941    c  05/04/2018

df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)

start_date= '2018-03-25'
end_date = '2018-04-04'

df = df.set_index('Deadline')[start_date:end_date]
print (df)
            Duration Film
Deadline                 
2018-04-01      1777    a
2018-04-02      1777    b
2018-04-03      1777    b
2018-04-04       942    b
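A sorted DatetimeIndex can be sliced with bounds that do not occur in the index, which is why the KeyError goes away. The following is a minimal sketch of that idea on the question's sample rows, not the asker's actual script. The sort_index() call and the use of .loc are my additions, on the assumption that the real data may not be ordered by date; the column names are taken from the printed table.

```python
import pandas as pd

# Sample rows copied from the question (day-first dates, with duplicates).
df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["02/04/2018", "02/04/2018", "02/04/2018",
                 "03/04/2018", "03/04/2018"],
})

# Parse the strings as real dates; dayfirst=True reads 02/04/2018 as 2 April.
df["Deadline"] = pd.to_datetime(df["Deadline"], dayfirst=True)

# Assumption: sorting keeps the index monotonic, so label slicing
# also works for bounds that are not present in the index.
indexed = df.set_index("Deadline").sort_index()

# The start bound does not appear anywhere in the data.
start_date = "2018-03-25"
end_date = "2018-04-02"
df_range = indexed.loc[start_date:end_date]

# Total duration per film inside the range.
totals = df_range.groupby("Film")["Duration"].sum()
print(totals)
```

Both ends of the slice are inclusive, so rows dated exactly on end_date are kept.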

Another solution uses between and filters with boolean indexing:

df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)

start_date= '2018-03-25'
end_date = '2018-04-04'

df = df[df['Deadline'].between(start_date, end_date)]

print (df)
   Duration Film   Deadline
0      1777    a 2018-04-01
1      1777    b 2018-04-02
2      1777    b 2018-04-03
3       942    b 2018-04-04
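The between approach can be combined with the grouping step from the question. The sketch below is one possible arrangement under stated assumptions: sum_duration_by_film is a hypothetical helper name, the column names come from the printed table, and the range bounds are assumed to arrive as ISO-formatted strings (for example from sys.argv).

```python
import pandas as pd

def sum_duration_by_film(df, start_date, end_date):
    """Hypothetical helper: total Duration per Film for deadlines in the range."""
    deadlines = pd.to_datetime(df["Deadline"], dayfirst=True)
    # between() is inclusive on both ends by default and returns a boolean mask,
    # so bounds that are absent from the data simply match no extra rows.
    mask = deadlines.between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    return df[mask].groupby("Film")["Duration"].sum()

# Sample rows copied from the question.
df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["02/04/2018", "02/04/2018", "02/04/2018",
                 "03/04/2018", "03/04/2018"],
})

totals = sum_duration_by_film(df, "2018-03-25", "2018-04-02")
film = totals.index.tolist()
footage = totals.astype(int).tolist()
print(film, footage)

# A range with no matching rows yields an empty result instead of an error.
empty = sum_duration_by_film(df, "2019-01-01", "2019-01-31")
print(empty.empty)
```

Because the filter is a mask rather than a label lookup, it does not require the rows to be sorted by date.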
