I have the following data set. I'm trying to keep only the entries within a date range that I supply. The problem is that when the start and end dates aren't among the dates in my data set, I get a KeyError exception.
Duration Film Deadline
1777 a 02/04/2018
1777 b 02/04/2018
1777 b 02/04/2018
942 b 03/04/2018
941 c 03/04/2018
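import sys
start_date = sys.argv[1]
end_date = sys.argv[2]
df_filtered = df_filtered.set_index([5])
df_filtered = df_filtered.dropna(axis=0, how='all')
# Can raise KeyError here when start_date or end_date is not a label in the string index
df_range = df_filtered[start_date:end_date]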
df_groupby = df_range.groupby([4])[3].sum()
film = df_groupby.index.values.tolist()
footage = df_groupby.values.astype(int).tolist()
That is my code. Any ideas?
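A minimal reproduction of the error, assuming a hypothetical three-row sample whose string Deadline values are not in chronological order (the helper `slice_or_error` is illustrative only). With a plain string index, pandas has to find each slice bound as an existing label; when a bound is missing and the index is not monotonic, it raises KeyError. After parsing the dates and sorting, the same slice succeeds even though neither bound appears in the data.

```python
import pandas as pd

# Hypothetical sample: Deadline kept as dd/mm/yyyy strings, rows not in date order
df = pd.DataFrame({
    "Duration": [942, 1777, 941],
    "Film": ["b", "a", "c"],
    "Deadline": ["03/04/2018", "02/04/2018", "04/04/2018"],
})


def slice_or_error(frame, start, end):
    """Label-slice the rows of frame; return the exception name if a KeyError occurs."""
    try:
        return frame[start:end]
    except KeyError as exc:
        return type(exc).__name__


# String index: missing bounds cannot be located in an unsorted index
string_result = slice_or_error(df.set_index("Deadline"), "01/04/2018", "05/04/2018")
print(string_result)

# Sorted DatetimeIndex: the bounds no longer need to exist in the data
dated = df.assign(Deadline=pd.to_datetime(df["Deadline"], dayfirst=True))
date_result = slice_or_error(dated.set_index("Deadline").sort_index(), "2018-04-01", "2018-04-05")
print(date_result)
```

Note that sorting the strings alone would not be a real fix either: dd/mm/yyyy strings sort lexicographically, so for example "01/05/2018" comes before "02/04/2018".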
I think you need to convert the Deadline column to datetimes and use it as a DatetimeIndex:
print (df)
Duration Film Deadline
0 1777 a 01/04/2018
1 1777 b 02/04/2018
2 1777 b 03/04/2018
3 942 b 04/04/2018
4 941 c 05/04/2018
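import pandas as pd
df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)
start_date= '2018-03-25'
end_date = '2018-04-04'
# sort_index() keeps the DatetimeIndex monotonic, so bounds missing from the data still work
df = df.set_index('Deadline').sort_index()[start_date:end_date]
print (df)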
Duration Film
Deadline
2018-04-01 1777 a
2018-04-02 1777 b
2018-04-03 1777 b
2018-04-04 942 b
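A sketch of the question's full pipeline built on this approach, assuming the columns carry the names shown in the sample (the question's code uses positional labels 3, 4 and 5 instead) and that the command-line bounds use the same dd/mm/yyyy format as the data. The function name `films_in_range` is hypothetical. The bounds are parsed explicitly with `dayfirst=True`, because a bare string such as '02/04/2018' used directly as a slice bound may be read month-first.

```python
import pandas as pd


def films_in_range(df, start_date, end_date):
    """Total Duration per Film for Deadlines from start_date to end_date (inclusive).

    start_date and end_date are dd/mm/yyyy strings, e.g. taken from sys.argv.
    """
    # Parse the bounds with the same day-first convention as the data
    start = pd.to_datetime(start_date, dayfirst=True)
    end = pd.to_datetime(end_date, dayfirst=True)
    dated = df.assign(Deadline=pd.to_datetime(df["Deadline"], dayfirst=True))
    # A sorted DatetimeIndex lets the bounds fall between (or outside) existing dates
    in_range = dated.set_index("Deadline").sort_index().loc[start:end]
    totals = in_range.groupby("Film")["Duration"].sum()
    film = totals.index.tolist()
    footage = totals.astype(int).tolist()
    return film, footage


# Sample rows from the question
df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["02/04/2018", "02/04/2018", "02/04/2018", "03/04/2018", "03/04/2018"],
})

film, footage = films_in_range(df, "25/03/2018", "02/04/2018")
print(film, footage)
```

From the command line, the equivalent call would be `films_in_range(df, sys.argv[1], sys.argv[2])`.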
Another solution is to use between and filter with boolean indexing:
df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)
start_date= '2018-03-25'
end_date = '2018-04-04'
df = df[df['Deadline'].between(start_date, end_date)]
print (df)
Duration Film Deadline
0 1777 a 2018-04-01
1 1777 b 2018-04-02
2 1777 b 2018-04-03
3 942 b 2018-04-04
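A variation of the `between` approach, assuming hypothetical unsorted rows: a boolean mask does not depend on the index at all, so the rows need no sorting and keep their original order. The bounds are parsed with `dayfirst=True` here as well, to match dd/mm/yyyy input such as command-line arguments.

```python
import pandas as pd

# Hypothetical unsorted sample; a boolean mask does not need a sorted index
df = pd.DataFrame({
    "Duration": [942, 1777, 941],
    "Film": ["b", "a", "c"],
    "Deadline": ["03/04/2018", "02/04/2018", "05/04/2018"],
})
df["Deadline"] = pd.to_datetime(df["Deadline"], dayfirst=True)

# Parse dd/mm/yyyy bounds explicitly so they are read day-first like the data
start = pd.to_datetime("01/04/2018", dayfirst=True)
end = pd.to_datetime("04/04/2018", dayfirst=True)

# between() includes both endpoints by default and preserves row order
mask = df["Deadline"].between(start, end)
print(df[mask])
```

Because both endpoints are included, a Deadline equal to the end date is kept.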