I have a DataFrame like the following:
transaction_no sales_order is_delivered dispatch_date remarks ....
0 2122.0 1.0 True 06-01-2020 NaN
1 2122.0 1.0 True 06-01-2020 NaN
2 2122.0 1.0 True 06-01-2020 NaN
3 2122.0 1.0 True 06-01-2020 NaN
4 2122.0 1.0 True 06-01-2020 NaN
I want to select rows based on a date range, but I get an empty DataFrame every time.
Here's what I did:
dt_format = '%Y-%m-%d %H:%M'
o_f = datetime.strptime(request.GET['from'], dt_format).strftime('%d/%m/%Y')
o_t = datetime.strptime(request.GET['to'], dt_format).strftime('%d/%m/%Y')
f = datetime.strptime(request.GET['from'], dt_format).replace(tzinfo=pytz.UTC).date().strftime("%d-%m-%Y")
t = datetime.strptime(request.GET['to'], dt_format).replace(tzinfo=pytz.UTC).date().strftime("%d-%m-%Y")
allot_df = allot_df[allot_df['dispatch_date'].isin(pd.date_range(f, t))]
How can I do that? Better yet, why is this not working?
Update: The column's type was str, so I changed it to datetime:
allot_df['dispatch_date'] = pd.to_datetime(allot_df['dispatch_date'])
allot_df = allot_df[allot_df['dispatch_date'].isin(pd.date_range(f, t))]
But now the whole DataFrame is returned.
Assume that just after reading the data (e.g. by calling pd.read_csv), without any type conversion, your DataFrame contains:
transaction_no sales_order is_delivered dispatch_date
0 2122.0 1.0 True 06-01-2020
1 2123.0 1.0 True 07-01-2020
2 2124.0 1.0 True 08-01-2020
3 2125.0 1.0 True 09-01-2020
4 2126.0 1.0 True 10-01-2020
To check the column types, run df.info(). The result should be something like:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 transaction_no 5 non-null float64
1 sales_order 5 non-null float64
2 is_delivered 5 non-null bool
3 dispatch_date 5 non-null object
dtypes: bool(1), float64(2), object(1)
memory usage: 165.0+ bytes
Note the Dtype of the dispatch_date column. It is object, which here means the column holds strings.
A good habit when working with Pandas objects is to use the native Pandas datetime type rather than the datetime module. Vectorized operations on this type typically run substantially faster than equivalent code working on strings or Python datetime objects.
So the first step is to convert the dispatch_date column from string to datetime. You can do it by calling:
df.dispatch_date = pd.to_datetime(df.dispatch_date, dayfirst=True)
Now when you print df , you will get:
transaction_no sales_order is_delivered dispatch_date
0 2122.0 1.0 True 2020-01-06
1 2123.0 1.0 True 2020-01-07
2 2124.0 1.0 True 2020-01-08
3 2125.0 1.0 True 2020-01-09
4 2126.0 1.0 True 2020-01-10
The first thing to notice is that dispatch_date is now printed in year-month-day format, but that alone does not confirm its type. To check, run df.info() again; the row for dispatch_date should be:
3 dispatch_date 5 non-null datetime64[ns]
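The dayfirst=True argument matters for dates such as 06-01-2020, which are ambiguous. As a minimal sketch (the sample strings are taken from the table above; the variable names are only illustrative), compare the default parsing with the day-first variants:

```python
import pandas as pd

raw = pd.Series(["06-01-2020", "07-01-2020"])

# Default parsing reads ambiguous strings as month-day-year.
month_first = pd.to_datetime(raw)

# dayfirst=True reads the same strings as day-month-year.
day_first = pd.to_datetime(raw, dayfirst=True)

# An explicit format removes the ambiguity altogether.
explicit = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.iloc[0].date())  # 2020-06-01
print(day_first.iloc[0].date())    # 2020-01-06
```

Without dayfirst or an explicit format, the column would silently hold June and July dates instead of January ones, so any January range filter would match the wrong rows.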
To retrieve rows for a particular date range, you can run something like:
df.query("dispatch_date.between('2020-01-07', '2020-01-09')")
The result is:
transaction_no sales_order is_delivered dispatch_date
1 2123.0 1.0 True 2020-01-07
2 2124.0 1.0 True 2020-01-08
3 2125.0 1.0 True 2020-01-09
Note that the ending date is inclusive, unlike positional slices (plain Python slices or iloc), where the right border is exclusive.
I deliberately did not go into how to extract both date strings from your source data. That is a separate issue, which I leave to you.
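The same selection can also be written as a boolean mask instead of a query string. The sketch below rebuilds a cut-down version of the sample frame (only two columns, which is an assumption made for brevity) and shows both the inclusive range and a half-open variant that excludes the end date:

```python
import pandas as pd

# Cut-down copy of the sample data, used here only for illustration.
df = pd.DataFrame({
    "transaction_no": [2122.0, 2123.0, 2124.0, 2125.0, 2126.0],
    "dispatch_date": ["06-01-2020", "07-01-2020", "08-01-2020",
                      "09-01-2020", "10-01-2020"],
})
df["dispatch_date"] = pd.to_datetime(df["dispatch_date"], dayfirst=True)

start = pd.Timestamp("2020-01-07")
end = pd.Timestamp("2020-01-09")

# between() includes both borders by default.
closed = df[df["dispatch_date"].between(start, end)]

# Explicit comparisons give a half-open range: start included, end excluded.
half_open = df[(df["dispatch_date"] >= start) & (df["dispatch_date"] < end)]

print(closed.index.tolist())     # [1, 2, 3]
print(half_open.index.tolist())  # [1, 2]
```

The half-open form is handy when the column also carries a time of day and you want everything up to, but not including, midnight of the end date.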
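As a starting point for that separate step, here is a minimal sketch. It assumes the request strings follow the '%Y-%m-%d %H:%M' format from the question; raw_from and raw_to are hypothetical stand-ins for request.GET['from'] and request.GET['to'].

```python
import pandas as pd

# Hypothetical stand-ins for request.GET['from'] and request.GET['to'].
raw_from = "2020-01-07 00:00"
raw_to = "2020-01-09 23:59"

dt_format = "%Y-%m-%d %H:%M"

# Parse straight into Pandas timestamps and drop the time of day,
# so the borders compare cleanly against a date-only column.
start = pd.to_datetime(raw_from, format=dt_format).normalize()
end = pd.to_datetime(raw_to, format=dt_format).normalize()

print(start.date(), end.date())  # 2020-01-07 2020-01-09
```

Keeping the borders as Timestamp objects avoids re-formatting them into day-month-year strings, which would bring back the same parsing ambiguity.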