Filtering Pandas DataFrames on dates

Question

I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need to retain the rows that are within the next two months.

What is the best way to achieve this?

Answer 1

If date column is the index , then use .loc for label based indexing or .iloc for positional indexing.

For example:

df.loc['2014-01-01':'2014-02-01']

See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection

If the column is not the index you have two choices:

Make it the index (either temporarily or permanently if it's time-series data)
df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]

See here for the general explanation

Note: .ix is deprecated.

Answer 2

Previous answer is not correct in my experience, you can't pass it a simple string, needs to be a datetime object. So:

import datetime 
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]

Answer 3

And if your dates are standardized by importing datetime package, you can simply use:

df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]

For standarding your date string using datetime package, you can use this function:

import datetime
datetime.datetime.strptime

Answer 4

如果您已经使用 pd.to_datetime 将字符串转换为日期格式，则可以使用：

df = df[(df['Date']> "2018-01-01") & (df['Date']< "2019-07-01")]

Answer 5

If your datetime column have the Pandas datetime type (eg datetime64[ns] ), for proper filtering you need the pd.Timestamp object , for example:

from datetime import date

import pandas as pd

value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]

Answer 6

如果日期在索引中，那么只需：

df['20160101':'20160301']

Answer 7

You can use pd.Timestamp to perform a query and a local reference

import pandas as pd
import numpy as np

df = pd.DataFrame()
ts = pd.Timestamp

df['date'] = np.array(np.arange(10) + datetime.now().timestamp(), dtype='M8[s]')

print(df)
print(df.query('date > @ts("20190515T071320")')

with the output

                 date
0 2019-05-15 07:13:16
1 2019-05-15 07:13:17
2 2019-05-15 07:13:18
3 2019-05-15 07:13:19
4 2019-05-15 07:13:20
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25


                 date
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25

Have a look at the pandas documentation for DataFrame.query , specifically the mention about the local variabile referenced udsing @ prefix. In this case we reference pd.Timestamp using the local alias ts to be able to supply a timestamp string

Answer 8

The shortest way to filter your dataframe by date: Lets suppose your date column is type of datetime64[ns]

# filter by single day
df_filtered = df[df['date'].dt.strftime('%Y-%m-%d') == '2014-01-01']

# filter by single month
df_filtered = df[df['date'].dt.strftime('%Y-%m') == '2014-01']

# filter by single year
df_filtered = df[df['date'].dt.strftime('%Y') == '2014']

Answer 9

So when loading the csv data file, we'll need to set the date column as index now as below, in order to filter data based on a range of dates. This was not needed for the now deprecated method: pd.DataFrame.from_csv().

If you just want to show the data for two months from Jan to Feb, eg 2020-01-01 to 2020-02-29, you can do so:

import pandas as pd
mydata = pd.read_csv('mydata.csv',index_col='date') # or its index number, e.g. index_col=[0]
mydata['2020-01-01':'2020-02-29'] # will pull all the columns
#if just need one column, e.g. Cost, can be done:
mydata['2020-01-01':'2020-02-29','Cost']

This has been tested working for Python 3.7. Hope you will find this useful.

Answer 10

I'm not allowed to write any comments yet, so I'll write an answer, if somebody will read all of them and reach this one.

If the index of the dataset is a datetime and you want to filter that just by (for example) months, you can do following:

df.loc[df.index.month == 3]

That will filter the dataset for you by March.

Answer 11

How about using pyjanitor

It has cool features.

After pip install pyjanitor

import janitor

df_filtered = df.filter_date(your_date_column_name, start_date, end_date)

Answer 12

您可以通过执行以下df.loc['start_date':'end_date']选择时间范围： df.loc['start_date':'end_date']

Answer 13

import the pandas library

import pandas as pd

STEP 1: convert the date column into a string using the pd.to_datetime() method

   df['date']=pd.to_datetime(df["date"],unit='s')

STEP 2: perform the filtering in any predetermined manner ( ie 2 months)

  df = df[(df["date"] >"2022-03-01" & df["date"] < "2022-05-03")]

STEP 3: Check the output

 print(df)

Answer 14

Another solution if you would like to use the .query() method.

It allows you to use write readable code like .query(f"{start} < MyDate < {end}") on the trade off, that .query() parses strings and the columns values must be in pandas date format (so that it is also understandable for .query())

df = pd.DataFrame({
     'MyValue': [1,2,3],
     'MyDate': pd.to_datetime(['2021-01-01','2021-01-02','2021-01-03'])
})
start = datetime.date(2021,1,1).strftime('%Y%m%d')
end = datetime.date(2021,1,3).strftime('%Y%m%d')
df.query(f"{start} < MyDate < {end}")

(following the comment from @Phillip Cloud, answer from @Retozi)

Answer 15

# 60 days from today
after_60d = pd.to_datetime('today').date() + datetime.timedelta(days=60)
# filter date col less than 60 days date
df[df['date_col'] < after_60d]

Answer 16

In pandas version 1.1.3 I encountered a situation where the python datetime based index was in descending order. In this case

df.loc['2021-08-01':'2021-08-31']

returned empty. Whereas

df.loc['2021-08-31':'2021-08-01']

returned the expected data.

Filtering Pandas DataFrames on dates

Question

16 answers

solution1
337 2014-04-06 19:32:50

solution2
73 2016-02-04 14:29:12

solution3
56 2017-01-25 07:02:52

solution4
32 2020-07-21 19:06:47

solution5
28 2018-06-18 00:38:24

solution6
18 2016-11-18 14:18:23

solution7
15 2019-05-15 07:16:15

solution8
12 2020-04-21 08:23:33

solution9
9 2020-04-07 11:42:39

solution10
7 2020-04-23 08:22:32

solution11
5 2019-11-24 06:09:46

solution12
3 2020-06-12 09:51:16

solution13
2 2022-05-03 09:46:12

import the pandas library

solution14
1 2021-08-03 13:04:56

solution15
1 2022-03-16 11:15:24

solution16
0 2021-10-12 02:38:45

Filtering Pandas DataFrames on dates

Question

16 answers

solution1 337 2014-04-06 19:32:50

solution2 73 2016-02-04 14:29:12

solution3 56 2017-01-25 07:02:52

solution4 32 2020-07-21 19:06:47

solution5 28 2018-06-18 00:38:24

solution6 18 2016-11-18 14:18:23

solution7 15 2019-05-15 07:16:15

solution8 12 2020-04-21 08:23:33

solution9 9 2020-04-07 11:42:39

solution10 7 2020-04-23 08:22:32

solution11 5 2019-11-24 06:09:46

solution12 3 2020-06-12 09:51:16

solution13 2 2022-05-03 09:46:12

import the pandas library

solution14 1 2021-08-03 13:04:56

solution15 1 2022-03-16 11:15:24

solution16 0 2021-10-12 02:38:45

solution1
337 2014-04-06 19:32:50

solution2
73 2016-02-04 14:29:12

solution3
56 2017-01-25 07:02:52

solution4
32 2020-07-21 19:06:47

solution5
28 2018-06-18 00:38:24

solution6
18 2016-11-18 14:18:23

solution7
15 2019-05-15 07:16:15

solution8
12 2020-04-21 08:23:33

solution9
9 2020-04-07 11:42:39

solution10
7 2020-04-23 08:22:32

solution11
5 2019-11-24 06:09:46

solution12
3 2020-06-12 09:51:16

solution13
2 2022-05-03 09:46:12

solution14
1 2021-08-03 13:04:56

solution15
1 2022-03-16 11:15:24

solution16
0 2021-10-12 02:38:45