
How can I subset a data frame based on dates, when my dates column is not the index in Python?

I have a large dataset with a date column (which is not the index) in the format %Y-%m-%d %H:%M:%S.

I would like to create quarterly subsets of this data frame, i.e. the data frame dfQ1 would contain all rows where the month is between 1 and 3, dfQ2 would contain all rows where the month is between 4 and 6, etc. Each subset should have the same header as the main data frame.

How can I do this?

Thanks!

I would add a new column containing the quarter, i.e.:

from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"
# Map a date string to its quarter (1-4): months 1-3 -> 1, months 4-6 -> 2, etc.
def date_to_qtr(dt):
    return 1 + (datetime.strptime(dt, date_format).month - 1) // 3
df['qtr'] = df['date'].apply(date_to_qtr)

(using floor division, the // operator). Then select rows by the new column:

dfQ1 = df[df.qtr == 1]
dfQ2 = df[df.qtr == 2]
dfQ3 = df[df.qtr == 3]
dfQ4 = df[df.qtr == 4]

Or, at that point, you can just use groupby: df.groupby("qtr") (see the docs).
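
As a rough sketch of the groupby route, the quarter column can be used to build a dictionary of sub-frames in one pass. The sample rows and the names `value` and `quarters` are made up for illustration; only the `date` and `qtr` column names come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; the 'date' column holds strings, as in the question.
df = pd.DataFrame({
    "date": ["2015-01-15 08:00:00", "2015-05-02 12:30:00",
             "2015-08-20 23:59:59", "2015-11-11 11:11:11"],
    "value": [10, 20, 30, 40],
})

# Quarter number (1-4) from the parsed month, using floor division.
months = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S").dt.month
df["qtr"] = (months - 1) // 3 + 1

# One sub-frame per quarter, keyed by quarter number.
# Each sub-frame keeps the columns of the original frame.
quarters = {q: sub for q, sub in df.groupby("qtr")}
```

Here `quarters[1]` plays the role of dfQ1, and a quarter with no rows simply has no key in the dictionary.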

Using pandas, you can first create a datetime column and then create a quarter column using the datetime quarter attribute:

import pandas as pd
date_format = "%Y-%m-%d %H:%M:%S"
# Parse the string column into pandas datetimes.
df['datetime'] = pd.to_datetime(df['date'], format=date_format)
# The .dt accessor exposes the quarter (1-4) of each value.
df['quarter'] = df['datetime'].dt.quarter

From there you can subset the dataframe with groupby (df.groupby('quarter')) or by indexing:

dfQ1 = df[df.quarter == 1]
dfQ2 = df[df.quarter == 2]
dfQ3 = df[df.quarter == 3]
dfQ4 = df[df.quarter == 4]
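
One thing to watch: a quarter number from 1 to 4 pools rows from different years together. If the data spans several years and that is not what you want, a possible variation is to keep the year as well by converting to a quarterly period. This is only a sketch; the sample rows and the names `year_quarter` and `df_2015Q1` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data spanning two years.
df = pd.DataFrame({
    "date": ["2014-02-01 00:00:00", "2015-02-01 00:00:00", "2015-07-04 09:00:00"],
})
df["datetime"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")

# Plain quarter number: both February rows get 1, regardless of year.
df["quarter"] = df["datetime"].dt.quarter

# Quarterly period: keeps the year, so 2014Q1 and 2015Q1 stay distinct.
df["year_quarter"] = df["datetime"].dt.to_period("Q")

# Select a single year-quarter.
df_2015Q1 = df[df["year_quarter"] == pd.Period("2015Q1", freq="Q")]
```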

Assuming you're using pandas, and that Qstartdate and Qenddate hold the start of the quarter and the start of the next one:

dfQ1 = df[(df.date >= Qstartdate) & (df.date < Qenddate)]
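
To make the range filter concrete, here is a minimal sketch with explicit bounds. The sample rows and the choice of Q1 2015 are assumptions for illustration. It uses a half-open interval (start inclusive, end exclusive), so a row stamped exactly at midnight on the first day of the quarter is kept and one at midnight on the first day of the next quarter is not.

```python
import pandas as pd

# Hypothetical sample data; rows sit right on the quarter boundaries.
df = pd.DataFrame({
    "date": ["2015-01-01 00:00:00", "2015-03-31 23:59:59", "2015-04-01 00:00:00"],
    "value": [1, 2, 3],
})

# Parse the strings so comparisons are done on real datetimes.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")

# Assumed bounds for the first quarter of 2015.
Qstartdate = pd.Timestamp("2015-01-01")
Qenddate = pd.Timestamp("2015-04-01")

# Combine two boolean masks with &; each comparison needs its own parentheses.
dfQ1 = df[(df["date"] >= Qstartdate) & (df["date"] < Qenddate)]
```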
