简体   繁体   中英

Subsetting columns of months by season in pandas DataFrame

I have a set of data that extracted from an excel sheet. one of the columns were dates and so I used df.to_datetime() to reformat the data. The data I have is only for the first day of each month, the string format is:

"20190101"

so there are only 12 dates.

Then I used pd.pivot to make this column of dates the column labels. The plan now is to subset the columns by seasons (eg Jan, Feb, Dec is winter). I only have 2019 data but I will be using this code for future years. I will be taking Dec of previous year so I need to disregard the years.

How do I subset the columns based on the months. In other words I want to divide the DataFrame similarly to:

for column in full_df.iterrows()
    If (column_label.datetime.month = 01 | column_label.datetime.month = 02 |column_label.datetime.month = 12):
        winter_df[datetime.month] = full_df[column_label]

I know for loops are frowned upon in Dataframe, and I know I can hard-code it by keeping it as a string and typing in the specific strings, but the data will not be from the same year each time and I would have to adjust the code every year. How do I do an if statement for column labels? df.filter() might be a good idea but I don't know if its possible with datetime object.

PS forgive me if I formed the question poorly or I don't understand your answer. Fairly new to pandas.

Update: I am looking to split the DataFrame into three based on month, The three periods of time are as follows: Winter: January ,February ,December Light load: March, April, May Summer: June,July August, September

From your description this achieves what you want

  1. categorises a quarter into a season
  2. "pivots" actually transposes to column format

update

There seems to be a condition when pandas can give wrong quarter. Can instead calculate from integer division on month

df = df.assign(season=lambda x: x["Date"].apply(lambda s: ["winter","spring","summer","autumn"][(s.month-1)//3]))

import datetime as dt
import pandas as pd
import random
df = pd.DataFrame([{"Date":d, "Value": random.randint(10,20)} 
 for d in pd.date_range(dt.datetime(2018,1,1),dt.datetime(2019,3,1), freq="MS")])

df = df.assign(season=lambda x: x["Date"].apply(lambda s: ["winter","spring","summer","autumn"][s.quarter-1]))
df = df.set_index(["season","Date"]).T

output

season     winter                           spring                           summer                           autumn                           winter                      
Date   2018-01-01 2018-02-01 2018-03-01 2018-04-01 2018-05-01 2018-06-01 2018-07-01 2018-08-01 2018-09-01 2018-10-01 2018-11-01 2018-12-01 2019-01-01 2019-02-01 2019-03-01
Value          10         20         12         17         12         12         11         20         11         19         12         10         11         20         14

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM