
Selecting time-series data in a specific sequence using pandas

I'm trying to create a new sequence of seasonal data based on observed weather data.

I want to extract the seasons from this dataframe and create a new dataframe in which random seasons follow each other in chronological order, e.g. a random spring followed by a random summer, then a random autumn, then a random winter.

The time-series data (CSV format) I'm working on is available here

The code I've used so far is as follows:

import numpy as np
import pandas as pd

df = pd.read_csv("location of file")

# convert the date column to datetime so it can be queried by month
df['date'] = pd.to_datetime(df['date'], format='%d-%b-%y')

# map a row's month to its (meteorological) season
def get_season(row):
    if 3 <= row['date'].month <= 5:
        return 'spring'
    elif 6 <= row['date'].month <= 8:
        return 'summer'
    elif 9 <= row['date'].month <= 11:
        return 'autumn'
    else:
        return 'winter'

# apply the season function to every row of the data frame
df['Season'] = df.apply(get_season, axis=1)

# split into seasons
# all the springs
Sp = df.query('Season == "spring"')
# all the winters
W = df.query('Season == "winter"')
# all the summers
SU = df.query('Season == "summer"')
# all the autumns
Au = df.query('Season == "autumn"')
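As an aside, the row-wise apply above works, but the same Season column can be built in one vectorised step by looking each month up in a dictionary with Series.map. This is a sketch under stated assumptions: it uses meteorological seasons (December to February is winter, matching get_season), a synthetic one-year date range stands in for the CSV, and SEASON_OF_MONTH and add_season_column are hypothetical names.

```python
import pandas as pd

# Hypothetical month -> season table, same seasons as get_season
SEASON_OF_MONTH = {
    12: 'winter', 1: 'winter', 2: 'winter',
    3: 'spring', 4: 'spring', 5: 'spring',
    6: 'summer', 7: 'summer', 8: 'summer',
    9: 'autumn', 10: 'autumn', 11: 'autumn',
}

def add_season_column(df):
    """Return a copy of df with a 'Season' column derived from df['date']."""
    out = df.copy()
    # .dt.month gives an integer Series; .map looks each month up in the dict
    out['Season'] = out['date'].dt.month.map(SEASON_OF_MONTH)
    return out

# Synthetic stand-in for the CSV: one year of daily dates
demo = pd.DataFrame({'date': pd.date_range('2007-01-01', '2007-12-31', freq='D')})
demo = add_season_column(demo)
print(demo['Season'].value_counts())
```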

And this is where I get stuck: I can't work out what to do next.

This separates out all the seasons, but not each individual season (e.g. winter 2006, winter 2007, etc.).
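One way to get at each individual season is to add a Year column and group by both Season and Year, so that each group is one season of one year. A minimal sketch on synthetic daily dates standing in for the CSV (an assumption); note that with plain calendar years, a "winter" here is January, February and December of the same year.

```python
import pandas as pd

# Synthetic stand-in for the CSV: three years of daily dates
dates = pd.date_range('2006-01-01', '2008-12-31', freq='D')
df = pd.DataFrame({'date': dates})

MONTH_TO_SEASON = {m: s for s, months in {
    'winter': (12, 1, 2), 'spring': (3, 4, 5),
    'summer': (6, 7, 8), 'autumn': (9, 10, 11)}.items() for m in months}
df['Season'] = df['date'].dt.month.map(MONTH_TO_SEASON)
df['Year'] = df['date'].dt.year

# One group per individual season: ('autumn', 2006), ..., ('winter', 2008)
groups = df.groupby(['Season', 'Year'])
print(groups.size())

# Pull out every day of one particular season
winter_2007 = groups.get_group(('winter', 2007))
```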

At the moment I'm sampling random days from a season like so:

# sample 92 random days (with replacement) from all the winters
rows = np.random.choice(W.index.values, 92)
# .ix has been removed from pandas; .loc selects by index label
sampled_df = W.loc[rows]

But this isn't what I want: it picks random individual days from the entire winter block, whereas I want to pick random complete seasons (December, January and February) from the winter block.
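To draw a whole season rather than scattered days, one option is to pick a random year first and then take every row of that season in that year. The helper sample_whole_season below is a hypothetical name written for illustration; it assumes Season and Year columns like those in the question's code, and the data is synthetic.

```python
import numpy as np
import pandas as pd

MONTH_TO_SEASON = {m: s for s, months in {
    'winter': (12, 1, 2), 'spring': (3, 4, 5),
    'summer': (6, 7, 8), 'autumn': (9, 10, 11)}.items() for m in months}

def sample_whole_season(df, season, rng):
    """Return every row of `season` for one randomly chosen year.

    Hypothetical helper: assumes df has 'Season' and 'Year' columns.
    """
    block = df[df['Season'] == season]
    # draw a year, not individual days, so the chosen season stays intact
    year = rng.choice(block['Year'].unique())
    return block[block['Year'] == year]

# Synthetic stand-in for the CSV: three years of daily dates
dates = pd.date_range('2006-01-01', '2008-12-31', freq='D')
df = pd.DataFrame({'date': dates})
df['Season'] = df['date'].dt.month.map(MONTH_TO_SEASON)
df['Year'] = df['date'].dt.year

rng = np.random.default_rng(0)
one_winter = sample_whole_season(df, 'winter', rng)
print(one_winter['date'].dt.month.unique())
```

Each call draws independently, so the same winter can be picked more than once.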

To generate this new sequence I need each season separated by year, so that I can build a new dataframe with multiple columns, each of which begins with a random spring, followed by a random summer, then a random autumn, then a random winter, continuing for hundreds of years into the future.
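A sketch of the full generator, under stated assumptions: split the data once into individual seasons, then for each output column chain randomly drawn springs, summers, autumns and winters for as many cycles (years) as needed. The value column name temp, the run_k column names and all helper functions are hypothetical; seasons are keyed by calendar year, and a seeded NumPy Generator keeps runs reproducible.

```python
import numpy as np
import pandas as pd

SEASON_ORDER = ['spring', 'summer', 'autumn', 'winter']
MONTH_TO_SEASON = {m: s for s, months in {
    'winter': (12, 1, 2), 'spring': (3, 4, 5),
    'summer': (6, 7, 8), 'autumn': (9, 10, 11)}.items() for m in months}

def split_seasons(df, value_col):
    """Map each season name to a list of value Series, one per (calendar) year."""
    return {season: [grp[value_col] for _, grp in block.groupby('Year')]
            for season, block in df.groupby('Season')}

def synthetic_sequence(seasons, n_cycles, rng):
    """Chain n_cycles of randomly drawn spring -> summer -> autumn -> winter."""
    pieces = []
    for _ in range(n_cycles):
        for season in SEASON_ORDER:
            options = seasons[season]
            # draw one whole individual season, with replacement
            pieces.append(options[rng.integers(len(options))])
    # the original dates no longer apply, so renumber the days 0, 1, 2, ...
    return pd.concat(pieces, ignore_index=True)

def synthetic_frame(df, n_columns, n_cycles, value_col='temp', seed=None):
    """Build n_columns independent synthetic sequences side by side."""
    rng = np.random.default_rng(seed)
    seasons = split_seasons(df, value_col)
    cols = {f'run_{k}': synthetic_sequence(seasons, n_cycles, rng)
            for k in range(n_columns)}
    # columns may differ in length; shorter ones are padded with NaN
    return pd.DataFrame(cols)

# Synthetic stand-in for the CSV: three years of daily dates and a dummy 'temp'
dates = pd.date_range('2006-01-01', '2008-12-31', freq='D')
df = pd.DataFrame({'date': dates, 'temp': np.arange(len(dates), dtype=float)})
df['Season'] = df['date'].dt.month.map(MONTH_TO_SEASON)
df['Year'] = df['date'].dt.year

future = synthetic_frame(df, n_columns=3, n_cycles=100, seed=42)
```

Individual seasons differ slightly in length (a leap-year winter has 91 days instead of 90), so the columns can end up with different lengths; pd.DataFrame aligns them on their integer index and pads the shorter ones with NaN.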

I can't figure out how this is done. Please help!

Thanks

I suggest using a MultiIndex:

# tag each row with its calendar year, then index by (Year, Season)
df['Year'] = df['date'].dt.year
df2 = df.set_index(['Year', 'Season'])

You now have a dataframe indexed by year and season, and you can easily select an entire season for a given year:

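import random

# collect one randomly chosen season at a time (spring first, as the question asks),
# then concatenate once; DataFrame.append was removed in pandas 2.0
pieces = []
for _ in range(5):
    for season in ['spring', 'summer', 'autumn', 'winter']:
        year = random.choice(range(2007, 2015))
        pieces.append(df2.loc[year].loc[season])
future = pd.concat(pieces)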

Note that I have excluded 2015 because your data has no autumn or winter for that year; you can handle this edge case yourself.
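Rather than hard-coding range(2007, 2015), the usable years can be read off the MultiIndex itself. A minimal sketch, assuming df2 is indexed by (Year, Season) as above; complete_years is a hypothetical helper, and it only checks that each season is present in a year, not that every day of it is covered. The synthetic data ends in June 2015 to mimic the edge case.

```python
import pandas as pd

MONTH_TO_SEASON = {m: s for s, months in {
    'winter': (12, 1, 2), 'spring': (3, 4, 5),
    'summer': (6, 7, 8), 'autumn': (9, 10, 11)}.items() for m in months}

def complete_years(df2):
    """Years in a (Year, Season)-indexed frame that contain all four seasons."""
    # one row per (Year, Season) pair actually present in the index
    pairs = df2.index.unique().to_frame(index=False)
    counts = pairs.groupby('Year')['Season'].nunique()
    return sorted(counts[counts == 4].index)

# Synthetic stand-in for the CSV: data stops part-way through 2015
dates = pd.date_range('2007-01-01', '2015-06-30', freq='D')
df = pd.DataFrame({'date': dates})
df['Season'] = df['date'].dt.month.map(MONTH_TO_SEASON)
df['Year'] = df['date'].dt.year
df2 = df.set_index(['Year', 'Season'])

years = complete_years(df2)
```

random.choice(years) can then replace random.choice(range(2007, 2015)) in the loop.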

Also, the winter for a given year currently consists of January, February and December of that same calendar year. You may want to redefine the year so that December is attached to the winter of the following year.
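One way to do that redefinition is a separate season-year that equals the calendar year except in December, which is counted toward the next year. A sketch, with SeasonYear and add_season_year as hypothetical names and a synthetic date range from December 2006 to February 2008:

```python
import pandas as pd

MONTH_TO_SEASON = {m: s for s, months in {
    'winter': (12, 1, 2), 'spring': (3, 4, 5),
    'summer': (6, 7, 8), 'autumn': (9, 10, 11)}.items() for m in months}

def add_season_year(df):
    """Return a copy of df with a 'SeasonYear' column (hypothetical helper).

    SeasonYear is the calendar year, except that December counts toward
    the following year, so each winter runs December to February.
    """
    out = df.copy()
    is_december = out['date'].dt.month == 12
    out['SeasonYear'] = out['date'].dt.year + is_december.astype(int)
    return out

# Synthetic stand-in for the CSV
dates = pd.date_range('2006-12-01', '2008-02-29', freq='D')
df = pd.DataFrame({'date': dates})
df['Season'] = df['date'].dt.month.map(MONTH_TO_SEASON)
df = add_season_year(df)

df2 = df.set_index(['SeasonYear', 'Season']).sort_index()
winter_2007 = df2.loc[(2007, 'winter')]  # December 2006 to February 2007
```

Indexing by SeasonYear instead of Year makes each winter a contiguous December to February block. The first January/February and the last December in the data then form partial winters, so you may want to drop them.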

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.

© 2020-2024 STACKOOM.COM