Selecting time-series data in a specific sequence using pandas

I'm trying to create a new sequence of seasonal data based on observed weather data.

I want to extract seasons from this dataframe and create a new dataframe that has random sequences of seasons following each other in chronological order, e.g. a random spring followed by a random summer followed by a random autumn followed by a random winter.

The time-series data I'm working on, in CSV format, is available here.

The code I've used so far is as follows:

import numpy as np
import pandas as pd

df = pd.read_csv("location of file")

# convert date column to datetime for querying
df['date'] = pd.to_datetime(df['date'], format='%d-%b-%y')

# function which extracts the season from a row's date
def get_season(row):
    month = row['date'].month
    if 3 <= month <= 5:
        return 'spring'
    elif 6 <= month <= 8:
        return 'summer'
    elif 9 <= month <= 11:
        return 'autumn'
    else:
        return 'winter'

#apply the season function to the data frame
df['Season'] = df.apply(get_season, axis=1)

# split into seasons
# all the springs
Sp = df.query('Season == "spring"')
# all the winters
W = df.query('Season == "winter"')
# all the summers
SU = df.query('Season == "summer"')
# all the autumns
Au = df.query('Season == "autumn"')

This is where I can't get my head around what to do next.

This separates the data by season, but not by individual season (e.g. winter 2006, winter 2007, etc.).

I'm currently taking random samples from each season like so:

# sampling a random 92 days from winter
# (.ix has been removed from pandas; .loc is the label-based replacement)
rows = np.random.choice(W.index.values, 92)
sampled_df = W.loc[rows]

But this isn't what I want: it takes random days from the entire winter block, whereas I want to take random whole seasons (December, January, February) from the winter block.

To generate this new sequence, I need each season separated by year, so that I can create a new dataframe containing multiple columns, each of which begins with a random spring, followed by a random summer, then a random autumn, then a random winter, continuing for hundreds of years into the future.

I can't figure out how this is done. Please help!

Thanks

I suggest a MultiIndex:

df['Year'] = df['date'].dt.year
df2 = df.set_index(['Year', 'Season'], inplace=False)

You now have a dataframe indexed by year and season, and you can easily select an entire season for a given year:

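import random

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
pieces = []
for i in range(5):
    for season in ['winter', 'spring', 'summer', 'autumn']:
        year = random.choice(range(2007, 2015))
        pieces.append(df2.loc[year].loc[season])
future = pd.concat(pieces)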

Note that I have excluded 2015 because your data has no autumn or winter for that year; you can address this edge case yourself.
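One possible way to handle that edge case, as a minimal sketch rather than a definitive fix: look up which years actually contain each season and only draw from those. The `demo` frame, the `years_by_season` name and the `pick_season` helper are hypothetical stand-ins, not part of the original data or code.

```python
import random
import pandas as pd

# Hypothetical stand-in for df: rows carrying 'Year' and 'Season' columns.
# 2015 has no autumn here, mimicking an incomplete final year.
demo = pd.DataFrame({
    'Year':   [2014, 2014, 2014, 2015, 2015],
    'Season': ['spring', 'autumn', 'autumn', 'spring', 'spring'],
    'value':  [1.0, 2.0, 3.0, 4.0, 5.0],
})

# For each season, the array of years in which that season occurs at all
years_by_season = demo.groupby('Season')['Year'].unique()

def pick_season(frame, season):
    """Return all rows of `season` from one randomly chosen year that contains it."""
    year = random.choice(list(years_by_season[season]))
    return frame[(frame['Year'] == year) & (frame['Season'] == season)]

autumn = pick_season(demo, 'autumn')  # can only come from 2014 in this demo
spring = pick_season(demo, 'spring')  # comes from either 2014 or 2015
```

This only checks that a season is present in a year. A partially recorded season would still be eligible, so a further check on the number of days per season may be needed.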

Also, the winter for a given year currently consists of January, February, and December of that same calendar year. You may want to redefine the year in order to attach December to the winter of the following year.
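A rough sketch of that redefinition, assuming a `date` column of datetimes as in the question: add one to the calendar year for December rows, so that December, January and February share the same label. The `SeasonYear` column name and the small `demo` frame are illustrative assumptions.

```python
import pandas as pd

# Small stand-in for the real data, spanning one winter
demo = pd.DataFrame({
    'date': pd.to_datetime([
        '2006-11-30', '2006-12-01', '2007-01-15', '2007-02-28', '2007-03-01',
    ])
})

# Hypothetical 'SeasonYear' column: December counts toward the following year,
# every other month keeps its calendar year
is_december = demo['date'].dt.month == 12
demo['SeasonYear'] = demo['date'].dt.year + is_december.astype(int)
```

With this labelling, December 2006 falls in the same group as January and February 2007, and `SeasonYear` could be used in place of `Year` when building the index.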
