Create new dataframe from multiple multi-index dataframes
I want to create a new dataframe covering x years, built from randomly chosen seasons of previous weather data.
Code to illustrate the problem:
import pandas as pd
import numpy as np
dates = pd.date_range('20070101',periods=3200)
df = pd.DataFrame(data=np.random.randint(0, 100, (3200, 1)), columns=list('A'))
df['date'] = dates
df = df[['date','A']]
Define a function that maps each row's date to a season code (1 = winter, 2 = spring, 3 = summer, 4 = autumn):
def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return '2'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return '3'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return '4'
    else:
        return '1'
Apply the function row by row:
df['Season'] = df.apply(get_season, axis=1)
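`df.apply(get_season, axis=1)` calls the Python function once per row. A vectorized sketch that should produce the same string labels, assuming the question's `date` column, uses month arithmetic: `(month % 12) // 3` maps Dec/Jan/Feb to 0, Mar to May to 1, Jun to Aug to 2 and Sep to Nov to 3.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates,
                   'A': np.random.randint(0, 100, len(dates))})

# Dec/Jan/Feb -> '1', Mar-May -> '2', Jun-Aug -> '3', Sep-Nov -> '4'
month = df['date'].dt.month
df['Season'] = ((month % 12) // 3 + 1).astype(str)
```

This keeps the labels as strings, so the later `query('Season == "1"')` calls work unchanged.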
Create a 'Year' column for indexing:
df['Year'] = df['date'].dt.year
Set a MultiIndex on Year and Season:
df = df.set_index(['Year', 'Season'], inplace=False)
Split the data into one dataframe per season to sample from:
winters = df.query('Season == "1"')
springs = df.query('Season == "2"')
summers = df.query('Season == "3"')
autumns = df.query('Season == "4"')
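Instead of four season frames that still have to be filtered by year on every draw, the data can be grouped once so that each (Year, Season) block is a dictionary lookup. A minimal sketch, assuming the question's index layout; `blocks` and `winter_keys` are hypothetical names, and the season labels are computed with month arithmetic equivalent to `get_season`:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: daily values with a Year/Season MultiIndex
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates,
                   'A': np.random.randint(0, 100, len(dates))})
df['Season'] = ((df['date'].dt.month % 12) // 3 + 1).astype(str)
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

# One pass over the data: map (year, season) -> that season's rows, in date order
blocks = {key: grp for key, grp in df.groupby(level=['Year', 'Season'])}

# The keys that correspond to the question's `winters` frame, split by year
winter_keys = [key for key in blocks if key[1] == '1']
```

For example, `blocks[(2008, '2')]` is the spring of 2008.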
I now want to create a new DataFrame which takes a random winter from the winters dataframe, followed by a random spring from springs, a random summer from summers and a random autumn from autumns, and does this for a specified number of years (e.g. 100), but I can't see how to do this.
EDIT:
Duplicate seasons are allowed (seasons should be sampled randomly, with replacement), and the first spring does not have to come from the same year as the first winter; this doesn't matter.
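These two requirements mean every season can be drawn independently and with replacement, so the whole sampling plan fits in one array. A sketch, assuming the source years 2007 to 2015 from the question's data; `plan`, `n_years` and the seed are hypothetical choices:

```python
import numpy as np

# Hypothetical parameters: source years of the question's data and output length
years = np.arange(2007, 2016)   # 2007..2015
n_years = 100
rng = np.random.default_rng(42)  # seeded so a run can be reproduced

# One row per simulated year, one column per season (winter, spring, summer, autumn).
# replace=True allows the same source year to be drawn repeatedly, and each
# entry is drawn independently, so one row can mix different source years.
plan = rng.choice(years, size=(n_years, 4), replace=True)
```

`plan[i, j]` is then the source year to copy season `j + 1` from when building simulated year `i`.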
EDIT 2: Solution using the separate seasonal dataframes:
outputyears = 100  # number of synthetic years to generate
years = df['date'].dt.year.unique()
dfs = []
for i in range(outputyears):
    # np.random.choice(years) returns one scalar year; each season draws its own year
    dfs.append(winters.query("Year == %d" % np.random.choice(years)))
    dfs.append(springs.query("Year == %d" % np.random.choice(years)))
    dfs.append(summers.query("Year == %d" % np.random.choice(years)))
    dfs.append(autumns.query("Year == %d" % np.random.choice(years)))
rnd = pd.concat(dfs)
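A variant of the same idea, sketched under the assumption that the index is the question's ('Year', 'Season') MultiIndex: it groups the data once instead of running a query string per draw, uses a seedable `numpy.random.default_rng`, only draws years that actually contain the season, and tags each row with a `SimYear` column. The helper `simulate_seasons` and the `SimYear` column are hypothetical names, not part of pandas.

```python
import numpy as np
import pandas as pd

SEASONS = ['1', '2', '3', '4']  # winter, spring, summer, autumn, as in the question


def simulate_seasons(df, n_years, seed=None):
    """Stack n_years of randomly chosen seasons (hypothetical helper).

    Assumes df has a ('Year', 'Season') MultiIndex with Season codes '1'..'4'.
    """
    rng = np.random.default_rng(seed)
    # Group once instead of running a query string for every draw
    blocks = {key: grp for key, grp in df.groupby(level=['Year', 'Season'])}
    # Source years that actually contain each season
    years_by_season = {s: [y for (y, ss) in blocks if ss == s] for s in SEASONS}

    pieces = []
    for sim_year in range(n_years):
        for season in SEASONS:
            src = rng.choice(years_by_season[season])  # drawn with replacement
            # Tag the rows with the simulated year they were placed in
            pieces.append(blocks[(src, season)].assign(SimYear=sim_year))
    return pd.concat(pieces)


# Usage with the question's synthetic data
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates,
                   'A': np.random.randint(0, 100, len(dates))})
df['Season'] = ((df['date'].dt.month % 12) // 3 + 1).astype(str)
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

rnd = simulate_seasons(df, n_years=100, seed=0)
```

Passing the same `seed` reproduces the same sequence of seasons, which helps when comparing runs.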
It's probably not the best way to do it, but you can do it this way:
years = df['date'].dt.year.unique()
dfs = []
for i in range(100):
    # Draw a scalar year per season; Year and Season are index levels usable in query()
    dfs.append(df.query("Year == %d and Season == '1'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '2'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '3'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '4'" % np.random.choice(years)))
rnd = pd.concat(dfs)
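One caveat with drawing from all of `years`: the question's 3200-day range ends on 2015-10-05, so the 2015 winter (January and February only) and the 2015 autumn are incomplete, and either can be drawn. A sketch that restricts the candidate years per season to reasonably complete blocks; the `MIN_DAYS` threshold and `years_by_season` are hypothetical choices:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20070101', periods=3200)   # ends on 2015-10-05
df = pd.DataFrame({'date': dates,
                   'A': np.random.randint(0, 100, len(dates))})
df['Season'] = ((df['date'].dt.month % 12) // 3 + 1).astype(str)
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

# Hypothetical cut-off: a (Year, Season) block counts as usable only if it has
# at least MIN_DAYS daily rows. Full seasons here have 90 to 92 days.
MIN_DAYS = 85
counts = df.groupby(level=['Year', 'Season']).size()
complete = counts[counts >= MIN_DAYS]

# Candidate source years per season, to draw from instead of all of `years`
years_by_season = {
    season: complete.xs(season, level='Season').index.to_numpy()
    for season in ['1', '2', '3', '4']
}
```

In the answer's loop, `np.random.choice(years_by_season['4'])` would then replace `np.random.choice(years)` for the autumn draw, and likewise for the other seasons.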