
Create new dataframe from multiple multi-index dataframes

I want to create a new dataframe covering x years, built from seasons picked at random from previous weather data.

Code to illustrate the problem:

import pandas as pd
import numpy as np

dates = pd.date_range('20070101',periods=3200)
df = pd.DataFrame(data=np.random.randint(0,100,(3200,1)), columns =list('A'))
df['date'] = dates
df = df[['date','A']]

Define a season function that works on the date column (1 = winter, 2 = spring, 3 = summer, 4 = autumn)

def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return '2'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return '3'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return '4'
    else:
        return '1'

Apply the function

df['Season'] = df.apply(get_season, axis=1)
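As an aside, the same season labels can probably be computed without a row-wise apply. The following is only a sketch, assuming the same month-to-season mapping as get_season above (December to February is '1', and so on); the modulo expression is a substitute for the original function, not part of the original post.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 3200)})

# month % 12 maps December to 0, so 12, 1, 2 -> 0 and 3-5 -> 1, etc.
# Integer-dividing by 3 and adding 1 gives 1..4; cast to str to match get_season.
df['Season'] = (df['date'].dt.month % 12 // 3 + 1).astype(str)
```

This operates on the whole column at once, which tends to be faster than df.apply(..., axis=1) on larger frames.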

Create a 'Year' column for indexing

df['Year'] = df['date'].dt.year

Multi-index by Year and Season

df = df.set_index(['Year', 'Season'], inplace=False)

Create new dataframes based on season to select from

winters = df.query('Season == "1"')
springs = df.query('Season == "2"')
summers = df.query('Season == "3"')
autumns = df.query('Season == "4"')
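The four queries above could also be written as a single groupby on the 'Season' index level. This is just a sketch under the assumption that the frame is indexed by 'Year' and 'Season' as in the question; the name season_frames is made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame so the example runs on its own
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 3200)})
df['Season'] = (df['date'].dt.month % 12 // 3 + 1).astype(str)
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

# One sub-frame per season label, keyed by '1'..'4' (season_frames is a hypothetical name)
season_frames = {season: group for season, group in df.groupby(level='Season')}

winters = season_frames['1']
springs = season_frames['2']
summers = season_frames['3']
autumns = season_frames['4']
```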

I now want to create a new DataFrame that takes a random winter from the winters dataframe, followed by a random spring from springs, a random summer from summers and a random autumn from autumns, and repeats this for a specified number of years (e.g. 100), but I can't see how to do this.

EDIT:

Duplicate seasons are allowed (seasons should be sampled randomly), and the first spring does not have to belong to the same year as the first winter; that doesn't matter.

EDIT 2: Solution using all seasonal dataframes:

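outputyears = 100  # number of synthetic years to generate
years = df['date'].dt.year.unique()
dfs = []
for i in range(outputyears):
    # np.random.choice(years) returns a single year, which %d can format directly
    dfs.append(winters.query("Year == %d" % np.random.choice(years)))
    dfs.append(springs.query("Year == %d" % np.random.choice(years)))
    dfs.append(summers.query("Year == %d" % np.random.choice(years)))
    dfs.append(autumns.query("Year == %d" % np.random.choice(years)))

# Stack the sampled seasons in the order they were drawn
rnd = pd.concat(dfs)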

It's most probably not the best way to do it, but you can do it this way:

years = df['date'].dt.year.unique()

dfs = []
for i in range(100):
    # Draw a year independently for each season; np.random.choice(years) returns a scalar
    dfs.append(df.query("Year == %d and Season == '1'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '2'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '3'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '4'" % np.random.choice(years)))

# Stack the sampled seasons in the order they were drawn
rnd = pd.concat(dfs)
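The loop above could be wrapped into a reusable function along these lines. This is a sketch rather than a definitive implementation: sample_seasons, n_years and seed are hypothetical names, and it assumes the frame is indexed by 'Year' and 'Season' with season labels '1' to '4' as in the question. It splits the data once with groupby instead of calling query on every iteration, and only draws from years that actually contain the season.

```python
import numpy as np
import pandas as pd

def sample_seasons(df, n_years, seed=None):
    """Concatenate one random winter, spring, summer and autumn per output year.

    Assumes df has a MultiIndex with levels 'Year' and 'Season'
    and that Season holds the strings '1'..'4'.
    """
    rng = np.random.default_rng(seed)
    seasons = ('1', '2', '3', '4')
    # Split once into {(year, season): sub-frame} so the loop only does lookups
    groups = {key: g for key, g in df.groupby(level=['Year', 'Season'])}
    years = df.index.get_level_values('Year').unique()
    # For each season, keep only the years that actually contain it
    candidates = {s: [y for y in years if (y, s) in groups] for s in seasons}
    pieces = []
    for _ in range(n_years):
        for s in seasons:
            year = rng.choice(candidates[s])  # duplicates across draws are allowed
            pieces.append(groups[(year, s)])
    return pd.concat(pieces)

# Demo data mirroring the question
dates = pd.date_range('20070101', periods=3200)
demo = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 3200)})
demo['Season'] = (demo['date'].dt.month % 12 // 3 + 1).astype(str)
demo['Year'] = demo['date'].dt.year
demo = demo.set_index(['Year', 'Season'])

rnd = sample_seasons(demo, n_years=5, seed=0)
```

Passing a seed makes the draw repeatable, which may help when comparing runs. Note that, as in the code above, a "winter" for a given Year consists of January, February and December of that same calendar year.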
