
Randomly select hour from dataframe

I am having a hard time randomly selecting rows from a dataframe. In general, choosing rows one at a time is not a problem using np.random.choice(data, size=1000). I assume that sampling is done with replacement (replace=True). However, I need to randomly select an hour and, as output, receive the four quarter-hour rows of that hour.

The dataframe to choose from is the following (1132 rows):

data=
                     Price  Consume    Feed
StartTime                                  
2018-07-04 02:00:00  45.80    67.91   67.91
2018-07-04 02:15:00  45.80    51.05   51.05
2018-07-04 02:30:00  45.80    46.12   46.12
2018-07-04 02:45:00  45.80    46.86   46.86
2018-07-11 05:00:00  43.80    43.49   43.49
2018-07-11 05:15:00  43.80    50.71   50.71
2018-07-11 05:30:00  43.80    48.19   48.19
2018-07-11 05:45:00  43.80    40.02   40.02

My desired output is something like this:

Assuming the random generator has "selected" 2018-07-11 05:00:00, the output would be

2018-07-11 05:00:00  43.80    43.49   43.49
2018-07-11 05:15:00  43.80    50.71   50.71
2018-07-11 05:30:00  43.80    48.19   48.19
2018-07-11 05:45:00  43.80    40.02   40.02

Depending on the number (N) of random samples, the length of the resulting dataframe should be 4 x N.

Is it possible to randomly select a day and hour directly from the dataframe and repeat this 1000 times? I am afraid that using an extra dataframe to select an hour and then looking up the corresponding values in the original dataframe will be too time-consuming. I am confident that this should be doable in Python, but I couldn't find any tips on this.

Thanks for any help!

I think you can compare the sampled values with the index after using DatetimeIndex.floor to remove minutes and seconds:

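import numpy as np
import pandas as pd

N = 1000
# Draw N timestamps with replacement and round each down to the full hour.
vals = pd.to_datetime(np.random.choice(df.index, size=N)).floor('H')
# Hour that each row of df belongs to.
hours = df.index.floor('H')

# Print the quarter-hour rows of every selected hour.
for i in vals:
    print(df[hours == i])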
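
To try the floor-based selection without the original data, here is a minimal self-contained sketch. The frame is rebuilt from the eight rows shown in the question; the seed and `N = 3` are arbitrary choices for illustration, and the lowercase `'h'` alias is used because recent pandas versions prefer it over `'H'`.

```python
import numpy as np
import pandas as pd

# Rebuild the eight sample rows from the question (two hours, four quarters each).
index = pd.date_range("2018-07-04 02:00", periods=4, freq="15min").append(
    pd.date_range("2018-07-11 05:00", periods=4, freq="15min")
)
index.name = "StartTime"
consume = [67.91, 51.05, 46.12, 46.86, 43.49, 50.71, 48.19, 40.02]
df = pd.DataFrame(
    {"Price": [45.80] * 4 + [43.80] * 4, "Consume": consume, "Feed": consume},
    index=index,
)

np.random.seed(0)  # arbitrary seed, only to make the sketch repeatable
N = 3              # small N for illustration; the question uses 1000

# Sample N timestamps with replacement and round them down to the hour.
vals = pd.to_datetime(np.random.choice(df.index, size=N)).floor("h")
# Hour of every row, computed once.
hours = df.index.floor("h")

# One block of rows per sampled hour, stacked into a single frame.
df1 = pd.concat([df[hours == h] for h in vals])
print(df1)
```

With four rows per hour, `df1` has 4 x N rows, and the same hour can appear more than once because sampling is with replacement.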

EDIT: To join all the small DataFrames, use concat:

df1 = pd.concat([df[hours == i] for i in vals])

Or create an array of the matching DatetimeIndex values with np.concatenate and select by loc:

idx = np.concatenate([df.index[hours == i] for i in vals])
df1 = df.loc[idx]
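
Both variants above rebuild a boolean mask over the whole index for every draw. If that turns out to be slow for large N, one possible alternative is to compute the row positions of each hour once and then only look them up. The sketch below is an assumption-laden illustration, not part of the original answer: `sample_hour_blocks` and `demo` are hypothetical names, and it samples hours uniformly rather than rows, which is equivalent only when every hour has the same number of rows.

```python
import numpy as np
import pandas as pd


def sample_hour_blocks(df, n, seed=None):
    """Draw n hours with replacement and return all rows of each drawn hour."""
    rng = np.random.default_rng(seed)
    # Map each distinct hour to the integer positions of its rows (computed once).
    positions_by_hour = df.groupby(df.index.floor("h")).indices
    hours = list(positions_by_hour)
    # Pick n hours uniformly at random, with replacement.
    picked = rng.integers(0, len(hours), size=n)
    pos = np.concatenate([positions_by_hour[hours[k]] for k in picked])
    return df.iloc[pos]


# Hypothetical demo data: three hours of quarter-hourly rows.
demo = pd.DataFrame(
    {"Price": np.arange(12, dtype=float)},
    index=pd.date_range("2018-07-04 02:00", periods=12, freq="15min"),
)
out = sample_hour_blocks(demo, n=1000, seed=0)
print(len(out))
```

Selecting by position with `iloc` also sidesteps any ambiguity from repeated labels, since the same hour may be drawn many times.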

Sample once to get a random timestamp, then find all rows that match its date and hour:

# Pick one random row and take its timestamp as a scalar.
random_ts = df.sample().index[0]
# Keep every row that shares that date and hour.
df[(df.index.date == random_ts.date()) & (df.index.hour == random_ts.hour)]

Then, to do it 1000 times:

for i in range(1000):
    random_ts = df.sample().index[0]
    print(df[(df.index.date == random_ts.date()) & (df.index.hour == random_ts.hour)])
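
The loop above only prints each block. If the 4 x N rows are needed as one DataFrame, the blocks can be collected and concatenated instead. The following is a rough sketch under stated assumptions: `sample_hours` and `demo` are hypothetical names, the demo data is made up, and all N rows are drawn in a single `sample` call with `replace=True` rather than one call per iteration.

```python
import numpy as np
import pandas as pd


def sample_hours(df, n, random_state=None):
    """Return the rows of n randomly chosen date-hours, stacked in draw order."""
    # Draw n rows with replacement; only their timestamps are used.
    picks = df.sample(n=n, replace=True, random_state=random_state).index
    dates = df.index.date   # date of every row, computed once
    hrs = df.index.hour     # hour of every row, computed once
    blocks = [df[(dates == ts.date()) & (hrs == ts.hour)] for ts in picks]
    return pd.concat(blocks)


# Hypothetical demo data: two days of quarter-hourly rows.
demo = pd.DataFrame(
    {"Price": np.arange(192, dtype=float)},
    index=pd.date_range("2018-07-04", periods=192, freq="15min"),
)
result = sample_hours(demo, n=1000, random_state=0)
print(result.shape)
```

Matching on both date and hour keeps hours from different days apart, which matters here because the demo data covers the same clock hours on two days.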
