What is the fastest way to sample a numpy array?

Question

I have a 3D (time, X, Y) numpy array containing 6-hourly time series for a few years (say 5). I would like to create a sampled time series containing 1 instance of each calendar day, randomly taken from the available records (5 possibilities per day), as follows:

Jan 01: 2006
Jan 02: 2011
Jan 03: 2009
...

This means I need to take 4 values from 01/01/2006, 4 values from 02/01/2011, etc. I have a working version, which works as follows:

Reshape the input array to add a "year" dimension (Time, Year, X, Y)
Create an array of 365 randomly generated integers between 0 and 4
Use np.repeat and the array of integers to extract only the relevant values

Example:

sampledValues = Variable[np.arange(numberOfDays * ValuesPerDays), sampledYears.repeat(ValuesPerDays),:,:]
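For readers who want to try the three steps above, here is a minimal, self-contained sketch. It assumes the input is ordered year by year with Feb 29 already removed, and it uses synthetic data; the grid size and names such as `data` and `sampled_years` are illustrative stand-ins, not the asker's actual variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 5 years, 365 days (Feb 29 removed), 4 records per day
n_years, n_days, values_per_day = 5, 365, 4
nx, ny = 3, 2  # hypothetical small grid, for illustration only

# Synthetic stand-in for the real (time, X, Y) array, ordered year by year
data = rng.normal(size=(n_years * n_days * values_per_day, nx, ny))

# Step 1: add a year axis, giving (time_within_year, year, X, Y)
variable = data.reshape(n_years, n_days * values_per_day, nx, ny).swapaxes(0, 1)

# Step 2: one random year index (0..4) per calendar day
sampled_years = rng.integers(0, n_years, size=n_days)

# Step 3: repeat each year index once per record of that day, then pair it
# with the position within the year using integer-array indexing
sampled_values = variable[np.arange(n_days * values_per_day),
                          sampled_years.repeat(values_per_day)]
```

The result has shape (1460, X, Y): four consecutive records for each of the 365 calendar days, all four taken from the same randomly chosen year.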

This seems to work, but I was wondering whether this is the best/fastest approach to my problem. Speed is important because I am doing this in a loop, and I would benefit from testing as many cases as possible.

Am I doing this right?

Thanks

EDIT: I forgot to mention that I filtered the input dataset to remove the 29th of February for leap years.

Basically, the aim of that operation is to find a 365-day sample that matches the long-term time series well in terms of mean etc. If the sampled time series passes my quality test, I want to export it and start again.
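The draw-test-keep loop described here might look roughly like the sketch below. The quality test is not specified in the question, so the criterion used (sample mean within a hypothetical `tolerance` of the long-term mean), the number of trials, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

n_years, n_days, values_per_day = 5, 365, 4
records_per_year = n_days * values_per_day

# Synthetic stand-in for the real (time, X, Y) array, Feb 29 already removed
data = rng.normal(size=(n_years * records_per_year, 3, 2))

long_term_mean = data.mean()
tolerance = 0.01  # hypothetical acceptance threshold
within_year = np.arange(records_per_year)

accepted = []
for _ in range(200):  # number of trials is arbitrary here
    # One random year per calendar day, expanded to one entry per record
    years = rng.integers(0, n_years, size=n_days)
    sample = data[years.repeat(values_per_day) * records_per_year + within_year]
    # Assumed quality test: compare the sample mean with the long-term mean
    if abs(sample.mean() - long_term_mean) < tolerance:
        accepted.append(years)  # the sample can be rebuilt from the year choices
```

Storing only the 365 year choices for each accepted draw keeps memory use small; the full sample can be regenerated from them when it is time to export.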

Answer 1

The year 2008 was 366 days long, so don't reshape.

Have a look at scikits.timeseries:

import numpy as np
import scikits.timeseries as ts

start_date = ts.Date('H', '2006-01-01 00:00')
end_date = ts.Date('H', '2010-12-31 18:00')
arr3d = ... # your 3D array [time, X, Y]

dates = ts.date_array(start_date=start_date, end_date=end_date, freq='H')[::6]
t = ts.time_series(arr3d, dates=dates)
# just make sure arr3d.shape[0] == len(dates) !

Now you can access the t data with day/month/year objects:

t[np.logical_and(t.day == 1, t.month == 1)]

So, for example:

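for day_of_year in range(1, 366):
    # pick one of the available years at random (upper bound 2011 is excluded)
    year = np.random.randint(2006, 2011)

    # boolean mask selecting the records of that day in that year
    day_data = t[np.logical_and(t.day_of_year == day_of_year, t.year == year)]
    # day_data is a [4, X, Y] array with data from that day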

Play with the attributes of t to make it work with leap years too.
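As one possible way to handle leap years, the sketch below selects records by (month, day) instead of day of year, so Feb 29 is simply never requested. Note that it swaps in plain NumPy `datetime64` arrays in place of scikits.timeseries; the data are synthetic and every name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# 6-hourly timestamps from 2006-01-01 00:00 up to (not including) 2011-01-01
dates = np.arange(np.datetime64('2006-01-01T00'),
                  np.datetime64('2011-01-01T00'),
                  np.timedelta64(6, 'h'))
data = rng.normal(size=(len(dates), 3, 2))  # synthetic (time, X, Y) array

# Calendar attributes derived from the timestamps
as_month = dates.astype('datetime64[M]')
as_day = dates.astype('datetime64[D]')
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = as_month.astype(int) % 12 + 1
day_of_month = (as_day - as_month.astype('datetime64[D]')).astype(int) + 1

# The 365 calendar days to sample, with Feb 29 left out
not_feb29 = ~((months == 2) & (day_of_month == 29))
calendar_days = np.unique(
    np.stack([months[not_feb29], day_of_month[not_feb29]], axis=1), axis=0)

pieces = []
for month, day in calendar_days:
    year = rng.integers(2006, 2011)  # upper bound is excluded
    mask = (years == year) & (months == month) & (day_of_month == day)
    pieces.append(data[mask])        # the 4 records of that day
sample = np.concatenate(pieces)
```

Because the selection is driven by month and day, the input array does not need to be filtered beforehand, and 2008 causes no misalignment.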

Answer 2

I don't see a real need to reshape the array, since you can embed the year-size information in your sampling process and leave the array with its original shape.

For example, you can generate a random offset (from 0 to 365), and pick the slice with index, say, n*365 + offset.
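One way to read this suggestion, sketched under assumptions: take n as the randomly chosen year for each day and the offset as the record's position within the year, then scale by the 4 records per day mentioned in the question. The data are synthetic and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed layout: 5 years of 365 days (Feb 29 removed), 4 records per day
n_years, n_days, values_per_day = 5, 365, 4
records_per_year = n_days * values_per_day

# Synthetic stand-in for the original (time, X, Y) array; it is never reshaped
data = rng.normal(size=(n_years * records_per_year, 3, 2))

# One random year per calendar day, expanded to one entry per record
sampled_years = rng.integers(0, n_years, size=n_days)
year_per_record = sampled_years.repeat(values_per_day)

# Position of every record within a year: 0 .. 1459
within_year = np.arange(records_per_year)

# Flat index into the time axis: year * records_per_year + position in year
flat_index = year_per_record * records_per_year + within_year
sample = data[flat_index]
```

This needs only one integer index array along the time axis, and it picks exactly the same records as indexing a reshaped (year, time, X, Y) view.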

Anyway, I don't think your question is complete, because I didn't quite understand what you need to do, or why.
