Subsetting numpy array by hour and day of week

I have a numpy array containing millions of hourly xy points. The "columns" of the array are x, y, hour, and day of week (all ints). Here is an example of what the array looks like:

array([[1, 2, 0, 0],
       [3, 5, 0, 0],
       [6, 3, 1, 0],
       [6, 2, 3, 0],
       [4, 3, 3, 1]])

I have created a grid of zeros that I can increment for all values in the array:

import numpy as np

grid = np.zeros((8, 8))
# Count one hit per row of xy_new; the grid is indexed as [y, x].
for row in xy_new:
    grid[row[1], row[0]] += 1

but I need to be able to do this for each hour of each day of the week (i.e. Sunday at hour 0, Sunday at hour 1, etc.).

How do I subset the array by hour and day of week?

I have attempted modifying the answers here: "Make subset of array, based on values of two other arrays in Python" and "Subsetting data in Python", but have not been successful. Any help would be greatly appreciated!!

Presumably you want to wind up with 24 times 7, or 168, sets of accumulated counts for pairs of x and y. Suppose you have your data in an N by 4 array gdat. First, make a week-hour index:

whr = 24*gdat[:,3] + gdat[:,2]  # 24 * day of week (column 3) + hour (column 2)

You can now select the gdat rows for each hour in your week. For example, for hour zero of Sunday:

gdat0 = gdat[whr == 0]

Do whatever summing you need with gdat0 and move on to the next hour.
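As a rough sketch of how the per-hour summing could look, the example below builds one 8 by 8 grid per week-hour in a single call instead of looping over the 168 values of whr. It swaps in np.add.at for the explicit loop, computes the index as 24 * day + hour, and assumes the 8 by 8 grid size and the sample rows from the question; the name grids is made up for illustration.

```python
import numpy as np

# Sample rows from the question: columns are x, y, hour, day of week.
gdat = np.array([[1, 2, 0, 0],
                 [3, 5, 0, 0],
                 [6, 3, 1, 0],
                 [6, 2, 3, 0],
                 [4, 3, 3, 1]])

# Week-hour index: 0 for day 0 at hour 0, up to 167 for day 6 at hour 23.
whr = 24 * gdat[:, 3] + gdat[:, 2]

# One 8x8 grid per week-hour (grid size assumed from the question).
grids = np.zeros((168, 8, 8), dtype=np.int64)

# np.add.at accumulates correctly even when the same cell is hit repeatedly,
# which plain fancy-index assignment (grids[...] += 1) does not.
np.add.at(grids, (whr, gdat[:, 1], gdat[:, 0]), 1)

# grids[0] now holds the counts for day 0 at hour 0, indexed as [y, x].
```

If the grid is large, 168 copies of it may not fit in memory, in which case selecting one week-hour at a time as described above is the safer route.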

Note that np.unique is probably a faster way to count occurrences of x, y pairs. You can play the same game of making a composite index for x and y, but you have to know how they are bounded. Supposing x runs from 0 to 120 and y runs from 0 to 5, you could make a composite index using bit fields:

xy = (gdat0[:,0] << 3) | gdat0[:,1]  # bitwise OR packs y into the low 3 bits

Obviously, if y has a larger range you need to shift by more than 3 bits, and you may need to offset x and y to avoid negative values.
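One possible way to handle larger or negative ranges is to derive the shift and the offsets from the data itself. The helpers pack_xy and unpack_xy below are hypothetical names, not part of numpy, and the sketch assumes the packed values fit in a 64-bit integer.

```python
import numpy as np

def pack_xy(x, y):
    """Pack integer arrays x and y into one non-negative composite index."""
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    x_min, y_min = x.min(), y.min()
    # Number of bits needed for the largest offset y value (at least 1).
    shift = max(int(y.max() - y_min).bit_length(), 1)
    packed = ((x - x_min) << shift) | (y - y_min)
    return packed, shift, x_min, y_min

def unpack_xy(packed, shift, x_min, y_min):
    """Invert pack_xy and recover the original x and y values."""
    x = (packed >> shift) + x_min
    y = (packed & ((1 << shift) - 1)) + y_min
    return x, y

# Usage with made-up values that include negatives.
x = np.array([-3, 0, 120])
y = np.array([-2, 5, 9])
packed, shift, x_min, y_min = pack_xy(x, y)
x_back, y_back = unpack_xy(packed, shift, x_min, y_min)
```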

Then, use unique to return the unique values and counts for the values in xy.

xyval, xycnt = np.unique(xy, return_counts=True)

You then retrieve the x and y value pairs from xyval using bitwise operators, xyval >> 3 and xyval & 7.
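Putting the pack, count, and unpack steps together might look like the sketch below. The gdat0 rows are made-up data for a single week-hour, with y assumed to fit in 3 bits (0 to 7).

```python
import numpy as np

# Hypothetical rows for one week-hour; columns are x, y, hour, day of week.
gdat0 = np.array([[1, 2, 0, 0],
                  [3, 5, 0, 0],
                  [1, 2, 0, 0],
                  [3, 5, 0, 0],
                  [3, 5, 0, 0]])

# Pack x into the high bits and y into the low 3 bits.
xy = (gdat0[:, 0] << 3) | gdat0[:, 1]

# Unique composite values and how often each one occurs.
xyval, xycnt = np.unique(xy, return_counts=True)

# Unpack the composite values back into x and y.
x = xyval >> 3
y = xyval & 7

for xi, yi, ci in zip(x, y, xycnt):
    print(f"x={xi} y={yi} count={ci}")
```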

Repeat for every hour in the week. Since storage will be an issue if N is huge, you probably want to re-use gdat0 on each iteration.

EDIT: The short data sample you posted is time-sequential. If all your data are time-sequential, you don't need to "select" for each hour. All you need is to find the index for each new value in whr. unique(whr, return_index=True) will find those for you as well!
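A minimal sketch of that idea, assuming the rows really are sorted so that whr never decreases: the start indices from return_index delimit contiguous blocks, and slicing between them gives views rather than copies. The names starts, stops, and blocks are illustrative only.

```python
import numpy as np

# Sample rows from the question: columns are x, y, hour, day of week.
gdat = np.array([[1, 2, 0, 0],
                 [3, 5, 0, 0],
                 [6, 3, 1, 0],
                 [6, 2, 3, 0],
                 [4, 3, 3, 1]])

whr = 24 * gdat[:, 3] + gdat[:, 2]

# Assumption: whr is non-decreasing, so each week-hour is one contiguous run.
vals, starts = np.unique(whr, return_index=True)

# Each run ends where the next one starts; the last run ends at len(gdat).
stops = np.append(starts[1:], len(gdat))

# Basic slices are views into gdat, so no extra copies of the data are made.
blocks = {int(v): gdat[a:b] for v, a, b in zip(vals, starts, stops)}
```

If the same week-hour can recur later in the data (for example, several weeks of records), the runs are no longer contiguous and this shortcut would miss the later rows.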
