
Proper way to get right the following action in numpy

I have an interesting puzzle. Suppose you have a 2D numpy array in which each row corresponds to a measurement event and each column to a different measured variable. One additional column specifies the date on which the measurement was taken. The rows are sorted by time stamp, and there are several (or many) measurements on each day. The goal is to identify the row that starts each new day and subtract its values from that row and from all subsequent rows of the same day.

I approach this problem with a loop over the days: for each day I create a boolean vector that selects the matching rows and then subtract the first selected row. This approach works, but it feels inelegant. Are there better ways to do this?

Just a small example. The lines below define a matrix in which the first column is the day and the remaining two are the measured values:

import numpy as np

before = np.array([[ 1,  1,  2],
                   [ 1,  3,  4],
                   [ 1,  5,  6],
                   [ 2,  7,  8],
                   [ 3,  9, 10],
                   [ 3, 11, 12],
                   [ 3, 13, 14]])

At the end of the process I expect to see the following array:

array([[1, 0, 0],
       [1, 2, 2],
       [1, 4, 4],
       [2, 0, 0],
       [3, 0, 0],
       [3, 2, 2],
       [3, 4, 4]])
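
For reference, the loop-based approach described above might look roughly like the following. This is only a sketch: the function name `subtract_first_of_day_loop` is made up for illustration, and it assumes that the first column holds the day and that the rows are already sorted by time.

```python
import numpy as np


def subtract_first_of_day_loop(data):
    """Subtract each day's first row from all rows of that day (loop version).

    Hypothetical helper: assumes column 0 is the day and the rows are
    sorted by time, so the first selected row is the day's first measurement.
    """
    result = data.copy()
    for day in np.unique(data[:, 0]):
        mask = data[:, 0] == day      # boolean vector selecting this day's rows
        first = data[mask][0, 1:]     # measured values of the day's first row
        result[mask, 1:] -= first     # leave the day column untouched
    return result


before = np.array([[1, 1, 2],
                   [1, 3, 4],
                   [1, 5, 6],
                   [2, 7, 8],
                   [3, 9, 10],
                   [3, 11, 12],
                   [3, 13, 14]])

print(subtract_first_of_day_loop(before))
```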

P.S. Please help me find a better and more informative title for this post. I'm out of ideas.

numpy.searchsorted is a convenient function for this:

In : before
Out:
array([[ 1,  1,  2],
       [ 1,  3,  4],
       [ 1,  5,  6],
       [ 2,  7,  8],
       [ 3,  9, 10],
       [ 3, 11, 12],
       [ 3, 13, 14]])

In : diff = before[before[:,0].searchsorted(before[:,0])]

In : diff[:,0] = 0

In : before - diff
Out:
array([[1, 0, 0],
       [1, 2, 2],
       [1, 4, 4],
       [2, 0, 0],
       [3, 0, 0],
       [3, 2, 2],
       [3, 4, 4]])

Longer explanation

If you take the first column and search for it within itself, you get, for each value, the index of its first occurrence:

In : before
Out:
array([[ 1,  1,  2],
       [ 1,  3,  4],
       [ 1,  5,  6],
       [ 2,  7,  8],
       [ 3,  9, 10],
       [ 3, 11, 12],
       [ 3, 13, 14]])

In : before[:,0].searchsorted(before[:,0])
Out: array([0, 0, 0, 3, 4, 4, 4])

You can then use these indices to build, by indexing, the matrix that you will subtract:

In : diff = before[before[:,0].searchsorted(before[:,0])]

In : diff
Out:
array([[ 1,  1,  2],
       [ 1,  1,  2],
       [ 1,  1,  2],
       [ 2,  7,  8],
       [ 3,  9, 10],
       [ 3,  9, 10],
       [ 3,  9, 10]])

You need to set the first column to 0 so that the day values are not subtracted:

In : diff[:,0] = 0

In : diff
Out:
array([[ 0,  1,  2],
       [ 0,  1,  2],
       [ 0,  1,  2],
       [ 0,  7,  8],
       [ 0,  9, 10],
       [ 0,  9, 10],
       [ 0,  9, 10]])

Finally, subtract the two matrices to get the desired output:

In : before - diff
Out:
array([[1, 0, 0],
       [1, 2, 2],
       [1, 4, 4],
       [2, 0, 0],
       [3, 0, 0],
       [3, 2, 2],
       [3, 4, 4]])
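
Putting the steps together, a minimal self-contained sketch could look like this. The function name `subtract_first_of_day` is hypothetical, and the code assumes that the first column holds the day and is sorted in ascending order, which `searchsorted` needs in order to return meaningful indices.

```python
import numpy as np


def subtract_first_of_day(data):
    """Subtract each day's first row from all rows of that day (vectorized).

    Hypothetical helper: assumes column 0 is the day, sorted ascending.
    """
    days = data[:, 0]
    # With the default side='left', searching the sorted column for its own
    # values returns the index of the first row of each day.
    first_idx = days.searchsorted(days)
    diff = data[first_idx]   # integer-array indexing returns a copy
    diff[:, 0] = 0           # do not subtract the day column
    return data - diff


before = np.array([[1, 1, 2],
                   [1, 3, 4],
                   [1, 5, 6],
                   [2, 7, 8],
                   [3, 9, 10],
                   [3, 11, 12],
                   [3, 13, 14]])

print(subtract_first_of_day(before))
```

Because indexing with an integer array returns a copy, zeroing `diff[:, 0]` does not modify the input array.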
