
Compare two numpy arrays by first column and create a third numpy array by concatenating two arrays

I have two 2D numpy arrays that are used to plot simulation results.

The first column of both arrays, a and b, contains the time intervals, and the second column contains the data to be plotted. The two arrays have different shapes: a is (500, 2) and b is (600, 2). I want to compare the two arrays by their first column and create a third array that holds, for each time in the first column of a, the matching value from b. If no match is found, put 0 in the third column.

Is there any numpy trick to do this?

For instance:

a=[[0.002,0.998],  
  [0.004,0.997],   
  [0.006,0.996],   
  [0.008,0.995],   
  [0.010,0.993]]   

b= [[0.002,0.666],  
    [0.004,0.665],  
    [0.0041,0.664], 
    [0.0042,0.664], 
    [0.0043,0.664], 
    [0.0044,0.663], 
    [0.0045,0.663], 
    [0.0005,0.663], 
    [0.006,0.663], 
    [0.0061,0.662],
    [0.008,0.661]] 

Expected output:

c= [[0.002,0.998,0.666],       
    [0.004,0.997,0.665],           
    [0.006,0.996,0.663],           
    [0.008,0.995,0.661],
    [0.010,0.993, 0   ]]  

I can quickly think of the following solution:

import numpy as np

a = np.array([[0.002, 0.998],
     [0.004, 0.997],
     [0.006, 0.996],
     [0.008, 0.995],
     [0.010, 0.993]])

b = np.array([[0.002, 0.666],
     [0.004, 0.665],
     [0.0041, 0.664],
     [0.0042, 0.664],
     [0.0043, 0.664],
     [0.0044, 0.663],
     [0.0045, 0.663],
     [0.0005, 0.663],
     [0.0006, 0.663],
     [0.00061, 0.662],
     [0.0008, 0.661]])


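c = []
for row in a:
    # Indices of rows in b whose time equals this row's time
    index = np.where(b[:, 0] == row[0])[0]
    if np.size(index) != 0:
        # Use the first matching row of b
        c.append([row[0], row[1], b[index[0], 1]])
    else:
        # No match: fill the third column with 0
        c.append([row[0], row[1], 0])

print(c)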

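As pointed out in the comments above, there seems to be a data entry error: the example in the question lists 0.006, 0.0061 and 0.008 in b, while the code defines 0.0006, 0.00061 and 0.0008. Using the arrays as defined in the code: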

import numpy as np
i = np.intersect1d(a[:,0], b[:,0])
overlap = np.vstack([i, a[np.in1d(a[:,0], i), 1], b[np.in1d(b[:,0], i), 1]]).T
underlap = np.setdiff1d(a[:,0], b[:,0])
underlap = np.vstack([underlap, a[np.in1d(a[:,0], underlap), 1], underlap*0]).T
fast_c = np.vstack([overlap, underlap])

This works by taking the intersection of the first columns of a and b using intersect1d, and then using in1d to select the rows of a and b whose time is in that intersection, so their second columns can be placed side by side. The columns only line up if the times in each array are sorted and unique.

vstack stacks its inputs vertically, and the transpose is needed to get the right dimensions (a very fast operation).

Then find the times in a that are not in b using setdiff1d, and complete the result by putting 0s in the third column.

Displaying fast_c gives:

array([[ 0.002,  0.998,  0.666],
       [ 0.004,  0.997,  0.665],
       [ 0.006,  0.996,  0.   ],
       [ 0.008,  0.995,  0.   ],
       [ 0.01 ,  0.993,  0.   ]])
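If the rows of the result should stay in the same order as a, or if the times are not guaranteed to be sorted, a lookup based on searchsorted is one possible alternative. The following is only a sketch under some assumptions: the function name merge_on_first_column and its fill parameter are made up for illustration, b is assumed to be non-empty, times are compared with exact equality as in the question, and the sample b is an abridged version of the values listed in the question's example.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])

# Abridged sample, using the values from the question's example
b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.006, 0.663],
              [0.0061, 0.662],
              [0.008, 0.661]])


def merge_on_first_column(a, b, fill=0.0):
    """Hypothetical helper: append b's second column to a where times match."""
    # Sort b by time once; a stable sort keeps the first of any duplicate times first
    order = np.argsort(b[:, 0], kind="stable")
    b_sorted = b[order]
    # For each time in a, the index of the first entry in b_sorted that is >= that time
    pos = np.searchsorted(b_sorted[:, 0], a[:, 0])
    # Times larger than everything in b would index past the end, so clamp them
    pos = np.clip(pos, 0, len(b_sorted) - 1)
    # Exact floating-point equality, as in the question's loop
    matched = b_sorted[pos, 0] == a[:, 0]
    third = np.where(matched, b_sorted[pos, 1], fill)
    return np.column_stack([a, third])


c = merge_on_first_column(a, b)
print(c)
```

Because the times are floating-point numbers, exact equality only matches values that are bit-for-bit identical. If the two time columns come from different computations, a tolerance-based comparison (for example with np.isclose) may be more appropriate.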

The following works for both numpy arrays and plain Python lists.

c = [[*x, y[1]] for x in a for y in b if x[0] == y[0]]
d = [[*x, 0] for x in a if x[0] not in [y[0] for y in b]]
c.extend(d)

Someone braver than I am could try to make this a one-liner.
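One way to get close to a single expression is a dictionary lookup. This is only a sketch, not the approach of the answer above: the name lookup is introduced here for illustration, the sample data is a shortened version of the question's arrays, and unlike the comprehension above it keeps the rows in the order of a and produces exactly one row per row of a.

```python
a = [[0.002, 0.998], [0.004, 0.997], [0.010, 0.993]]
b = [[0.002, 0.666], [0.004, 0.665], [0.0041, 0.664]]

# Map each time in b to its value. Iterating in reverse means that, for a
# duplicated time, the first occurrence in b is the one that ends up stored.
lookup = {t: v for t, v in reversed(b)}

# For each row of a, append the matching value from b, or 0 if there is none
c = [[t, v, lookup.get(t, 0)] for t, v in a]
print(c)
```

Building the dictionary once avoids rescanning b for every row of a, which matters when the arrays have hundreds of rows.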
