Compare two numpy arrays by first column and create a third numpy array by concatenating two arrays
I have two 2D numpy arrays that are used to plot simulation results. The first column of both arrays, a and b, contains the time intervals, and the second column contains the data to be plotted. The two arrays have different shapes: a is (500, 2) and b is (600, 2). I want to compare these two arrays by their first column and create a third array from the matches found for the first column of a. If no match is found, add 0 to the third column.
Is there any numpy trick to do this?
For instance:
a=[[0.002,0.998],
[0.004,0.997],
[0.006,0.996],
[0.008,0.995],
[0.010,0.993]]
b= [[0.002,0.666],
[0.004,0.665],
[0.0041,0.664],
[0.0042,0.664],
[0.0043,0.664],
[0.0044,0.663],
[0.0045,0.663],
[0.0005,0.663],
[0.006,0.663],
[0.0061,0.662],
[0.008,0.661]]
Expected output:
c= [[0.002,0.998,0.666],
[0.004,0.997,0.665],
[0.006,0.996,0.663],
[0.008,0.995,0.661],
[0.010,0.993, 0 ]]
The quick solution I can think of is:
import numpy as np
a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.0042, 0.664],
              [0.0043, 0.664],
              [0.0044, 0.663],
              [0.0045, 0.663],
              [0.0005, 0.663],
              [0.0006, 0.663],
              [0.00061, 0.662],
              [0.0008, 0.661]])
c = []
for row in a:
    # Indices of the rows in b whose time equals this row's time
    index = np.where(b[:, 0] == row[0])[0]
    if np.size(index) != 0:
        # Use the first matching row of b
        c.append([row[0], row[1], b[index[0], 1]])
    else:
        # No match: pad the third column with 0
        c.append([row[0], row[1], 0])
print(c)
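One caveat with the loop above: the time stamps are floats, so == only matches values that are bit-for-bit identical. Below is a minimal sketch of a tolerance-based match using np.isclose. The tolerance atol=1e-9, the shortened b, and the slightly perturbed 0.006 entry are assumptions for illustration, not part of the question.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
# Shortened b; the 0.006 entry is perturbed on purpose to mimic rounding noise.
b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.006 + 1e-12, 0.663],
              [0.008, 0.661]])

# close[i, j] is True when a[i, 0] and b[j, 0] agree within the tolerance.
# atol=1e-9 is an assumed value; pick one well below the time step.
close = np.isclose(a[:, [0]], b[:, 0], rtol=0.0, atol=1e-9)

has_match = close.any(axis=1)   # does row i of a have a match in b?
first = close.argmax(axis=1)    # index of the first match (0 when there is none)

# Take b's value where a match exists, otherwise 0.
third = np.where(has_match, b[first, 1], 0.0)
c = np.column_stack([a, third])
```

This builds a len(a) x len(b) boolean matrix, which is small for arrays of 500 and 600 rows but would grow quickly for much larger inputs.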
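As pointed out in the comments above, there seems to be a data entry error in the question's code: the last rows of b there (0.0006, 0.00061, 0.0008) differ from the example (0.006, 0.0061, 0.008). A vectorized approach: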
import numpy as np
i = np.intersect1d(a[:,0], b[:,0])
overlap = np.vstack([i, a[np.in1d(a[:,0], i), 1], b[np.in1d(b[:,0], i), 1]]).T
underlap = np.setdiff1d(a[:,0], b[:,0])
underlap = np.vstack([underlap, a[np.in1d(a[:,0], underlap), 1], underlap*0]).T
fast_c = np.vstack([overlap, underlap])
This works by taking the intersection of the first columns of a and b using intersect1d, and then using in1d to cross-reference that intersection with the second columns. vstack stacks the elements of the input vertically, and the transpose is needed to get the right dimensions (a very fast operation). Then find the times in a that are not in b using setdiff1d, and complete the result by putting 0s in the third column.
This prints out:
array([[ 0.002, 0.998, 0.666],
[ 0.004, 0.997, 0.665],
[ 0.006, 0.996, 0. ],
[ 0.008, 0.995, 0. ],
[ 0.01 , 0.993, 0. ]])
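The approach above places the unmatched rows after the matched ones, and it lines up the second columns correctly only when both arrays are already sorted by time. A minimal sketch of a variant that keeps the row order of a uses the return_indices option of intersect1d (available in NumPy 1.15 and later). The shortened b below follows the question's example values rather than the mistyped code, which is an assumption about the intended data.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
# Shortened b, using the values from the question's example.
b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.006, 0.663],
              [0.008, 0.661]])

# ia and ib are the row indices of the shared times in a and b
# (first occurrence of each shared time).
_, ia, ib = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)

# Start with zeros in the third column, then fill in the matched values.
c = np.column_stack([a, np.zeros(len(a))])
c[ia, 2] = b[ib, 1]
```

Because the third column starts as zeros, rows of a without a match need no separate setdiff1d step.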
The following works for both numpy arrays and plain Python lists.
c = [[*x, y[1]] for x in a for y in b if x[0] == y[0]]
d = [[*x, 0] for x in a if x[0] not in [y[0] for y in b]]
c.extend(d)
Someone braver than I am could try to make this a one-liner.
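The comprehensions above rescan b for every row of a. A possible alternative, sketched here for plain lists, builds a dictionary from b once and then looks up each time from a. It also keeps the rows in the order of a. The name lookup and the shortened b (taken from the question's example values) are illustrative assumptions. Unlike the comprehensions above, a time that appears more than once in b yields a single row, using its first occurrence.

```python
a = [[0.002, 0.998],
     [0.004, 0.997],
     [0.006, 0.996],
     [0.008, 0.995],
     [0.010, 0.993]]
# Shortened b, using the values from the question's example.
b = [[0.002, 0.666],
     [0.004, 0.665],
     [0.0041, 0.664],
     [0.006, 0.663],
     [0.008, 0.661]]

# Map each time in b to its value. Iterating b in reverse means that,
# for duplicated times, the first occurrence is written last and wins.
lookup = {t: v for t, v in b[::-1]}

# One row per row of a; 0 when the time is missing from b.
c = [[t, v, lookup.get(t, 0)] for t, v in a]
```

Like the other exact-match approaches, this relies on the float time stamps being identical in both inputs.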