
Finding combinatorial occurrences along combinations of rows between 2 numpy arrays

I am trying to find a fast, at least partially vectorized solution for counting combinatorial occurrences between two 2D numpy arrays, in order to identify Single Point Polymorphism linkage. The shape of each array is (factors, samples). An example for matrix 1 is as follows:

array([[0., 1., 1.],
       [1., 0., 1.]])

and for matrix 2:

array([[1., 1., 0.],
       [0., 0., 0.]])

For every pair of one factor (row) from matrix 1 and one factor (row) from matrix 2, I need the total number of samples at which each ordered pair of values occurs at the same position in the two rows. Order matters, because the (1,0) count is different from the (0,1) count. The combinations are therefore [(0, 0), (0, 1), (1, 0), (1, 1)], and the final output for each combination is a (factor, factor) matrix of counts.

For combination (0,0), for instance, we get the matrix

array([[0, 1],
       [0, 1]])

This is because the (0,0) pair occurs 0 times along row 0 of matrix 1 and row 0 of matrix 2, 1 time along row 0 of matrix 1 and row 1 of matrix 2, 0 times along row 1 of matrix 1 and row 0 of matrix 2, and 1 time along row 1 of matrix 1 and row 1 of matrix 2.
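To make the definition concrete, the manual count above can be reproduced with a plain double loop over row pairs. This is only a reference sketch for checking results on small inputs, not a fast solution; the function name `count_pair_naive` is made up for illustration.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

def count_pair_naive(a, b, va, vb):
    """For every row pair (i, j), count samples where a[i] == va and b[j] == vb."""
    counts = np.zeros((a.shape[0], b.shape[0]), dtype=int)
    for i in range(a.shape[0]):
        for j in range(b.shape[0]):
            # Both conditions must hold at the same sample position
            counts[i, j] = np.sum((a[i] == va) & (b[j] == vb))
    return counts

counts_00 = count_pair_naive(array1, array2, 0, 0)
print(counts_00)
# [[0 1]
#  [0 1]]
```

Calling it with (0, 1), (1, 0) and (1, 1) gives the other three count matrices.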

With example data

import numpy as np

array1 = np.array([
        [0., 1., 1.],
        [1., 0., 1.]])
array2 = np.array([
        [1., 1., 0.],
        [0., 0., 0.]])

We can count the desired combinations with np.einsum and reshape the result to a suitable array

c1 = np.array([1-array1, array1]).astype('int')
c2 = np.array([1-array2, array2]).astype('int')
np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, len(array1), len(array2))

Output

array([[[0, 1],    # counts for (0,0)
        [0, 1]],

       [[1, 0],    # counts for (0,1)
        [1, 0]],

       [[1, 2],    # counts for (1,0)
        [1, 2]],

       [[1, 0],    # counts for (1,1)
        [1, 0]]])
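In the index string 'ijk,lmk->iljm', i and l run over the value (0 or 1) taken in each array, j and m run over the factors of each array, and k is the sample axis that gets summed out. The same idea might be wrapped in a small helper that builds one indicator layer per value and keeps the result 4-dimensional, so a combination can be looked up by its value pair. The name `pair_counts` and the `values` parameter are assumptions for this sketch, not part of the original answer.

```python
import numpy as np

def pair_counts(a, b, values=(0, 1)):
    """Return counts[va_idx, vb_idx, j, m] for rows j of a and m of b."""
    # One indicator layer per value: shape (n_values, n_factors, n_samples)
    ind_a = np.stack([a == v for v in values]).astype(int)
    ind_b = np.stack([b == v for v in values]).astype(int)
    # i, l: value index in a / b; j, m: factor in a / b; k: sample (summed out)
    return np.einsum('ijk,lmk->iljm', ind_a, ind_b)

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

counts = pair_counts(array1, array2)
print(counts[1, 0])  # counts for combination (1, 0)
# [[1 2]
#  [1 2]]
```

Because every sample falls into exactly one of the four combinations here, summing `counts` over its first two axes gives the number of samples for every factor pair, which is a cheap sanity check.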

Checking that the previous results are equal to the dot products

import itertools as it

np.array([x @ y.T for x, y in it.product(c1, c2)])

Output

array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]],

       [[1, 2],
        [1, 2]],

       [[1, 0],
        [1, 0]]])
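Rather than comparing the two printed arrays by eye, the equality can also be checked programmatically. A minimal self-contained sketch using the same example data (the names `via_einsum` and `via_dot` are only for illustration):

```python
import itertools as it
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

# Indicator stacks: layer 0 marks zeros, layer 1 marks ones
c1 = np.array([1 - array1, array1]).astype('int')
c2 = np.array([1 - array2, array2]).astype('int')

via_einsum = np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, len(array1), len(array2))
# it.product visits the layer pairs in the order (0,0), (0,1), (1,0), (1,1)
via_dot = np.array([x @ y.T for x, y in it.product(c1, c2)])

print(np.array_equal(via_einsum, via_dot))
# True
```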

I realized the solution while trying to work out a manual example for the question, so I will just note that these counts can be computed with dot products:

# Indicator matrices over all factors (rows), each of shape (factors, samples)
matrix1_0 = (array1 == 0).astype('int')
matrix1_1 = (array1 == 1).astype('int')
matrix2_0 = (array2 == 0).astype('int')
matrix2_1 = (array2 == 1).astype('int')

# Each dot product sums over the sample axis and gives a (factors, factors) count matrix
count_00 = np.dot(matrix1_0, matrix2_0.T)
count_01 = np.dot(matrix1_0, matrix2_1.T)
count_10 = np.dot(matrix1_1, matrix2_0.T)
count_11 = np.dot(matrix1_1, matrix2_1.T)

Each result holds, for every pair of factors, the number of occurrences of one combination, summed along the sample axis (axis 1 here).
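For strictly binary data the four counts are not independent: for any factor pair they add up to the number of samples. One possible shortcut, offered here as an alternative sketch rather than as part of the approach above, is to compute only the (1,1) matrix product and derive the other three counts from row sums. It assumes both arrays contain only 0 and 1; the function name `binary_pair_counts` is hypothetical.

```python
import numpy as np

def binary_pair_counts(a, b):
    """Return (count_00, count_01, count_10, count_11) for 0/1 arrays a and b."""
    a = a.astype(int)
    b = b.astype(int)
    n_samples = a.shape[1]
    count_11 = a @ b.T                              # both rows are 1
    count_10 = a.sum(axis=1)[:, None] - count_11    # a is 1, b is not 1
    count_01 = b.sum(axis=1)[None, :] - count_11    # b is 1, a is not 1
    count_00 = n_samples - count_11 - count_10 - count_01
    return count_00, count_01, count_10, count_11

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

count_00, count_01, count_10, count_11 = binary_pair_counts(array1, array2)
print(count_00)
# [[0 1]
#  [0 1]]
```

This needs one matrix product instead of four, which may matter when the number of factors is large, but it silently gives wrong counts if any value other than 0 or 1 (for example a missing-data code) is present.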
