Fast way to create an incidence matrix from a list of labels in Python?

I have an array y, with len(y) = M, that contains values from 0 -> N. For example, with N = 3:

y = [0, 2, 0, 1, 2, 1, 0, 2]

The incidence matrix A is defined as follows:

  • Size M x M
  • A(i,j) = 1 if y(i) == y(j)
  • A(i,j) = 0 if y(i) != y(j)

A simple algorithm would be:

import numpy as np

def incidence(y):
    M = len(y)
    A = np.zeros((M, M))  # start with all zeros
    for i in range(M):
        for j in range(M):
            # mark every pair of positions that share a label
            if y[i] == y[j]:
                A[i, j] = 1
    return A

But this is very slow. Is there any way to do this faster? Using a list comprehension or vectorization, for example.

You can take advantage of NumPy broadcasting to gain some efficiency over pure Python by simply comparing y with a column-shaped view of itself (for a 1-D array, reshape(-1, 1) plays the role of a transpose):

import numpy as np

y = np.array([1, 2, 1, 0, 0, 1, 2])

def mat_me(y):
    return (y == y.reshape(-1, 1)).astype(int)

mat_me(y)

which produces:

array([[1, 0, 1, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 1],
       [1, 0, 1, 0, 0, 1, 0],
       [0, 0, 0, 1, 1, 0, 0],
       [0, 0, 0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 1]])
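
To see why this comparison yields an M x M result, it can help to look at the shapes involved. The sketch below is only an illustration: the names `incidence_broadcast` and `incidence_loop` are chosen for this example and do not appear in the answer. It compares a column view of shape (M, 1) with a row view of shape (1, M), which broadcasting stretches to (M, M), and then checks the result against the loop-based definition from the question.

```python
import numpy as np

def incidence_broadcast(y):
    """Broadcasting version: A[i, j] = 1 where y[i] == y[j]."""
    y = np.asarray(y)
    row = y[np.newaxis, :]   # shape (1, M)
    col = y[:, np.newaxis]   # shape (M, 1)
    # Comparing (M, 1) with (1, M) broadcasts to an (M, M) boolean array.
    return (col == row).astype(int)

def incidence_loop(y):
    """Reference version that follows the definition directly."""
    M = len(y)
    A = np.zeros((M, M), dtype=int)
    for i in range(M):
        for j in range(M):
            if y[i] == y[j]:
                A[i, j] = 1
    return A

y = [0, 2, 0, 1, 2, 1, 0, 2]   # the example from the question
A = incidence_broadcast(y)
print(A.shape)                                # (8, 8)
print(np.array_equal(A, incidence_loop(y)))   # True
```

The matrix is symmetric with ones on the diagonal, since every label equals itself.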

For comparison, the NumPy version:

y = np.random.choice([1, 2, 3], size=3000)

def mat_me(y):
    return (y == y.reshape(-1, 1)).astype(int)

%timeit mat_me(y)
# 28.6 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

versus the pure Python list comprehension:

y = np.random.choice([1, 2, 3], size=3000)
y = list(y)

def mat_me_py(y):
    return [[int(a == b) for a in y] for b in y]

%timeit mat_me_py(y)
# 4.16 s ± 213 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The difference will become very pronounced on larger lists.
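
The `%timeit` lines above only work in IPython or Jupyter. As a rough sketch for a plain Python script, the standard `timeit` module can be used instead. The function names `mat_me_numpy` and `mat_me_list`, the smaller size of 300, and the repeat count are assumptions made for this example so that it finishes quickly. Timings depend on the machine, so none are shown here.

```python
import timeit
import numpy as np

def mat_me_numpy(y):
    # Vectorized comparison via broadcasting.
    return (y == y.reshape(-1, 1)).astype(int)

def mat_me_list(y):
    # Pure Python nested list comprehension.
    return [[int(a == b) for a in y] for b in y]

rng = np.random.default_rng(0)
y_arr = rng.choice([1, 2, 3], size=300)   # smaller than 3000 to keep the run short
# tolist() yields plain Python ints; the answer's list(y) keeps NumPy scalars,
# which typically compare more slowly.
y_list = y_arr.tolist()

# Sanity check: both versions should build the same matrix.
same = np.array_equal(mat_me_numpy(y_arr), np.array(mat_me_list(y_list)))
print("results match:", same)

runs = 20
t_numpy = timeit.timeit(lambda: mat_me_numpy(y_arr), number=runs) / runs
t_list = timeit.timeit(lambda: mat_me_list(y_list), number=runs) / runs
print(f"numpy: {t_numpy * 1e3:.3f} ms per call")
print(f"list:  {t_list * 1e3:.3f} ms per call")
```

Both versions do work proportional to M squared, so the absolute gap between them widens as M grows.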
