
Correlation of 2 time dependent multidimensional signals (signal vectors)

I have a matrix M1, each row of which is a time-dependent signal.

And I have another matrix, M2, of the same dimensions, each row of which is also a time-dependent signal, used as a "template" to recognize signal shapes in the first matrix.

As a result I want a column vector v, where v[i] is the correlation between the i-th row of M1 and the i-th row of M2.

I've looked into numpy's corrcoef function and tried the following code:

import numpy as np

M1 = np.array ([
    [1, 2, 3, 4],
    [2, 3, 1, 4]
])

M2 = np.array ([
    [10, 20, 30, 40],
    [20, 30, 10, 40]
])

print (np.corrcoef (M1, M2))

which prints:

[[ 1.   0.4  1.   0.4]
 [ 0.4  1.   0.4  1. ]
 [ 1.   0.4  1.   0.4]
 [ 0.4  1.   0.4  1. ]]

I've been reading the docs, but I am still confused as to which entries of this matrix I have to pick as the entries of my vector v.

Can anyone help?

(I've studied several SO answers to similar questions, but haven't yet seen the light...)

Code context:

There are 256 rows (signals), and I run a sliding window of 200 samples over the 'main signal', which has a length of 10k samples. So M1 and M2 are both 256 rows x 200 columns. Sorry for the erroneous 10k samples; that's the total signal length. By using correlation with a sliding template I try to find the offsets where the template matches best. Actually I am looking for QRS complexes in a 256 channel invasive cardiogram (or rather, electrogram, as physicians call it).

    lg.info ('Processor: {}, time: {}, markers: {}'.format (self.key, dt.datetime.now ().time (), len (self.data.markers)))

    # Compute average signal shape over preexisting markers and use that as a template to find the others.
    # All generated markers will have the width of the widest preexisting one.

    template = np.zeros ((self.data.samples.shape [0], self.bufferWidthSteps))

    # Add intervals that were marked in advance
    nrOfTerms = 0
    maxWidthSteps = 0
    newMarkers = []
    for marker in self.data.markers:
        if marker.key == self.markerKey:

            # Find start and stop sample index    
            startIndex = marker.tSteps - marker.stampWidthSteps // 2
            stopIndex = marker.tSteps + marker.stampWidthSteps // 2

            # Extract relevant slice from samples and add it to template
            template += np.hstack ((self.data.samples [ : , startIndex : stopIndex], np.zeros ((self.data.samples.shape [0], self.bufferWidthSteps - marker.stampWidthSteps))))

            # Adapt nr of added terms to facilitate averaging
            nrOfTerms += 1

            # Remember maximum width of previously marked QRS complexes
            maxWidthSteps = max (maxWidthSteps, marker.stampWidthSteps)
        else:
            # Preexisting markers with non-matching keys are just copied to the new marker list
            # Preexisting markers with a matching key are omitted from the new marker list
            newMarkers.append (marker)

    # Compute average of intervals that were marked in advance
    template = template [ : , 0 : maxWidthSteps] / nrOfTerms
    halfWidthSteps = maxWidthSteps // 2

    # Append markers of intervals that yield an above threshold correlation with the averaged marked intervals
    firstIndex = 0
    stopIndex = self.data.samples.shape [1] - maxWidthSteps
    while firstIndex < stopIndex:
        corr = np.corrcoef (
            template,
            self.data.samples [ : , firstIndex : firstIndex + maxWidthSteps]
        )

        diag = np.diagonal (
            corr,
            template.shape [0]
        )

        meanCorr = np.mean (diag)

        if meanCorr > self.correlationThreshold:
            newMarkers.append (self.markerFactories [self.markerKey] .make (firstIndex + halfWidthSteps, maxWidthSteps))

            # Prevent overlapping markers
            firstIndex += maxWidthSteps
        else:
            firstIndex += 5

    self.data.markers = newMarkers

    lg.info ('Processor: {}, time: {}, markers: {}'.format (self.key, dt.datetime.now ().time (), len (self.data.markers)))
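The corr / diag / mean step inside the loop above can be exercised in isolation (a sketch with random stand-ins for template and the sample window, since self.data is not available here; the 256 x 40 sizes are hypothetical):

```python
import numpy as np

# Hypothetical stand-ins: 256 channels, a 40-step template and window
template = np.random.rand(256, 40)
window = np.random.rand(256, 40)

corr = np.corrcoef(template, window)         # (512, 512) block matrix
diag = np.diagonal(corr, template.shape[0])  # per-channel correlations, shape (256,)
meanCorr = np.mean(diag)                     # scalar compared against the threshold

print(diag.shape)  # (256,)
```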

Based on this solution for finding the correlation matrix between two 2D arrays, we can have a similar one for finding a correlation vector that computes the correlation between corresponding rows of the two arrays. The implementation would look something like this -

def corr2_coeff_rowwise(A,B):
    # Rowwise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(1)[:,None]
    B_mB = B - B.mean(1)[:,None]

    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Finally get corr coeff
    return np.einsum('ij,ij->i',A_mA,B_mB)/np.sqrt(ssA*ssB)

We can further optimize the part to get ssA and ssB by introducing einsum magic there too!

def corr2_coeff_rowwise2(A,B):
    A_mA = A - A.mean(1)[:,None]
    B_mB = B - B.mean(1)[:,None]
    ssA = np.einsum('ij,ij->i',A_mA,A_mA)
    ssB = np.einsum('ij,ij->i',B_mB,B_mB)
    return np.einsum('ij,ij->i',A_mA,B_mB)/np.sqrt(ssA*ssB)

Sample run -

In [164]: M1 = np.array ([
     ...:     [1, 2, 3, 4],
     ...:     [2, 3, 1, 4.5]
     ...: ])
     ...: 
     ...: M2 = np.array ([
     ...:     [10, 20, 33, 40],
     ...:     [20, 35, 15, 40]
     ...: ])
     ...: 

In [165]: corr2_coeff_rowwise(M1, M2)
Out[165]: array([ 0.99411402,  0.96131896])

In [166]: corr2_coeff_rowwise2(M1, M2)
Out[166]: array([ 0.99411402,  0.96131896])

Runtime test -

In [97]: M1 = np.random.rand(256,200)
    ...: M2 = np.random.rand(256,200)
    ...: 

In [98]: out1 = np.diagonal (np.corrcoef (M1, M2), M1.shape [0])
    ...: out2 = corr2_coeff_rowwise(M1, M2)
    ...: out3 = corr2_coeff_rowwise2(M1, M2)
    ...: 

In [99]: np.allclose(out1, out2)
Out[99]: True

In [100]: np.allclose(out1, out3)
Out[100]: True

In [101]: %timeit np.diagonal (np.corrcoef (M1, M2), M1.shape [0])
     ...: %timeit corr2_coeff_rowwise(M1, M2)
     ...: %timeit corr2_coeff_rowwise2(M1, M2)
     ...: 
100 loops, best of 3: 9.5 ms per loop
1000 loops, best of 3: 554 µs per loop
1000 loops, best of 3: 430 µs per loop

20x+ speedup there with einsum over the built-in np.corrcoef!
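The speedup is unsurprising: for two (256, 200) inputs, np.corrcoef stacks the rows of both arrays and computes the full 512 x 512 correlation matrix, of which only the 256 row-pair entries are kept. A quick shape check (a sketch using the same sizes as the runtime test above) makes this visible:

```python
import numpy as np

# Same sizes as in the runtime test above
M1 = np.random.rand(256, 200)
M2 = np.random.rand(256, 200)

# corrcoef treats the 512 stacked rows as 512 variables
full = np.corrcoef(M1, M2)
print(full.shape)  # (512, 512): 262144 correlations computed

# Only the offset diagonal of the M1-vs-M2 block is actually needed
v = np.diagonal(full, M1.shape[0])
print(v.shape)     # (256,)
```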

I think it's this: (please correct me if wrong!)

import numpy as np

M1 = np.array ([
    [1, 2, 3, 4],
    [2, 3, 1, 4.5]
])

M2 = np.array ([
    [10, 20, 33, 40],
    [20, 35, 15, 40]
])

v = np.diagonal (np.corrcoef (M1, M2), M1.shape [0])

print (v)

Which prints:

[ 0.99411402  0.96131896]

Since it's got only one dimension, I can think of it as a column vector...
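To make the indexing explicit (a sketch, not part of the original answer): with n rows per input, np.corrcoef(M1, M2) returns a 2n x 2n matrix made of four n x n blocks, and the wanted values C[i, n + i] lie on the n-th superdiagonal, inside the M1-versus-M2 block:

```python
import numpy as np

M1 = np.array([[1, 2, 3, 4],
               [2, 3, 1, 4.5]])
M2 = np.array([[10, 20, 33, 40],
               [20, 35, 15, 40]])

n = M1.shape[0]
C = np.corrcoef(M1, M2)  # blocks: [[M1 vs M1, M1 vs M2],
                         #          [M2 vs M1, M2 vs M2]]
v = np.diagonal(C, n)    # picks C[i, n + i] for each row i

# Same values, computed one row pair at a time
for i in range(n):
    assert np.isclose(v[i], np.corrcoef(M1[i], M2[i])[0, 1])

print(v)  # [0.99411402 0.96131896]
```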

Not knowing enough of numpy array magic, I'd just pick out the rows and feed each pair individually to corrcoef:

[np.corrcoef(i,j)[0][1] for i,j in zip(a,b)]

For an np.array column output:

c, c.shape = np.array([np.corrcoef(i,j)[0][1] for i,j in zip(a,b)]), (a.shape[0], 1)

I'm sure there's a better way using numpy broadcast/indexing features.
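One small cleanup of the snippet above (my suggestion, not from the original answer): instead of assigning to c.shape afterwards, a [:, None] slice appends the column axis in the same expression:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 3, 1, 4.5]])
b = np.array([[10, 20, 33, 40], [20, 35, 15, 40]])

# Row-by-row correlations; [:, None] turns the 1D result into a column vector
c = np.array([np.corrcoef(i, j)[0, 1] for i, j in zip(a, b)])[:, None]
print(c.shape)  # (2, 1)
```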

(Posts on this site follow the CC BY-SA 4.0 license; when reposting, please credit this site or the original source. © 2020-2024 STACKOOM.COM)