Optimize this function with numpy (or other vectorization methods)

I am using Python to perform a classic calculation in the field of population genetics. I am well aware that many algorithms already do the job, but I wanted to build my own for some reason.

The paragraph below is a picture because MathJax is not supported on StackOverflow.

[image not preserved]
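
Reading the loop code in the question, the quantities being computed appear to be the following. The index ranges and the reading of W_k as per-row weights are inferred from that code, not taken from the picture, so treat this as a reconstruction rather than the original statement.

```latex
\bar{p}_i = \sum_{k=1}^{K} W_k \, p_{k,i}, \qquad
H_T = 1 - \sum_{i=1}^{I} \bar{p}_i^{\,2}, \qquad
H_S = 1 - \sum_{k=1}^{K} W_k \sum_{i=1}^{I} p_{k,i}^{2}, \qquad
F_{ST} = \frac{H_T - H_S}{H_T}
```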

I would like to have an efficient algorithm to calculate Fst. For the moment I have only managed to write for loops, and none of the calculation is vectorized. How can I do this calculation using numpy (or other vectorization methods)?


Here is some code that I think should do the job:

def Fst(W, p):
    I = len(p[0])  # number of columns of p (index i)
    K = len(p)     # number of rows of p (index k)
    H_T = 0
    H_S = 0
    for i in range(I):
        bar_p_i = 0  # weighted mean of column i
        for k in range(K):
            bar_p_i += W[k] * p[k][i]
            H_S += W[k] * p[k][i] * p[k][i]
        H_T += bar_p_i*bar_p_i
    H_T = 1 - H_T
    H_S = 1 - H_S
    return (H_T - H_S) / H_T

def main():
    W = [0.2, 0.1, 0.2, 0.5]
    p = [[0.1,0.3,0.6],[0,0,1],[0.4,0.5,0.1],[0,0.1,0.9]]
    F = Fst(W,p)
    print("Fst = " + str(F))
    return

main()

There's no reason to use loops here. And you really shouldn't use Numba or Cython for this stuff: linear algebra expressions like the one you have are the whole reason behind vectorized operations in Numpy.

Since this type of problem is going to pop up again and again if you keep using Numpy, I would recommend getting a basic handle on linear algebra in Numpy. You might find this book chapter helpful:

https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html

As for your specific situation: start by creating numpy arrays from your variables:

import numpy as np
W = np.array(W)
p = np.array(p)

Now, your \bar p_i are defined by a dot product. That's easy:

bar_p_i = p.T.dot(W)

Note the T, for the transpose: the dot product sums over the last index of the first array and the first index of the second array. The transpose reverses the order of the indices, so the first index of p becomes the last.
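
To make the index bookkeeping concrete, here is a small sketch using the W and p values from the question. It checks that p.T.dot(W) yields one value per column of p and agrees with the explicit weighted sum. The einsum spelling is only an alternative notation for the same contraction, not something the steps here require.

```python
import numpy as np

W = np.array([0.2, 0.1, 0.2, 0.5])   # shape (K,) = (4,)
p = np.array([[0.1, 0.3, 0.6],
              [0.0, 0.0, 1.0],
              [0.4, 0.5, 0.1],
              [0.0, 0.1, 0.9]])      # shape (K, I) = (4, 3)

# p.T has shape (I, K), so dot sums over k and leaves one value per i.
bar_p_i = p.T.dot(W)

# The same contraction with explicit index labels.
bar_p_einsum = np.einsum("ki,k->i", p, W)

# The same contraction written as the inner loop from the question.
bar_p_loop = [sum(W[k] * p[k][i] for k in range(p.shape[0]))
              for i in range(p.shape[1])]

print(bar_p_i.shape)
print(np.allclose(bar_p_i, bar_p_einsum), np.allclose(bar_p_i, bar_p_loop))
```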

Your H_T is defined by a sum over the squared \bar p_i. That's also easy:

H_T = 1 - (bar_p_i**2).sum()

Similarly for your H_S, this time squaring p itself before taking the weighted sum:

H_S = 1 - ((p**2).T.dot(W)).sum()
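
Putting the steps together, a minimal sketch of the whole calculation might look like the following. The function names fst_vectorized and fst_loops are made up for this illustration. The loop version is copied from the question (with range in place of xrange) purely as a reference to check against, on the assumption that the loop code is the intended definition of Fst.

```python
import numpy as np

def fst_vectorized(W, p):
    """Vectorized Fst; W has shape (K,), p has shape (K, I)."""
    W = np.asarray(W, dtype=float)
    p = np.asarray(p, dtype=float)
    bar_p = p.T.dot(W)                     # weighted mean of each column i
    H_T = 1.0 - (bar_p ** 2).sum()         # 1 - sum_i bar_p_i^2
    H_S = 1.0 - (p ** 2).T.dot(W).sum()    # 1 - sum_i sum_k W_k * p_ki^2
    return (H_T - H_S) / H_T

def fst_loops(W, p):
    """Loop version from the question, kept only as a reference."""
    H_T = 0.0
    H_S = 0.0
    for i in range(len(p[0])):
        bar_p_i = 0.0
        for k in range(len(p)):
            bar_p_i += W[k] * p[k][i]
            H_S += W[k] * p[k][i] * p[k][i]
        H_T += bar_p_i * bar_p_i
    return ((1 - H_T) - (1 - H_S)) / (1 - H_T)

W = [0.2, 0.1, 0.2, 0.5]
p = [[0.1, 0.3, 0.6], [0, 0, 1], [0.4, 0.5, 0.1], [0, 0.1, 0.9]]

print("vectorized:", fst_vectorized(W, p))
print("loops:     ", fst_loops(W, p))
```

Both functions divide by H_T, so an input where H_T is zero (every \bar p_i squared summing to one) needs separate handling in either version.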
