Compute affinity matrix from distance matrix

I used Clustal Omega to get a distance matrix for 500 protein sequences (they are all homologous to one another).
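To work with such a matrix in Python it first has to be read into an array. The sketch below assumes a PHYLIP-style layout: the first line holds the number of sequences N, and each of the next N lines holds a sequence name (without spaces) followed by N distances. That layout is an assumption; check it against your actual output file before relying on it. The function name `read_distance_matrix` is hypothetical.

```python
import io
import numpy as np

def read_distance_matrix(handle):
    """Parse a square distance matrix from a text handle.

    ASSUMED layout: first non-empty line = number of sequences N,
    then N lines of "name d1 d2 ... dN" separated by whitespace.
    """
    lines = [line.split() for line in handle if line.strip()]
    n = int(lines[0][0])
    names = []
    rows = []
    for fields in lines[1:n + 1]:
        names.append(fields[0])                      # sequence identifier
        rows.append([float(x) for x in fields[1:]])  # its row of distances
    distance = np.array(rows, dtype=float)
    if distance.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} matrix, got {distance.shape}")
    return names, distance

# Hypothetical three-sequence example in the assumed layout
example = io.StringIO(
    "3\n"
    "seqA 0.000000 0.250000 0.600000\n"
    "seqB 0.250000 0.000000 0.550000\n"
    "seqC 0.600000 0.550000 0.000000\n"
)
names, D = read_distance_matrix(example)
```

For the real data, pass an open file handle instead of the `StringIO` object.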

I want to use affinity propagation to cluster these sequences.

Initially, because I saw by inspection that the distance matrix only contained values between 0 and 1, with a distance of 0 meaning 100% identity, I reasoned that I could simply take (1 - distance) as the affinity.
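A minimal sketch of that linear conversion, assuming the distances really do lie in [0, 1] (the function checks this rather than silently producing negative similarities). The helper name `linear_affinity` and the toy matrix are hypothetical stand-ins for the real 500 x 500 matrix.

```python
import numpy as np

def linear_affinity(distance):
    """Similarity = 1 - distance, so distance 0 (100% identity) -> similarity 1."""
    distance = np.asarray(distance, dtype=float)
    if distance.ndim != 2 or distance.shape[0] != distance.shape[1]:
        raise ValueError("expected a square distance matrix")
    if distance.min() < 0 or distance.max() > 1:
        raise ValueError("1 - distance assumes all distances lie in [0, 1]")
    return 1.0 - distance

# Hypothetical toy distances: two pairs of closely related sequences
D = np.array([
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.6, 0.7],
    [0.7, 0.6, 0.0, 0.2],
    [0.8, 0.7, 0.2, 0.0],
])
S = linear_affinity(D)
```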

I ran my code, the clusters looked reasonable, and I thought all was well... until I read that affinity matrices are typically computed from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.
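The exact form of the "heat kernel" varies between sources; one common version is the Gaussian (RBF) kernel exp(-d^2 / (2 * sigma^2)), where sigma is a bandwidth you have to choose. The sketch below implements that form. Falling back to the median off-diagonal distance when no sigma is given is only a heuristic assumed here, not a rule from the post, and `heat_kernel_affinity` is a hypothetical name.

```python
import numpy as np

def heat_kernel_affinity(distance, sigma=None):
    """Gaussian ("heat") kernel: exp(-d**2 / (2 * sigma**2)).

    sigma is the kernel bandwidth. If omitted, this sketch uses the median
    off-diagonal distance, which is a heuristic, not a recommendation.
    """
    distance = np.asarray(distance, dtype=float)
    if sigma is None:
        off_diagonal = distance[~np.eye(distance.shape[0], dtype=bool)]
        sigma = np.median(off_diagonal)
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return np.exp(-distance ** 2 / (2.0 * sigma ** 2))

# Hypothetical toy distances
D = np.array([
    [0.0, 0.1, 0.7],
    [0.1, 0.0, 0.6],
    [0.7, 0.6, 0.0],
])
S = heat_kernel_affinity(D, sigma=0.5)
S_default = heat_kernel_affinity(D)  # sigma = median off-diagonal distance = 0.6
```

Like 1 - distance, this maps a distance of 0 to a similarity of 1 and decreases as distance grows; the difference is the shape of the decay, which sigma controls.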

Did I misunderstand the concept of an affinity matrix? Is there an easy way to compute the affinity matrix? scikit-learn offers the following formula:

similarity = np.exp(-beta * distance / distance.std())
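The quoted formula can be wrapped as a small function. Note that NumPy's `.std()` without an `axis` argument is taken over all entries of the matrix, diagonal zeros included. The helper name `std_scaled_similarity` and the default `beta=1.0` are assumptions for illustration only.

```python
import numpy as np

def std_scaled_similarity(distance, beta=1.0):
    """similarity = exp(-beta * distance / distance.std()).

    beta > 0 is a free parameter chosen by the user. distance.std() is
    computed over every entry of the matrix (NumPy flattens by default).
    """
    distance = np.asarray(distance, dtype=float)
    scale = distance.std()
    if scale == 0:
        raise ValueError("all distances are identical, so std() is zero")
    return np.exp(-beta * distance / scale)

# Hypothetical 2x2 example: entries are 0, 0.2, 0.2, 0, so std() = 0.1
D = np.array([[0.0, 0.2], [0.2, 0.0]])
S = std_scaled_similarity(D)             # off-diagonal: exp(-2)
S_half = std_scaled_similarity(D, 0.5)   # off-diagonal: exp(-1)
```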

But what is beta? I know distance.std() is the standard deviation of the distances.
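One way to read beta, assuming the formula is applied exactly as quoted (a sketch, with $\sigma_D$ standing for distance.std()): beta is not derived from the data but is a tuning knob that sets how fast similarity decays, measured in units of the standard deviation of the distances.

```latex
\begin{aligned}
&s_{ij} = \exp\!\left(-\beta\,\frac{d_{ij}}{\sigma_D}\right), \qquad \sigma_D = \operatorname{std}(D), \quad \beta > 0 \\
&d_{ij} = 0 \;\Rightarrow\; s_{ij} = 1 \\
&d_{ij} = \sigma_D \;\Rightarrow\; s_{ij} = e^{-\beta} \quad (\beta = 1:\ s_{ij} \approx 0.368,\ \ \beta = 2:\ s_{ij} \approx 0.135) \\
&D \to cD\ (c > 0) \;\Rightarrow\; \frac{c\,d_{ij}}{c\,\sigma_D} = \frac{d_{ij}}{\sigma_D} \quad \text{(similarities unchanged)} \\
&s_{ij} = s^{*} \text{ at } d_{ij} = d^{*} \;\Rightarrow\; \beta = -\frac{\sigma_D \ln s^{*}}{d^{*}}
\end{aligned}
```

Dividing by the standard deviation makes the result independent of the units of the distances, and the last line gives one way to choose beta: pick the similarity $s^{*}$ you want at some reference distance $d^{*}$ and solve for it.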

I'm quite confused and lost right now about the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

PS: I've tried posting on Biostars.org, but I haven't gotten an answer there...

I think both 1 - distance and exp(-beta * distance) are valid ways to convert a distance into a similarity: both map a distance of 0 to a similarity of 1 and decrease as the distance grows (for beta > 0), though they differ in how they would be interpreted in a probabilistic framework. I would simply use whichever gives the better results.
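A minimal sketch of "try both and compare", assuming scikit-learn's `AffinityPropagation` with `affinity="precomputed"` (which takes a similarity matrix directly) and using the adjusted Rand index to measure how much the two clusterings agree. The toy data, `beta = 1`, and the helper name `cluster` are arbitrary assumptions. Agreement between the two runs only shows whether the choice matters for your data; which result is "better" still has to be judged on the biology.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import adjusted_rand_score

def cluster(similarity, random_state=0):
    """Run affinity propagation on a precomputed similarity matrix."""
    model = AffinityPropagation(affinity="precomputed", random_state=random_state)
    return model.fit(similarity).labels_

# Hypothetical toy data: two tight groups of three "sequences" each
rng = np.random.default_rng(0)
points = np.concatenate([rng.normal(0.0, 0.02, 3), rng.normal(0.5, 0.02, 3)])
D = np.abs(points[:, None] - points[None, :])  # pairwise distances, well within [0, 1]

S_linear = 1.0 - D                    # the questioner's original approach
S_heat = np.exp(-1.0 * D / D.std())   # the scikit-learn formula with beta = 1

labels_linear = cluster(S_linear)
labels_heat = cluster(S_heat)
agreement = adjusted_rand_score(labels_linear, labels_heat)  # 1.0 = identical partitions
```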

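Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.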

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM