Converting a distance matrix to an adjacency list

Question

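I have a matrix of pairwise Euclidean distances between 12000 atoms. I want to convert it into an adjacency list in which the i-th element is the list of nodes within a threshold distance of node i. For example, given three points: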

(0,0) (1,0) (1,1)

the distance matrix is:

[[0.         1.         1.41421356]
 [1.         0.         1.        ]
 [1.41421356 1.         0.        ]]

The pairs that satisfy the condition distance <= 1 are then:

[[0 0]
 [0 1]
 [1 0]
 [1 1]
 [1 2]
 [2 1]
 [2 2]]

and the final adjacency list is:

[[0,1],[0,1,2],[1,2]]
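
The example can be checked with a few lines of NumPy. This is only a sketch of the expected result, not the code being optimized; the names `points`, `pairs`, and `adjacency` are chosen here for illustration.

```python
import numpy as np

# The three example points.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
threshold = 1.0

# Pairwise Euclidean distances via broadcasting: dist[i, j] = |p_i - p_j|.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# All (i, j) index pairs with distance <= threshold, in row-major order.
pairs = np.argwhere(dist <= threshold)

# Group the second index of each pair under its first index.
adjacency = [[] for _ in range(len(points))]
for i, j in pairs:
    adjacency[i].append(int(j))

print(adjacency)  # [[0, 1], [0, 1, 2], [1, 2]]
```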

Here is working code:

from scipy.spatial import distance
import numpy as np

def voisinage(xyz, threshold):
    # xyz is an array of positions in 3D space, one row per point

    # distance matrix
    dist = distance.cdist(xyz,xyz,'euclidean')

    # extract i,j pairs where distance <= threshold
    paires = np.argwhere(dist<=threshold)

    # prepare the adjacency list
    Vvoisinage = [[] for i in range(len(xyz))]

    # fill the adjacency list
    for p in paires:
        Vvoisinage[p[0]].append(p[1])

    return Vvoisinage

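This code takes about 4 to 5 seconds for roughly 12100 points in 3D space. I want it to be as fast as possible, because it has to run on thousands of sets of 12100 points, and each set involves other computations as well. I tried networkX, but it was much slower than this approach.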

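The part to optimize is the last one (filling the adjacency list): it takes 2.7 seconds on average, which is about half of the computation time.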
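
To see where the time goes, each stage can be timed separately. The sketch below assumes random test data and a cutoff of 1.0, both hypothetical, and uses fewer points than the real problem so it finishes quickly; it reports no timings from the original setup.

```python
import time

import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)
xyz = rng.random((2000, 3)) * 10.0   # hypothetical test data, smaller than 12100 points
threshold = 1.0                      # hypothetical cutoff

t0 = time.perf_counter()
dist = distance.cdist(xyz, xyz, 'euclidean')      # stage 1: distance matrix
t1 = time.perf_counter()
paires = np.argwhere(dist <= threshold)           # stage 2: extract pairs
t2 = time.perf_counter()
Vvoisinage = [[] for _ in range(len(xyz))]        # stage 3: fill the adjacency list
for p in paires:
    Vvoisinage[p[0]].append(p[1])
t3 = time.perf_counter()

timings = {"cdist": t1 - t0, "argwhere": t2 - t1, "fill": t3 - t2}
for stage, seconds in timings.items():
    print(f"{stage:>8}: {seconds:.3f} s")
```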

There may also be a faster way to do all of this.

Thanks

Answer 1

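As discussed in the comments, one cheap optimization is to compare squared distances. This gives the same result and avoids taking square roots.
As @inarighas discussed, computing the full matrix is redundant. Because the matrix is symmetric, you can skip the main diagonal and one entire triangle, which roughly doubles performance.
If performance matters to you, you should not use an interpreted language like Python. A compiled language such as C++ can give you a performance boost of up to 50 times. Here is C++ code for your problem.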

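    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    class cLoc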
    {
    public:
        float x, y, z;
        cLoc(float X, float Y, float Z)
            : x(X), y(Y), z(Z)
        {
        }
        float dist2(const cLoc &other) const
        {
            float dx = x - other.x;
            float dy = y - other.y;
            float dz = z - other.z;
            return dx *dx + dy *dy + dz *dz;
        }
    };
    
    class cAtoms
    {
    public:
        std::vector<cLoc> myAtomLocs;
        std::vector<std::vector<int>> myClose;
        float myMaxDist2;
    
        void generateTest1();
        void neighbors();
        std::string text();
    };
    
    void cAtoms::generateTest1()
    {
        myAtomLocs = {
            cLoc(0, 0, 0),
            cLoc(1, 0, 0),
            cLoc(1, 1, 0)};
        myMaxDist2 = 1;
    }
    
    void cAtoms::neighbors()
    {
        for (int ks = 0; ks < myAtomLocs.size(); ks++)
        {
            std::vector<int> v;
            v.push_back(ks);
            for (int kd = ks + 1; kd < myAtomLocs.size(); kd++)
            {
                if (myAtomLocs[ks].dist2(myAtomLocs[kd]) <= myMaxDist2)
                    v.push_back(kd);
            }
            myClose.push_back(v);
        }
    }
    
    std::string cAtoms::text()
    {
        std::stringstream ss;
        for( auto& v : myClose ) {
            if( !v.size() )
                continue;
            ss << v[0] <<": ";
            for( int k = 1; k < v.size(); k++ )
            {
                ss << v[k] << " ";
            }
            ss << "\n";
        }
        return ss.str();
    }
    int main()
    {
            cAtoms myAtoms;
            myAtoms.generateTest1();
            myAtoms.neighbors();
            std::cout << myAtoms.text();
    }

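The output is shown below. Each atom is listed only with its higher-indexed neighbors, so the line for atom 2 is empty:

0: 1
1: 2
2: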
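
The first two points (squared distances, one triangle only) can also be sketched in Python without leaving SciPy. This assumes `scipy.spatial.distance.pdist`, which returns only the pairs with i < j; the function name `neighbors_upper` is hypothetical. Like the C++ `neighbors()` above, it lists each pair once, under its lower index.

```python
import numpy as np
from scipy.spatial.distance import pdist


def neighbors_upper(xyz, max_dist):
    """Return a list where entry i holds the indices j > i within max_dist of i."""
    n = len(xyz)
    # Squared distances for the n*(n-1)/2 pairs with i < j (no square roots).
    d2 = pdist(xyz, 'sqeuclidean')
    # Row and column indices in the same order that pdist uses.
    rows, cols = np.triu_indices(n, k=1)
    keep = d2 <= max_dist ** 2
    rows, cols = rows[keep], cols[keep]
    # rows is sorted, so the column indices can be cut at the row boundaries.
    counts = np.bincount(rows, minlength=n)
    return [c.tolist() for c in np.split(cols, np.cumsum(counts)[:-1])]


xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(neighbors_upper(xyz, 1.0))  # [[1], [2], []]
```

For 12,100 points the index arrays hold about 73 million entries each, so this version trades memory for the halved distance computation.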

If I generate 12,100 atoms as follows

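    // Requires <cstdlib> (srand, rand) and <ctime> (time), plus a declaration
    // `void generateRandom();` inside class cAtoms.
    void cAtoms::generateRandom()
    {
        srand(time(NULL));

        // 12,100 atoms with coordinates in [0, 9.9] on a 0.1 grid
        for (int k = 0; k < 12100; k++)
        {
            myAtomLocs.push_back(cLoc(
                (rand() % 100) / 10.0f,
                (rand() % 100) / 10.0f,
                (rand() % 100) / 10.0f));
        }
        myMaxDist2 = 1;
    }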

then the neighbors() method runs in 125 milliseconds on my laptop.

The code for the complete application is at https://github.com/JamesBremner/atoms

Answer 2

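First, the diagonal of the distance matrix is not very useful, since it is always zero. To make your process faster, I used only numpy functions, because they are generally faster than plain Python list operations when working with arrays and matrices.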

So I first ignore the diagonal of the dist matrix by setting it to np.nan; then I group the pairs in paires by their first index (see "Is there any numpy group by function?").

Here is my code:

from scipy.spatial import distance
import numpy as np

xyz = np.array([[0,0],[1,0],[1,1]])

threshold = 1

# distance matrix
dist = distance.cdist(xyz,xyz,'euclidean')

# ignore diagonal values
np.fill_diagonal(dist, np.nan)

# extract i,j pairs where distance <= threshold
paires = np.argwhere(dist<=threshold)

# groupby index
tmp = np.unique(paires[:, 0], return_index=True)
neighbors = np.split(paires[:,1], tmp[1])[1:]
indices = tmp[0]

The output is a list of lists in which each list holds the nodes adjacent to the node at the corresponding position in indices.
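
Nodes with no neighbor within the threshold do not appear in `indices`, so `neighbors` can be shorter than the number of points. Below is a minimal sketch of expanding the result into one list per node, assuming the `indices` and `neighbors` produced by the code above; the helper name `to_full_adjacency` is hypothetical.

```python
import numpy as np


def to_full_adjacency(indices, neighbors, n_nodes):
    """Place each neighbor array at its node index; isolated nodes get []."""
    full = [[] for _ in range(n_nodes)]
    for node, neigh in zip(indices, neighbors):
        full[int(node)] = [int(j) for j in neigh]
    return full


# Hypothetical grouped output for 4 nodes where node 3 has no neighbors.
indices = np.array([0, 1, 2])
neighbors = [np.array([1]), np.array([0, 2]), np.array([1])]
print(to_full_adjacency(indices, neighbors, 4))  # [[1], [0, 2], [1], []]
```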

In terms of quantitative performance (on my computer, of course), your function takes about 4.5 s on 12000 randomly generated points, while mine takes about 1.3 s.
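
Neither answer uses it, but a spatial index is another way to avoid building the full distance matrix. The sketch below uses `scipy.spatial.cKDTree.query_ball_point`, which is a different technique from the ones above; the function name `voisinage_kdtree` is hypothetical, and no timing is claimed for it here.

```python
import numpy as np
from scipy.spatial import cKDTree


def voisinage_kdtree(xyz, threshold):
    """Adjacency list from a k-d tree; each node's list includes the node itself."""
    tree = cKDTree(xyz)
    # One list of indices per query point, for the points within `threshold`.
    result = tree.query_ball_point(xyz, r=threshold)
    # Sort each list so the output order is deterministic.
    return [sorted(lst) for lst in result]


xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
# The cutoff is set slightly above 1 to stay clear of the exact-boundary case.
print(voisinage_kdtree(xyz, 1.1))  # [[0, 1], [0, 1, 2], [1, 2]]
```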

Answer 1
2 2022-06-21 16:13:08

Answer 2
1 Accepted 2022-06-21 14:26:11
