Sampling N points from a set of 3D points so as to maximize the minimum distance

Question

Suppose I have 500 random 3D points represented by a (500, 3) array:

import numpy as np

np.random.seed(99)
points = np.random.uniform(0, 10, (500, 3))

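Now I want to sample n = 20 points from these 500 points so that the minimum of all pairwise distances is as large as possible. I am using a greedy approach that, at each step, samples the point that maximizes the minimum distance to the points already chosen. Below is my Python implementation: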

from scipy.spatial import distance_matrix

def sample_n_points(points, n):
    sampled_points = [points[0]]
    remained_points = points[1:]
    n_sampled = 1

    while n_sampled < n:
        min_dists = distance_matrix(remained_points, sampled_points).min(axis=1)
        imax = np.argmax(min_dists)
        sampled_points.append(remained_points[imax])
        # np.delete returns a new array, so the result has to be assigned back
        remained_points = np.delete(remained_points, imax, axis=0)
        n_sampled += 1

    return np.asarray(sampled_points)

print(sample_n_points(points, n=20))

Output:

[[6.72278559 4.88078399 8.25495174]
 [1.01317279 9.74063145 0.15102072]
 [5.21672436 0.39259574 0.1069965 ]
 [9.89383494 9.77095442 1.15681204]
 [0.77144184 9.99325146 9.8976312 ]
 [0.04558333 2.34842151 5.25634324]
 [9.58126175 0.57371576 5.01765991]
 [9.93010888 9.959526   9.18606297]
 [5.27648557 9.93960401 4.82093673]
 [2.97622499 0.46695721 9.90627399]
 [0.28351187 3.64220133 0.06793617]
 [6.27527665 5.58177254 0.3544929 ]
 [0.4861886  7.45547887 5.342708  ]
 [0.83203965 5.00400167 9.40102603]
 [5.21120971 2.89966623 4.24236342]
 [9.18165946 0.26450445 9.58031481]
 [5.47605481 9.4493094  9.94331621]
 [9.31058632 6.36970353 5.33362741]
 [9.47554604 2.31761252 1.53774694]
 [3.99460408 6.17908899 6.00786122]]

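However, this code does not guarantee an optimal solution. The most obvious "mistake" in my code is that it always starts by sampling the first point. Of course, I could run my code using each point as the starting point and then keep the run that gives the largest minimum distance, but even that would not give the optimal solution. The points are far apart from each other at the beginning, but they are forced closer together as more points are sampled. After some thought, I realized that this problem essentially becomes

finding the most evenly distributed subset of a set of 3D points.

I would like to know whether there is any algorithm that can find the optimal solution, or give a good approximation relatively quickly.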
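The "try every starting point" idea mentioned above could be sketched roughly as follows. This is only an illustration: the helper names `greedy_maximin` and `best_of_all_starts` are made up here, the demo data is arbitrary, and the result is still a heuristic with no optimality guarantee. Working on a precomputed distance matrix keeps each run cheap.

```python
import numpy as np
from scipy.spatial import distance_matrix


def greedy_maximin(dist, n, start):
    """Greedy farthest-point selection on a precomputed distance matrix.

    (Hypothetical helper.) Returns the chosen indices and the smallest
    pairwise distance among them.
    """
    chosen = [start]
    # min_dist[i] = distance from point i to its nearest chosen point
    min_dist = dist[start].copy()
    gap = np.inf
    while len(chosen) < n:
        nxt = int(np.argmax(min_dist))
        # The new point's distance to its nearest chosen point bounds the gap
        gap = min(gap, min_dist[nxt])
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return chosen, gap


def best_of_all_starts(points, n):
    """Run the greedy selection from every start and keep the best run."""
    dist = distance_matrix(points, points)
    runs = (greedy_maximin(dist, n, s) for s in range(len(points)))
    return max(runs, key=lambda run: run[1])


rng = np.random.default_rng(0)
demo_points = rng.uniform(0, 10, (200, 3))  # arbitrary demo data
idx, gap = best_of_all_starts(demo_points, 10)
print(idx, gap)
```

Each run costs O(n * N) once the N x N matrix is available, so trying all N starts is affordable for a few hundred points.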

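Edit

The decision-problem version of this optimization problem would be:

Given a distance threshold t, is it possible to find a subset of n points such that every pair of points in the subset is at least t apart?

From a graph point of view, this can be interpreted as

finding an independent set in a Euclidean graph, where points v1, v2 have an edge between them if the pairwise distance d(v1, v2) ≤ t.

If we can solve this decision problem, then the optimization problem can also be solved by a binary search over the threshold t.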
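To make the binary-search idea concrete, here is a rough sketch. It does not solve the decision problem exactly: the check is a simple greedy scan, so a "no" from it does not prove that no valid subset exists, and the threshold it returns is only a lower bound on the optimum. The function names are made up for illustration. The check treats pairs exactly t apart as allowed, matching "at least t apart".

```python
import numpy as np
from scipy.spatial import distance_matrix


def greedy_independent_set(dist, t, n):
    """Try to collect n indices whose mutual distances are all >= t.

    Heuristic (hypothetical helper): scan the points in index order and keep
    each point that is at least t away from everything kept so far.
    """
    kept = []
    for i in range(len(dist)):
        if all(dist[i, j] >= t for j in kept):
            kept.append(i)
            if len(kept) == n:
                return kept
    return None


def search_threshold(points, n):
    """Binary search over the sorted pairwise distances (needs n >= 2)."""
    dist = distance_matrix(points, points)
    # The optimal value is always one of the pairwise distances
    candidates = np.unique(dist[np.triu_indices(len(points), k=1)])
    lo, hi = 0, len(candidates) - 1
    # The smallest distance is always feasible as long as n <= len(points)
    best = greedy_independent_set(dist, candidates[lo], n)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = greedy_independent_set(dist, candidates[mid], n)
        if found is not None:
            lo, best = mid, found
        else:
            hi = mid - 1
    return best, candidates[lo]


rng = np.random.default_rng(0)
demo_points = rng.uniform(0, 10, (200, 3))  # arbitrary demo data
subset, t = search_threshold(demo_points, 10)
print(subset, t)
```

Swapping the greedy check for an exact independent-set solver would give the true optimum, at a cost that grows quickly with the number of points.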

Answer 1

Hopefully I have understood your requirements.

Starting from your setup:

from scipy.spatial import distance_matrix
import numpy as np

np.random.seed(99)
points = np.random.uniform(0, 10, (500, 3))

You can get the distances between all points and sort them by distance:

# get distances between all points
d = distance_matrix(points, points)
# zero the identical upper triangle
dt = np.tril(d)
# list the distances and their indexes
dtv = [(dt[i, j], i, j) for (i, j) in np.argwhere(dt > 0)]
# sort the list
dtvs = sorted(dtv, key=lambda x: x[0], reverse=True)

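Then you can grow a set to collect 20 indexes of the points that contribute to the greatest distances.

Edited to limit the result to k unique point indexes.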

kpoint_index = set()
k = 20

# walk the pairs from largest to smallest distance, collecting both endpoints
for p in (idx for entry in dtvs for idx in entry[1:]):
    kpoint_index.add(p)
    if len(kpoint_index) == k:
        break

print("index to points:", kpoint_index)

Giving:

index to points: {393, 11, 282, 415, 160, 302, 189, 319, 194, 453, 73, 74, 459, 335, 469, 221, 103, 232, 236, 383}

This runs quickly, though I have not timed it.
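To compare any candidate selection against what the question actually asks for, it helps to compute the smallest pairwise distance inside the selected subset. A minimal sketch, where `min_pairwise_distance` is a made-up helper name and the index set is copied from the output above:

```python
import numpy as np
from scipy.spatial.distance import pdist


def min_pairwise_distance(points, indices):
    """Smallest Euclidean distance between any two of the selected points."""
    return pdist(points[list(indices)]).min()


# Same data as in the question
np.random.seed(99)
points = np.random.uniform(0, 10, (500, 3))

# Index set printed above
kpoint_index = {393, 11, 282, 415, 160, 302, 189, 319, 194, 453,
                73, 74, 459, 335, 469, 221, 103, 232, 236, 383}
print(min_pairwise_distance(points, kpoint_index))
```

The same helper can be applied to the rows returned by the greedy code in the question to see which selection has the larger minimum distance.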

Answer 2

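After some enlightening comments, I realise that an exact solution is intractable even for moderately sized problems.

One possible approximate solution uses K-means clustering. Here is a 2D example, so that I can include a plot.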

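import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import distance_matrix
from sklearn.cluster import KMeans

np.random.seed(99)
n = 500
k = 20
pts2D = np.random.uniform(0, 10, (n, 2))

kmeans = KMeans(n_clusters=k, random_state=0).fit(pts2D)
labels = kmeans.predict(pts2D)
cntr = kmeans.cluster_centers_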

Now we can find the original point nearest to each cluster centre:

# indices of nearest points to centres
approx = []

for i, c in enumerate(cntr):
    lab = np.where(labels == i)[0]    # indices of the points in cluster i
    pts = pts2D[lab]
    d = distance_matrix(c[None, ...], pts)
    approx.append(lab[np.argmin(d)])  # map back to an index into pts2D

We can then plot the result:

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(pts2D[:, 0], pts2D[:, 1], '.')
ax.plot(cntr[:, 0], cntr[:, 1], 'x')
ax.plot(pts2D[approx, 0], pts2D[approx, 1], 'r.')
ax.set_aspect("equal")
fig.legend(["points", "centres", "selected"], loc=1)

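Finally, if the actual points are always uniformly distributed, you can get a good approximation by placing the "centres" uniformly and choosing the point nearest to each centre. Then there is no need for K-means.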
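The uniform-centres shortcut could look something like this. It is a sketch under assumptions: the 5 x 4 grid layout and the variable names are illustrative choices, and nothing here prevents two centres from picking the same point if the data is sparse.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(99)
pts2D = rng.uniform(0, 10, (500, 2))  # uniformly distributed demo points

# Hypothetical 5 x 4 grid of centres (20 in total), one in the middle of each cell
xs = (np.arange(5) + 0.5) * (10 / 5)
ys = (np.arange(4) + 0.5) * (10 / 4)
centres = np.array([(x, y) for x in xs for y in ys])

# Index of the data point nearest to each centre
_, nearest = cKDTree(pts2D).query(centres)
selected = pts2D[nearest]
print(selected)
```

A k-d tree keeps the nearest-point lookup fast even when there are many more points than centres.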

Answer 3

I am not sure if this is what you want, but as I understand it, you may need a triangular tiling.

[Image: triangular tiling]
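For reference, the sites of a 2D triangular tiling can be generated as below. This is only an illustration of the tiling itself: the function name and parameters are made up, and how to carry it over to the 3D data in the question (for example, by snapping each site to its nearest data point) is left open.

```python
import numpy as np


def triangular_lattice(width, height, spacing):
    """Lattice sites inside [0, width] x [0, height], neighbours `spacing` apart."""
    row_height = spacing * np.sqrt(3) / 2
    sites = []
    for row, y in enumerate(np.arange(0, height + 1e-9, row_height)):
        # Shift every other row by half a spacing to form equilateral triangles
        offset = spacing / 2 if row % 2 else 0.0
        for x in np.arange(offset, width + 1e-9, spacing):
            sites.append((x, y))
    return np.array(sites)


lattice = triangular_lattice(10, 10, 2.5)
print(len(lattice))
```

In such a lattice every site's nearest neighbours are exactly `spacing` away.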


Answer 1 | 2 | 2021-09-15 16:21:37
Answer 2 | 1 | accepted | 2021-09-17 15:06:31
Answer 3 | 0 | 2022-07-12 02:07:28
