Fast calculation of Pareto front in Python

I have a set of points in 3D space from which I need to find the Pareto frontier. Execution speed is very important here, and the runtime grows very quickly as I add test points.

The set of points looks like this:

[[0.3296170319979843, 0.0, 0.44472108843537406], [0.3296170319979843, 0.0, 0.44472108843537406], [0.32920760896951373, 0.0, 0.4440408163265306], [0.32920760896951373, 0.0, 0.4440408163265306], [0.33815192743764166, 0.0, 0.44356462585034007]]

Right now, I'm using this algorithm:
def dominates(row, candidateRow):
    return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row)

def simple_cull(inputPoints, dominates):
    paretoPoints = set()
    candidateRowNr = 0
    dominatedPoints = set()
    while True:
        candidateRow = inputPoints[candidateRowNr]
        inputPoints.remove(candidateRow)
        rowNr = 0
        nonDominated = True
        while len(inputPoints) != 0 and rowNr < len(inputPoints):
            row = inputPoints[rowNr]
            if dominates(candidateRow, row):
                # If it is worse on all features remove the row from the array
                inputPoints.remove(row)
                dominatedPoints.add(tuple(row))
            elif dominates(row, candidateRow):
                nonDominated = False
                dominatedPoints.add(tuple(candidateRow))
                rowNr += 1
            else:
                rowNr += 1

        if nonDominated:
            # add the non-dominated point to the Pareto frontier
            paretoPoints.add(tuple(candidateRow))

        if len(inputPoints) == 0:
            break
    return paretoPoints, dominatedPoints
Found here: http://code.activestate.com/recipes/578287-multidimensional-pareto-front/

What is the fastest way to find the set of non-dominated solutions in an ensemble of solutions? Or, in short: can Python do better than this algorithm?
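For a concrete sense of what the recipe returns, here is a minimal run on small integer data (a sketch; note that dominates uses >= on every coordinate, so this convention treats larger values as better):

```python
def dominates(row, candidateRow):
    return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row)

def simple_cull(inputPoints, dominates):
    paretoPoints = set()
    candidateRowNr = 0
    dominatedPoints = set()
    while True:
        candidateRow = inputPoints[candidateRowNr]
        inputPoints.remove(candidateRow)
        rowNr = 0
        nonDominated = True
        while len(inputPoints) != 0 and rowNr < len(inputPoints):
            row = inputPoints[rowNr]
            if dominates(candidateRow, row):
                inputPoints.remove(row)
                dominatedPoints.add(tuple(row))
            elif dominates(row, candidateRow):
                nonDominated = False
                dominatedPoints.add(tuple(candidateRow))
                rowNr += 1
            else:
                rowNr += 1
        if nonDominated:
            paretoPoints.add(tuple(candidateRow))
        if len(inputPoints) == 0:
            break
    return paretoPoints, dominatedPoints

# [3, 3, 3] is >= every other point in all three coordinates, so it is
# the only non-dominated point in this (maximization) example
pareto, dominated = simple_cull([[3, 3, 3], [2, 2, 2], [1, 3, 2], [3, 1, 3]], dominates)
print(pareto)     # {(3, 3, 3)}
print(dominated)  # {(2, 2, 2), (1, 3, 2), (3, 1, 3)}
```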
If you're concerned with actual speed, you definitely want to use numpy (since clever algorithmic tweaks probably yield far smaller gains than the ones you get from using array operations). Here are three solutions that all compute the same function. The is_pareto_efficient_dumb solution is slower in most situations but becomes faster as the number of costs increases, the is_pareto_efficient_simple solution is much more efficient than the dumb solution for many points, and the final is_pareto_efficient function is less readable but the fastest (so they are all Pareto efficient!).
import numpy as np


# Very slow for many datapoints.  Fastest for many costs, most readable
def is_pareto_efficient_dumb(costs):
    """
    Find the pareto-efficient points
    :param costs: An (n_points, n_costs) array
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        is_efficient[i] = np.all(np.any(costs[:i] > c, axis=1)) and np.all(np.any(costs[i+1:] > c, axis=1))
    return is_efficient


# Fairly fast for many datapoints, less fast for many costs, somewhat readable
def is_pareto_efficient_simple(costs):
    """
    Find the pareto-efficient points
    :param costs: An (n_points, n_costs) array
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            is_efficient[is_efficient] = np.any(costs[is_efficient] < c, axis=1)  # Keep any point with a lower cost
            is_efficient[i] = True  # And keep self
    return is_efficient


# Faster than is_pareto_efficient_simple, but less readable.
def is_pareto_efficient(costs, return_mask=True):
    """
    Find the pareto-efficient points
    :param costs: An (n_points, n_costs) array
    :param return_mask: True to return a mask
    :return: An array of indices of pareto-efficient points.
        If return_mask is True, this will be an (n_points, ) boolean array
        Otherwise it will be a (n_efficient_points, ) integer array of indices.
    """
    is_efficient = np.arange(costs.shape[0])
    n_points = costs.shape[0]
    next_point_index = 0  # Next index in the is_efficient array to search for
    while next_point_index < len(costs):
        nondominated_point_mask = np.any(costs < costs[next_point_index], axis=1)
        nondominated_point_mask[next_point_index] = True
        is_efficient = is_efficient[nondominated_point_mask]  # Remove dominated points
        costs = costs[nondominated_point_mask]
        next_point_index = np.sum(nondominated_point_mask[:next_point_index]) + 1
    if return_mask:
        is_efficient_mask = np.zeros(n_points, dtype=bool)
        is_efficient_mask[is_efficient] = True
        return is_efficient_mask
    else:
        return is_efficient
Profiling tests (with points drawn from a Normal distribution):

10,000 samples, 2 costs:

is_pareto_efficient_dumb: Elapsed time is 1.586s
is_pareto_efficient_simple: Elapsed time is 0.009653s
is_pareto_efficient: Elapsed time is 0.005479s

1,000,000 samples, 2 costs:

is_pareto_efficient_dumb: Really, really, slow
is_pareto_efficient_simple: Elapsed time is 1.174s
is_pareto_efficient: Elapsed time is 0.4033s

10,000 samples, 15 costs:

is_pareto_efficient_dumb: Elapsed time is 4.019s
is_pareto_efficient_simple: Elapsed time is 6.466s
is_pareto_efficient: Elapsed time is 6.41s
Note that if efficiency is a concern, you can get a further ~2x speedup or so by reordering the data beforehand; see here.
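As a quick sanity check of the minimization convention used above, here is a small sketch (restating is_pareto_efficient_simple so the snippet is self-contained):

```python
import numpy as np

def is_pareto_efficient_simple(costs):
    """Boolean mask of Pareto-efficient rows (lower cost is better)."""
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            # keep any point that is strictly lower than c on some cost
            is_efficient[is_efficient] = np.any(costs[is_efficient] < c, axis=1)
            is_efficient[i] = True  # and keep self
    return is_efficient

# minimization: [3, 3] is beaten on both costs, the other three trade off
costs = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [1.5, 1.5]])
print(is_pareto_efficient_simple(costs))  # [ True  True False  True]
```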
Here is another simple implementation that is quite fast for modest dimensions. Input points are assumed to be unique.
def keep_efficient(pts):
    'returns Pareto efficient row subset of pts'
    # sort points by decreasing sum of coordinates
    pts = pts[pts.sum(1).argsort()[::-1]]
    # initialize a boolean mask for undominated points
    # to avoid creating copies each iteration
    undominated = np.ones(pts.shape[0], dtype=bool)
    for i in range(pts.shape[0]):
        # process each point in turn
        n = pts.shape[0]
        if i >= n:
            break
        # find all points not dominated by i
        # since points are sorted by coordinate sum
        # i cannot dominate any points in 1,...,i-1
        undominated[i+1:n] = (pts[i+1:] >= pts[i]).any(1)
        # keep points undominated so far
        pts = pts[undominated[:n]]
    return pts
We start by sorting the points according to their coordinate sums. This is useful because if the coordinate sum of a point x is greater than that of a point y, then y cannot dominate x. Here are some benchmarks against Peter's answer, using np.random.randn.
N=10000 d=2
keep_efficient
1.31 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
is_pareto_efficient
6.51 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
N=10000 d=3
keep_efficient
2.3 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
is_pareto_efficient
16.4 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
N=10000 d=4
keep_efficient
4.37 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
is_pareto_efficient
21.1 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
N=10000 d=5
keep_efficient
15.1 ms ± 491 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
is_pareto_efficient
110 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
N=10000 d=6
keep_efficient
40.1 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
is_pareto_efficient
279 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
N=10000 d=15
keep_efficient
3.92 s ± 125 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
is_pareto_efficient
5.88 s ± 74.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I recently ended up looking at this problem and found a useful heuristic that works well when there are many independently distributed points and few dimensions.

The idea is to compute the convex hull of the points. With few dimensions and independently distributed points, the number of vertices of the convex hull will be small. Intuitively, we can expect some vertices of the convex hull to dominate many of the original points. Moreover, if a point of the convex hull is not dominated by any other point of the convex hull, then it is not dominated by any point of the original set either.

This gives a simple iterative algorithm: we repeatedly compute the convex hull of the remaining points, keep the hull vertices that are not dominated by any other hull vertex (these lie on the Pareto frontier), discard the remaining points that are dominated by them, and repeat on whatever is left.

I added some benchmarks for dimension 3. It seems that for some distributions of points this approach yields better asymptotics.
import numpy as np
from scipy import spatial
from functools import reduce

# test points
pts = np.random.rand(10_000_000, 3)


def filter_(pts, pt):
    """
    Get all points in pts that are not Pareto dominated by the point pt
    """
    weakly_worse   = (pts <= pt).all(axis=-1)
    strictly_worse = (pts < pt).any(axis=-1)
    return pts[~(weakly_worse & strictly_worse)]


def get_pareto_undominated_by(pts1, pts2=None):
    """
    Return all points in pts1 that are not Pareto dominated
    by any points in pts2
    """
    if pts2 is None:
        pts2 = pts1
    return reduce(filter_, pts2, pts1)


def get_pareto_frontier(pts):
    """
    Iteratively filter points based on the convex hull heuristic
    """
    pareto_groups = []

    # loop while there are points remaining
    while pts.shape[0]:
        # brute force if there are few points:
        if pts.shape[0] < 10:
            pareto_groups.append(get_pareto_undominated_by(pts))
            break

        # compute vertices of the convex hull
        hull_vertices = spatial.ConvexHull(pts).vertices

        # get corresponding points
        hull_pts = pts[hull_vertices]

        # get points in pts that are not convex hull vertices
        nonhull_mask = np.ones(pts.shape[0], dtype=bool)
        nonhull_mask[hull_vertices] = False
        pts = pts[nonhull_mask]

        # get points in the convex hull that are on the Pareto frontier
        pareto = get_pareto_undominated_by(hull_pts)
        pareto_groups.append(pareto)

        # filter remaining points to keep those not dominated by
        # Pareto points of the convex hull
        pts = get_pareto_undominated_by(pts, pareto)

    return np.vstack(pareto_groups)


# --------------------------------------------------------------------------------
# previous solutions
# --------------------------------------------------------------------------------

def is_pareto_efficient_dumb(costs):
    """
    :param costs: An (n_points, n_costs) array
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        is_efficient[i] = np.all(np.any(costs >= c, axis=1))
    return is_efficient


def is_pareto_efficient(costs):
    """
    :param costs: An (n_points, n_costs) array
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            is_efficient[is_efficient] = np.any(costs[is_efficient] <= c, axis=1)  # Remove dominated points
    return is_efficient


def dominates(row, rowCandidate):
    return all(r >= rc for r, rc in zip(row, rowCandidate))


def cull(pts, dominates):
    dominated = []
    cleared = []
    remaining = pts
    while remaining:
        candidate = remaining[0]
        new_remaining = []
        for other in remaining[1:]:
            [new_remaining, dominated][dominates(candidate, other)].append(other)
        if not any(dominates(other, candidate) for other in new_remaining):
            cleared.append(candidate)
        else:
            dominated.append(candidate)
        remaining = new_remaining
    return cleared, dominated


# --------------------------------------------------------------------------------
# benchmarking
# --------------------------------------------------------------------------------

# to accommodate the original non-numpy solution
pts_list = [list(pt) for pt in pts]

import timeit

# print('Old non-numpy solution:\t{}'.format(
#     timeit.timeit('cull(pts_list, dominates)', number=3, globals=globals())))

print('Numpy solution:\t{}'.format(
    timeit.timeit('is_pareto_efficient(pts)', number=3, globals=globals())))

print('Convex hull heuristic:\t{}'.format(
    timeit.timeit('get_pareto_frontier(pts)', number=3, globals=globals())))

# >>= python temp.py # 1,000 points
# Old non-numpy solution: 0.0316428339574486
# Numpy solution:         0.005961259012110531
# Convex hull heuristic:  0.012369581032544374

# >>= python temp.py # 1,000,000 points
# Old non-numpy solution: 70.67529802105855
# Numpy solution:         5.398462114972062
# Convex hull heuristic:  1.5286884519737214

# >>= python temp.py # 10,000,000 points
# Numpy solution:         98.03680767398328
# Convex hull heuristic:  10.203076395904645
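The hull step needs scipy, but the inner filtering helpers are plain numpy; here is a minimal sketch of their semantics (maximization, matching the <= / < comparisons in filter_):

```python
import numpy as np
from functools import reduce

def filter_(pts, pt):
    """Drop from pts every point that pt Pareto-dominates (maximization)."""
    weakly_worse = (pts <= pt).all(axis=-1)
    strictly_worse = (pts < pt).any(axis=-1)
    return pts[~(weakly_worse & strictly_worse)]

def get_pareto_undominated_by(pts1, pts2=None):
    """Points of pts1 not dominated by any point of pts2."""
    if pts2 is None:
        pts2 = pts1
    return reduce(filter_, pts2, pts1)

# [0, 0] and [1, 1] are both dominated by [1, 2]; the two extremes remain
pts = np.array([[1, 2], [2, 1], [0, 0], [1, 1]])
print(get_pareto_undominated_by(pts))  # [[1 2]
                                       #  [2 1]]
```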
I tried rewriting the same algorithm with a few tweaks. I think most of your problems come from inputPoints.remove(row). That requires searching through the list of points; removing by index would be much more efficient. I also modified the dominates function to avoid some redundant comparisons. This could come in handy in higher dimensions.
def dominates(row, rowCandidate):
    return all(r >= rc for r, rc in zip(row, rowCandidate))


def cull(pts, dominates):
    dominated = []
    cleared = []
    remaining = pts
    while remaining:
        candidate = remaining[0]
        new_remaining = []
        for other in remaining[1:]:
            [new_remaining, dominated][dominates(candidate, other)].append(other)
        if not any(dominates(other, candidate) for other in new_remaining):
            cleared.append(candidate)
        else:
            dominated.append(candidate)
        remaining = new_remaining
    return cleared, dominated
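A small usage sketch of this version (still using the >=-everywhere dominates, so larger is better):

```python
def dominates(row, rowCandidate):
    return all(r >= rc for r, rc in zip(row, rowCandidate))

def cull(pts, dominates):
    dominated = []
    cleared = []
    remaining = pts
    while remaining:
        candidate = remaining[0]
        new_remaining = []
        for other in remaining[1:]:
            # append to dominated if candidate dominates other, else keep it
            [new_remaining, dominated][dominates(candidate, other)].append(other)
        if not any(dominates(other, candidate) for other in new_remaining):
            cleared.append(candidate)
        else:
            dominated.append(candidate)
        remaining = new_remaining
    return cleared, dominated

# [0, 0] is dominated by both other points; [1, 2] and [2, 1] trade off
cleared, dominated = cull([[1, 2], [2, 1], [0, 0]], dominates)
print(cleared)    # [[1, 2], [2, 1]]
print(dominated)  # [[0, 0]]
```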
Peter, nice response.

I just wanted to generalize it for those who want to choose between maximization and the default of minimization. It's a trivial fix, but nice to document here:
def is_pareto(costs, maximise=False):
    """
    :param costs: An (n_points, n_costs) array
    :param maximise: boolean. True for maximising, False for minimising
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            if maximise:
                is_efficient[is_efficient] = np.any(costs[is_efficient] >= c, axis=1)  # Remove dominated points
            else:
                is_efficient[is_efficient] = np.any(costs[is_efficient] <= c, axis=1)  # Remove dominated points
    return is_efficient
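A quick check of both directions of the flag (a sketch, restating the function so it runs standalone):

```python
import numpy as np

def is_pareto(costs, maximise=False):
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            if maximise:
                is_efficient[is_efficient] = np.any(costs[is_efficient] >= c, axis=1)
            else:
                is_efficient[is_efficient] = np.any(costs[is_efficient] <= c, axis=1)
    return is_efficient

costs = np.array([[1, 2], [2, 1], [0, 0]])
print(is_pareto(costs, maximise=True))   # [ True  True False]
print(is_pareto(costs, maximise=False))  # [False False  True] -- only [0, 0] survives when minimising
```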
The definition of dominates is incorrect. A dominates B if and only if it is better than or equal to B in all dimensions and strictly better in at least one dimension.
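A corrected dominates along those lines might look like this (a sketch; maximization convention, matching the >=-based code above):

```python
def dominates(row, candidate):
    """Strict Pareto dominance: at least as good everywhere, strictly better somewhere."""
    return (all(r >= c for r, c in zip(row, candidate))
            and any(r > c for r, c in zip(row, candidate)))

print(dominates([2, 1], [1, 1]))  # True
print(dominates([1, 1], [1, 1]))  # False -- equal points do not dominate each other
print(dominates([1, 2], [2, 1]))  # False
```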
Correcting the bug found in the earlier post, here is a new version of the keep_efficient function:
def keep_efficient(pts):
    'returns Pareto efficient row subset of pts'
    # sort points by decreasing sum of coordinates
    pts = pts[pts.sum(1).argsort()[::-1]]
    # initialize a boolean mask for undominated points
    # to avoid creating copies each iteration
    undominated = np.ones(pts.shape[0], dtype=bool)
    for i in range(pts.shape[0]):
        # process each point in turn
        n = pts.shape[0]
        if i >= n:
            break
        # find all points not dominated by i
        # since points are sorted by coordinate sum
        # i cannot dominate any points in 1,...,i-1
        undominated[i+1:n] = (pts[i+1:] >= pts[i]).any(1)
        # keep points undominated so far
        pts = pts[undominated[:n]]
        undominated = np.array([True] * len(pts))
    return pts
(Note that the bug in the earlier post was that the function keep_efficient(pts) returned the wrong Pareto frontier for the input pts = [[5,5],[4,3],[0,6]]. Before the edit, the result was [[5,5]] whereas the expected result is [[5,5],[0,6]]. The fix is the last line added inside the for loop: undominated = np.array([True]*len(pts)).)
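The counterexample from that note can be replayed directly (a sketch, restating the corrected function so the snippet runs on its own):

```python
import numpy as np

def keep_efficient(pts):
    'returns Pareto efficient row subset of pts'
    pts = pts[pts.sum(1).argsort()[::-1]]  # sort by decreasing coordinate sum
    undominated = np.ones(pts.shape[0], dtype=bool)
    for i in range(pts.shape[0]):
        n = pts.shape[0]
        if i >= n:
            break
        undominated[i+1:n] = (pts[i+1:] >= pts[i]).any(1)
        pts = pts[undominated[:n]]
        undominated = np.array([True] * len(pts))  # the fix: reset the mask for the shrunken array
    return pts

pts = np.array([[5, 5], [4, 3], [0, 6]])
print(keep_efficient(pts))  # [[5 5]
                            #  [0 6]] -- [4, 3] is dominated by [5, 5]
```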
I may be a bit late here, but I tried the proposed solutions and it seems they fail to return all the Pareto points. I made a recursive implementation (notably faster) to find the Pareto frontier; you can find it at https://github.com/Ragheb2464/preto-front
Just to be clear about the example above: the function for getting the Pareto frontier differs slightly from the code above and should contain only a < rather than a <=. It looks like this:
def is_pareto(costs):
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            is_efficient[is_efficient] = np.any(costs[is_efficient] < c, axis=1)
            is_efficient[i] = True  # keep self; with a strict < the point would otherwise eliminate itself
    return is_efficient
Disclaimer: this is only partially correct, since dominance itself is defined as <= in every coordinate and < in at least one. It should be enough for most cases, though.
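The practical difference the disclaimer points at shows up with duplicate points. A sketch comparing the two comparisons on data with a duplicated point (pareto_strict and pareto_weak are hypothetical names for the <-based and <=-based variants; the strict one needs the keep-self line):

```python
import numpy as np

def pareto_strict(costs):
    # strict <: a duplicate of an efficient point gets filtered out
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            is_efficient[is_efficient] = np.any(costs[is_efficient] < c, axis=1)
            is_efficient[i] = True  # keep self
    return is_efficient

def pareto_weak(costs):
    # weak <=: duplicates of efficient points all survive
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            is_efficient[is_efficient] = np.any(costs[is_efficient] <= c, axis=1)
    return is_efficient

costs = np.array([[1, 1], [1, 1], [0, 2]])
print(pareto_strict(costs))  # [ True False  True]
print(pareto_weak(costs))    # [ True  True  True]
```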