Python class method runs much slower than identical function
I have the following class:
import numpy as np
import pandas as pd

# euclidean_dist is a helper defined elsewhere (not shown)

class MyKMeans:
    def __init__(self, max_iter = 300):
        self.max_iter = max_iter
        # Directly access
        self.centroids = None
        self.clusters = None

    def fit(self, X, k):
        """
        """
        # each point is assigned to a cluster
        clusters = np.zeros(X.shape[0])
        # select k random centroids
        random_idxs = np.random.choice(len(X), size=k, replace=False)
        centroids = X[random_idxs, :]
        # iterate until no change occurs in centroids
        while True:
            # for each point
            for i, point in enumerate(X):
                min_d = float('inf')
                # find the closest centroid to the point
                for idx, centroid in enumerate(centroids):
                    d = euclidean_dist(centroid, point)
                    if d < min_d:
                        min_d = d
                        clusters[i] = idx
            # update the new centroids by averaging the points in each cluster
            new_centroids = pd.DataFrame(X).groupby(by=clusters).mean().values
            # if the centroids didn't change, then stop
            if np.count_nonzero(centroids-new_centroids) == 0:
                break
            # otherwise, update the centroids
            else:
                centroids = new_centroids
        self.centroids = centroids
        self.clusters = clusters
and run it using
k = 4
kmeans = MyKMeans()
kmeans.fit(X, k)
centroids, clusters = kmeans.centroids, kmeans.clusters
However, this usually takes about 5 seconds to run. On the other hand, if I move the method to a standalone function,
def fit(X, k):
    """
    """
    # each point is assigned to a cluster
    clusters = np.zeros(X.shape[0])
    # select k random centroids
    random_idxs = np.random.choice(len(X), size=k, replace=False)
    centroids = X[random_idxs, :]
    # iterate until no change occurs in centroids
    while True:
        # for each point
        for i, point in enumerate(X):
            min_d = float('inf')
            # find the closest centroid to the point
            for idx, centroid in enumerate(centroids):
                d = euclidean_dist(centroid, point)
                if d < min_d:
                    min_d = d
                    clusters[i] = idx
        # update the new centroids by averaging the points in each cluster
        new_centroids = pd.DataFrame(X).groupby(by=clusters).mean().values
        # if the centroids didn't change, then stop
        if np.count_nonzero(centroids-new_centroids) == 0:
            break
        # otherwise, update the centroids
        else:
            centroids = new_centroids
        return centroids, clusters
and get the same variables by calling
    centroids, clusters = fit(X, k)
the runtime is around 0.5-1 second, which is a big difference.
Is there a reason why simply having a class method instead of a function causes such a big difference in runtime, and is there any way to improve the runtime while still being able to use the class?
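The return statement in the non-class version is inside the while loop, so it exits the loop early. The function typically returns after a single pass instead of iterating until the centroids stop changing, so the two versions are not doing the same amount of work. That, rather than the use of a class, accounts for the difference in runtime.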
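To make the effect of the misplaced return concrete, here is a minimal sketch rather than the asker's actual code. It assumes a pure-Python stand-in for the NumPy/pandas version: `math.dist` plays the role of `euclidean_dist`, the starting centroids are fixed instead of random so the run is repeatable, and the names `assign`, `update`, `fit_early_return` and the `passes` counter are hypothetical, added only to count how many times the loop body runs.

```python
import math


def assign(points, centroids):
    """For each point, return the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Average the points of each cluster; keep the old centroid if a cluster is empty."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def fit_early_return(points, centroids):
    """Same shape as the standalone function in the question: return inside the loop."""
    passes = 0
    while True:
        passes += 1
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        else:
            centroids = new_centroids
        return centroids, labels, passes  # indented under `while`: leaves after one pass


def fit(points, centroids):
    """Corrected version: return only after the loop has converged."""
    passes = 0
    while True:
        passes += 1
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels, passes


class MyKMeans:
    """Thin wrapper: the class can keep its interface and reuse the same loop."""

    def fit(self, points, centroids):
        self.centroids, self.clusters, self.passes = fit(points, centroids)


# Toy data (an assumption for illustration): two obvious groups on a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
start = [(0, 0), (1, 0)]

print(fit_early_return(points, start)[2])  # passes made by the early-return variant
print(fit(points, start)[2])               # passes made by the corrected variant
```

On this toy data the early-return variant stops after one pass while the corrected one needs three to converge, which mirrors the gap reported in the question. Once the return is dedented, the function and the method run the same loop, so wrapping it in a class should not change the runtime in any meaningful way.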