
Python class method runs much slower than identical function

I have the following class:

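import numpy as np
import pandas as pd

# euclidean_dist is a helper that is not shown in the question


class MyKMeans: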
    def __init__(self, max_iter = 300):
        self.max_iter = max_iter
        # Directly access 
        self.centroids = None
        self.clusters = None

    def fit(self, X, k):
        """
        
        """

        # each point is assigned to a cluster
        clusters = np.zeros(X.shape[0])

        # select k random centroids
        random_idxs = np.random.choice(len(X), size=k, replace=False)
        centroids = X[random_idxs, :]

        # iterate until no change occurs in centroids
        while True: 
            # for each point
            for i, point in enumerate(X):
                min_d = float('inf')

                # find the closest centroid to the point
                for idx, centroid in enumerate(centroids):
                    d = euclidean_dist(centroid, point)
                    if d < min_d:
                        min_d = d
                        clusters[i] = idx

                # update the new centroids by averaging the points in each cluster
                new_centroids = pd.DataFrame(X).groupby(by=clusters).mean().values
            
            # if the centroids didn't change, then stop
            if np.count_nonzero(centroids-new_centroids) == 0:
                break
            # otherwise, update the centroids
            else:
                centroids = new_centroids

        self.centroids = centroids
        self.clusters = clusters

and run it using

k = 4
kmeans = MyKMeans()
kmeans.fit(X, k)
centroids, clusters = kmeans.centroids, kmeans.clusters

However, this usually takes about 5 seconds to complete. On the other hand, if I move the method to a new function,

def fit(X, k):
    """
    
    """

    # each point is assigned to a cluster
    clusters = np.zeros(X.shape[0])

    # select k random centroids
    random_idxs = np.random.choice(len(X), size=k, replace=False)
    centroids = X[random_idxs, :]

    # iterate until no change occurs in centroids
    while True: 
        # for each point
        for i, point in enumerate(X):
            min_d = float('inf')

            # find the closest centroid to the point
            for idx, centroid in enumerate(centroids):
                d = euclidean_dist(centroid, point)
                if d < min_d:
                    min_d = d
                    clusters[i] = idx

            # update the new centroids by averaging the points in each cluster
            new_centroids = pd.DataFrame(X).groupby(by=clusters).mean().values
        
        # if the centroids didn't change, then stop
        if np.count_nonzero(centroids-new_centroids) == 0:
            break
        # otherwise, update the centroids
        else:
            centroids = new_centroids

        return centroids, clusters

and get the same variables by calling centroids, clusters = fit(X, k), the runtime is around 0.5-1 second, which is a big difference.

Is there a reason why simply having a class method instead of a function causes such a big difference in runtime, and is there any way to improve the runtime while still being able to use the class?

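The return statement in the non-class version is inside the while loop, so the function returns at the end of the first pass instead of iterating until the centroids stop changing. The two versions are not doing the same amount of work, which is why the function finishes sooner.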
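The effect of that return placement can be sketched with a small pure-Python stand-in. This is only an illustration under stated assumptions: the data is one-dimensional, the helper names (assign, update, fit_early_return, fit_converged) are hypothetical, and plain lists replace the NumPy/pandas calls from the question. The only difference between the two fit functions is the indentation of the return.

```python
def assign(points, centroids):
    """For each 1-D point, return the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Average the points in each cluster; an empty cluster keeps its centroid."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        new_centroids.append(sum(members) / len(members) if members else old)
    return new_centroids


def fit_early_return(points, centroids):
    """Mirrors the standalone function: return sits inside the while loop."""
    iterations = 0
    while True:
        iterations += 1
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break  # falls out of the loop and implicitly returns None
        else:
            centroids = new_centroids
        # Indented inside the loop, so this runs at the end of the first pass.
        return centroids, labels, iterations


def fit_converged(points, centroids):
    """Mirrors the class method: the result is produced only after convergence."""
    iterations = 0
    while True:
        iterations += 1
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels, iterations


if __name__ == "__main__":
    points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    start = [1.0, 2.0]  # deliberately poor starting centroids (assumption)
    print(fit_early_return(points, start)[2])  # passes made before returning
    print(fit_converged(points, start)[2])     # passes made until convergence
```

With this toy input the early-return variant stops after one pass with centroids that have not settled, while the other variant takes three passes. Moving the return out of the loop makes the function behave like the class method, and it should then take a comparable amount of time.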
