Explanation of a function used in a class for feature selection

I came across a function which goes as follows:

import numpy as np

def indices_of_top_k(arr, k):
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])

I am not able to understand what it does or how each of its components works. Could someone please explain what it does?

For context, it is used in the class below for feature selection:

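from sklearn.base import BaseEstimator, TransformerMixin

class TopFeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_importances, k):
        self.feature_importances = feature_importances
        self.k = k
    def fit(self, X, y=None):
        # Remember the column indices of the k most important features
        self.feature_indices_ = indices_of_top_k(self.feature_importances, self.k)
        return self
    def transform(self, X):
        # Keep only those columns (X is expected to be a 2-D NumPy array)
        return X[:, self.feature_indices_]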

Thanks,

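partition can be harder to understand than sort. Think of it as an incomplete sort: it only guarantees that the element at the kth position is the one a full sort would put there, with nothing larger before it and nothing smaller after it. The elements on either side are in no particular order.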
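To make "incomplete sort" concrete, here is a small check on a fixed array (the same values as the session below). It only asserts the guarantees NumPy documents for `np.partition`; the exact order inside each side may differ between NumPy versions, so no printed output is claimed.

```python
import numpy as np

a = np.array([16, 16, 4, 33, 39, 43, 28, 47, 2, 23, 25, 11])
kth = 7  # for a 12-element array, -5 refers to the same position

p = np.partition(a, kth)

# The pivot position holds exactly the value a full sort would put there.
assert p[kth] == np.sort(a)[kth]

# Everything left of the pivot is <= it, everything right of it is >= it,
# but neither side is necessarily sorted.
assert (p[:kth] <= p[kth]).all()
assert (p[kth + 1:] >= p[kth]).all()

print(p)
```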

In [152]: x=np.random.randint(0,50,12)
In [153]: x
Out[153]: array([16, 16,  4, 33, 39, 43, 28, 47,  2, 23, 25, 11])

To get the 5 largest elements, we can sort and slice:

In [154]: np.sort(x)[-5:]
Out[154]: array([28, 33, 39, 43, 47])

partition gets the same values, but they are not necessarily in sorted order (here 47 comes before 43):

In [155]: np.partition(x,-5)[-5:]
Out[155]: array([28, 33, 39, 47, 43])

argpartition returns the corresponding indices instead of the values:

In [156]: np.argpartition(x,-5)[-5:]
Out[156]: array([6, 3, 4, 7, 5])

Sorting those indices puts them in ascending positional order:

In [157]: np.sort(np.argpartition(x,-5)[-5:])
Out[157]: array([3, 4, 5, 6, 7])
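Putting the steps above together, the one-liner from the question can be unpacked into named intermediate values. This is an illustrative rewrite with the same behaviour; the name `indices_of_top_k_verbose` is hypothetical and not from the question.

```python
import numpy as np

def indices_of_top_k_verbose(arr, k):
    """Hypothetical step-by-step version of indices_of_top_k (same result)."""
    a = np.array(arr)                  # accept lists, tuples or arrays
    part_idx = np.argpartition(a, -k)  # indices arranged so the k largest values come last
    top_idx = part_idx[-k:]            # keep only those k indices (in arbitrary order)
    return np.sort(top_idx)            # ascending positional order

x = np.array([16, 16, 4, 33, 39, 43, 28, 47, 2, 23, 25, 11])
print(indices_of_top_k_verbose(x, 5))  # [3 4 5 6 7]
```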

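Using argsort instead gives the same result, but argpartition is usually faster, since it only has to separate out the top k elements (a selection algorithm, linear time on average) rather than fully sort the array (O(n log n)):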
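A rough way to check both claims yourself: the two approaches pick the same indices, and `argpartition` tends to be quicker on large arrays. The array size, `k`, and seed are arbitrary assumptions, and timings depend on your machine and NumPy version, so no numbers are claimed. Distinct values are used on purpose: with ties at the cutoff, the two methods may legitimately choose different indices among equal values.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
big = rng.permutation(200_000)  # distinct values, so the top-k set is unambiguous
k = 10

via_argsort = np.sort(np.argsort(big)[-k:])
via_argpartition = np.sort(np.argpartition(big, -k)[-k:])
assert np.array_equal(via_argsort, via_argpartition)

# Rough timing only; results vary between runs and machines.
t_sort = timeit.timeit(lambda: np.argsort(big), number=5)
t_part = timeit.timeit(lambda: np.argpartition(big, -k), number=5)
print(f"argsort: {t_sort:.3f}s  argpartition: {t_part:.3f}s")
```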

In [158]: np.sort(np.argsort(x)[-5:])
Out[158]: array([3, 4, 5, 6, 7])

Indexing x with these indices (in IPython, _ holds the previous output) gives the 5 largest values in their original order, as opposed to the sorted order in [154]:

In [159]: x[_]
Out[159]: array([33, 39, 43, 28, 47])
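Tying this back to the class in the question: in `TopFeatureSelector`, `arr` is the array of feature importances and the returned indices are column positions, so `transform` keeps the k most important columns in their original left-to-right order (that is what the final `np.sort` buys). The importances and `X` below are made-up values for illustration.

```python
import numpy as np

def indices_of_top_k(arr, k):
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])

# Made-up importances for 6 features (e.g. what a fitted tree model might report).
feature_importances = np.array([0.05, 0.30, 0.02, 0.25, 0.08, 0.30])
X = np.arange(24).reshape(4, 6)  # 4 samples, 6 features

idx = indices_of_top_k(feature_importances, 3)
print(idx)          # [1 3 5]
X_top = X[:, idx]   # the same column selection TopFeatureSelector.transform performs
print(X_top.shape)  # (4, 3)
```

One caveat with the function as written: because -0 == 0 in Python, `indices_of_top_k(arr, 0)` slices `[0:]` and returns every index rather than none, so k should be at least 1.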
