Custom transformer in sklearn

Question

I am building a transformer in sklearn that drops features whose correlation coefficient with the response label is lower than a specified threshold.

It works on the training set. However, when I transform the test set, all of its features disappear. I assume the transformer is calculating correlations between the test data and the training label, and since those are all low, it is dropping every feature. How do I make it calculate correlations only on the training set, and then drop those same features from the test set in transform?

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class CorrelatedFeatures(BaseEstimator, TransformerMixin):
    # Selects only features whose correlation coefficient with the response label is higher than threshold
    def __init__(self, response, threshold=0.1):
        self.threshold = threshold
        self.response = response

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        df = pd.concat([X, self.response], axis=1)
        cols = df.columns[abs(df.corr()[df.columns[-1]]) > self.threshold].drop(self.response.columns)
        return X[cols]

Answer 1

Calculate the correlation in fit() and store the columns that pass the threshold there; then in transform(), just select those stored columns.

Something like this:

....
....

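def fit(self, X, y=None):
    # Compute the correlations once, on the data passed to fit (the training set)
    df = pd.concat([X, self.response], axis=1)
    self.cols = df.columns[abs(df.corr()[df.columns[-1]]) > self.threshold].drop(self.response.columns)
    return self

def transform(self, X, y=None):
    # Reuse the columns selected in fit(), whatever data is being transformed
    return X[self.cols]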
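
For completeness, here is a minimal self-contained sketch of the same idea. It is a variation, not the code from the answer above: it assumes the label is passed to fit() as y instead of being given to the constructor, it stores the selected columns under the name cols_ (a name chosen here for illustration), and it uses made-up random data. Treat it as one possible way to structure the transformer rather than a definitive implementation.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CorrelatedFeatures(BaseEstimator, TransformerMixin):
    """Keep features whose absolute correlation with the label exceeds threshold."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold

    def fit(self, X, y):
        # Correlations are computed once, on the data passed to fit (the training set).
        # Rebuilding y on X's index avoids misalignment between X and y.
        y = pd.Series(np.asarray(y).ravel(), index=X.index)
        corr = X.corrwith(y).abs()
        # Remember which columns passed the threshold (NaN correlations do not pass).
        self.cols_ = corr[corr > self.threshold].index.tolist()
        return self

    def transform(self, X):
        # Reuse the columns chosen in fit(); no label is needed here.
        return X[self.cols_]


# Illustrative data, invented for this sketch: one informative column, one noise column.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
X = pd.DataFrame({
    "useful": signal + 0.1 * rng.normal(size=200),
    "noise": rng.normal(size=200),
})
y = pd.Series(signal)

X_train, X_test = X.iloc[:150], X.iloc[150:]
y_train = y.iloc[:150]

selector = CorrelatedFeatures(threshold=0.5).fit(X_train, y_train)
X_test_selected = selector.transform(X_test)
print(selector.cols_, X_test_selected.shape)
```

Because the label only enters through fit(), a transformer written this way can also be placed inside a sklearn Pipeline, which passes y to fit() during training and calls transform() without it on new data.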

Answer 1 (accepted) 2019-02-15 06:35:12
