
Custom Transformer in sklearn

I am building a transformer in sklearn that drops features whose correlation coefficient with the response label is lower than a specified threshold.

It works on the training set. However, when I transform the test set, all of its features disappear. I assume the transformer is calculating correlations between the test data and the training labels, and since those are all low, it drops every feature. How do I make it calculate correlations only on the training set, and then drop the same features from the test set in transform()?

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class CorrelatedFeatures(BaseEstimator, TransformerMixin):
    # Selects only features that have a correlation coefficient higher than threshold with the response label
    def __init__(self, response, threshold=0.1):
        self.threshold = threshold
        self.response = response

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Correlations are recomputed here, on whatever X is passed in
        df = pd.concat([X, self.response], axis=1)
        cols = df.columns[abs(df.corr()[df.columns[-1]]) > self.threshold].drop(self.response.columns)
        return X[cols]
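One possible mechanism for the disappearing features, sketched with made-up data: pd.concat aligns rows on the index, so if the test rows carry different index labels from the training response, no row has both a feature value and a label, and every feature correlation comes out as NaN rather than merely low. The frames and the names X_test and response_train are hypothetical; the asker's real data may behave differently.

```python
import pandas as pd

# Hypothetical data: training rows are labelled 0-3, test rows 4-5.
X_test = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 1.0]}, index=[4, 5])
response_train = pd.DataFrame({"label": [0.0, 1.0, 0.0, 1.0]}, index=[0, 1, 2, 3])

# concat aligns on the index, so no row holds both a feature and a label.
df = pd.concat([X_test, response_train], axis=1)

# With no overlapping rows, each feature's correlation with the label is NaN.
corr_with_label = df.corr()["label"]

# NaN > threshold evaluates to False, so only the label column itself passes,
# and dropping it leaves nothing.
mask = (corr_with_label.abs() > 0.1).to_numpy()
cols = df.columns[mask].drop(response_train.columns)
print(list(cols))
```

Either way, the remedy is the same: decide which columns to keep while fitting, not while transforming.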

Calculate the correlations in fit() and store the columns that pass the threshold; then in transform() just select those columns.

Something like this:

....
....

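def fit(self, X, y=None):
    # Compute correlations once, on the training data, and remember which columns pass
    df = pd.concat([X, self.response], axis=1)
    self.cols = df.columns[abs(df.corr()[df.columns[-1]]) > self.threshold].drop(self.response.columns)
    return self

def transform(self, X, y=None):
    # Reuse the columns chosen in fit(); no correlations are recomputed here
    return X[self.cols]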
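For completeness, here is a self-contained sketch of the same idea. It is a variation, not the exact code above: it assumes the label is passed to fit() as y instead of being stored in the constructor, it uses DataFrame.corrwith in place of the concat-and-corr step, and it stores the result as cols_ (the trailing underscore is the usual sklearn convention for fitted attributes). The sample data is made up.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CorrelatedFeatures(BaseEstimator, TransformerMixin):
    """Keep features whose absolute correlation with y exceeds a threshold."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold

    def fit(self, X, y):
        # Assumption: X is a DataFrame and y has one value per row of X.
        # Rebuild y on X's index so the two align by position, not by label.
        y = pd.Series(np.asarray(y), index=X.index)
        corr = X.corrwith(y).abs()
        # Remember the surviving column names; transform() only reads them.
        self.cols_ = corr[corr > self.threshold].index.tolist()
        return self

    def transform(self, X):
        # No correlations are computed here, so test data cannot change the choice.
        return X[self.cols_]


# Hypothetical data: "signal" tracks y closely, "noise" does not.
X_train = pd.DataFrame({"signal": [1.0, 2.0, 3.0, 4.0],
                        "noise": [1.0, -1.0, -1.0, 1.0]})
y_train = pd.Series([1.1, 1.9, 3.2, 3.9])
X_test = pd.DataFrame({"signal": [5.0, 6.0], "noise": [0.0, 1.0]}, index=[10, 11])

selector = CorrelatedFeatures(threshold=0.5).fit(X_train, y_train)
print(selector.cols_)
print(selector.transform(X_test).columns.tolist())
```

Taking y in fit() also lets the transformer sit inside a Pipeline, which passes the training labels to each step's fit() and nothing but X to transform().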

