How to integrate pandas operations into an sklearn pipeline

Question

I have a simple operation on a pandas DataFrame, like this:

import pandas as pd

# initialization
dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})
# actual transformation: map each id to its label
df['newid'] = df.id.map(dct)
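For reference, this is what Series.map does with a dict: each value is looked up as a key, and values missing from the dict become NaN. The sketch below uses the question's data plus one hypothetical extra row (id 4, not present in dct) to show that case.

```python
import pandas as pd

# Lookup table from the question
dct = {1: 'A', 2: 'B', 3: 'C'}
# Question's data plus a hypothetical id 4 that dct does not contain
df = pd.DataFrame({'id': [1, 2, 3, 4], 'value': [7, 8, 9, 10]})

# Each id is looked up in dct; ids without an entry become NaN
df['newid'] = df['id'].map(dct)
print(df)
```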

I would like to make this transformation part of an sklearn pipeline. I found a few tutorials here, here, and here, but I just can't get it to work. Here is one of the many versions I have tried:

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# initialization
dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

# define a class similar to those in the tutorials
class idMapper(BaseEstimator, TransformerMixin):
    def __init__(self, key='id'):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[key].map(dct)

# Apply the transformation
idMapper.fit_transform(df)

The error message is: TypeError: fit_transform() missing 1 required positional argument: 'X'. Can anyone help me fix this and get it working? Thanks!
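The TypeError comes from how Python binds methods: idMapper.fit_transform(df) calls the method on the class itself, so df is bound to self and nothing is left for X. The minimal sketch below uses a trimmed-down version of the question's transformer (its transform simply returns X[self.key], an assumption for brevity) to contrast the class call with an instance call.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class idMapper(TransformerMixin, BaseEstimator):
    def __init__(self, key='id'):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Trimmed down: just select the key column
        return X[self.key]


df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

# Called on the class: df fills the `self` slot, so `X` is missing
class_call_error = None
try:
    idMapper.fit_transform(df)
except TypeError as err:
    class_call_error = str(err)
print(class_call_error)

# Called on an instance: the instance is `self`, df is `X`
instance_result = idMapper().fit_transform(df)
print(instance_result.tolist())
```

The exact wording of the message can vary with the Python and scikit-learn versions, but it always reports the missing X argument.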

Answer 1

Below is a corrected version of your code, with explanations in the comments.

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

# define a class similar to those in the tutorials
class idMapper(BaseEstimator, TransformerMixin):
    def __init__(self, key='id'):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.key].map(dct)  # <--- self.key

# Apply the transformation
idMapper().fit_transform(df)  # <--- call fit_transform on an instance, not on the class
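The corrected transformer returns only the mapped Series and reads dct from the global scope. To mirror df['newid'] = ... inside an actual Pipeline, one option is the sketch below. It is an assumption-laden illustration, not the answer's code: the class name IdMapper, its mapping and new_col parameters, and the drop_id follow-up step are all hypothetical. FunctionTransformer is scikit-learn's wrapper that turns a plain function into a pipeline step.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


# Mixin first, BaseEstimator last (the order recent scikit-learn docs recommend)
class IdMapper(TransformerMixin, BaseEstimator):
    """Hypothetical variant: adds X[new_col] = X[key].map(mapping)."""

    def __init__(self, mapping=None, key='id', new_col='newid'):
        # Store constructor arguments unchanged so get_params()/clone() work
        self.mapping = mapping
        self.key = key
        self.new_col = new_col

    def fit(self, X, y=None):
        # Nothing to learn: the lookup table is supplied up front
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        X[self.new_col] = X[self.key].map(self.mapping)
        return X


def drop_id(X):
    # Hypothetical follow-up step: drop the raw id column after mapping
    return X.drop(columns='id')


dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

pipe = Pipeline([
    ('map_id', IdMapper(mapping=dct)),
    ('drop_id', FunctionTransformer(drop_id)),
])
out = pipe.fit_transform(df)
print(out)
```

Passing the dict through __init__ (instead of reading a global) makes it visible to get_params, so clone and grid search treat it like any other parameter. Returning the whole DataFrame rather than a single Series keeps the output two-dimensional, which is what later pipeline steps usually expect.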

Answer 1 (3 votes, accepted, 2018-07-03 23:35:10)