
Using a scikit-learn preprocessor to select a subset of rows in a pandas DataFrame

Is there a scikit-learn preprocessor I can use or implement to select a subset of rows from a pandas DataFrame? I would prefer a preprocessor for this, since I want to build a pipeline with it as one of the steps.

You can use a FunctionTransformer to do that. A FunctionTransformer accepts any callable that exposes the same interface as a standard scikit-learn transform call, i.e. it takes the data X and returns the transformed data. In code:

import pandas as pd
from sklearn.preprocessing import FunctionTransformer

class RowSelector:
    """Callable that keeps only the rows at the given integer positions."""

    def __init__(self, rows: list[int]):
        self._rows = rows

    def __call__(self, X: pd.DataFrame, y=None) -> pd.DataFrame:
        # Positional selection of rows; all columns are kept
        return X.iloc[self._rows, :]

selector = FunctionTransformer(RowSelector(rows=[1, 3]))
df = pd.DataFrame({'a': range(4), 'b': range(4), 'c': range(4)})
print(selector.fit_transform(df))
# Prints:
#    a  b  c
# 1  1  1  1
# 3  3  3  3

Note that I have used a callable object to track some state, i.e. the rows to be selected. This is not necessary and could be solved differently.
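One such alternative, as a minimal sketch: FunctionTransformer has a kw_args parameter that forwards keyword arguments to the wrapped function, so a plain function can replace the callable class. The names select_rows, kw_selector and df_demo are made up for this illustration and are not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def select_rows(X: pd.DataFrame, rows: list[int]) -> pd.DataFrame:
    # Positional selection of rows; all columns are kept
    return X.iloc[rows, :]

# kw_args is forwarded to select_rows on every transform call,
# so the row positions live in the transformer instead of a class
kw_selector = FunctionTransformer(select_rows, kw_args={"rows": [1, 3]})

df_demo = pd.DataFrame({"a": range(4), "b": range(4), "c": range(4)})
selected = kw_selector.fit_transform(df_demo)
print(selected)
```

This keeps the state in the transformer's own parameters, which may be easier to inspect than an attribute hidden inside a callable object.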

A nice property is that it returns a DataFrame, so if it is the first step of your pipeline, you can combine it with a subsequent ColumnTransformer (if needed, of course).
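The combination with a column transformer could look roughly like the sketch below. The choice of StandardScaler for column 'a', the passthrough of column 'b', and the step names are assumptions made for illustration only; the original answer does not specify what the later steps do.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def keep_rows(X: pd.DataFrame, rows: list[int]) -> pd.DataFrame:
    # Hypothetical helper: positional row selection that returns a DataFrame
    return X.iloc[rows, :]

pipe = Pipeline([
    # Step 1: row selection; the output is still a DataFrame
    ("rows", FunctionTransformer(keep_rows, kw_args={"rows": [1, 3]})),
    # Step 2: columns can be addressed by name because step 1 kept the DataFrame;
    # column 'c' is dropped by the default remainder='drop'
    ("cols", ColumnTransformer([
        ("scale", StandardScaler(), ["a"]),
        ("keep", "passthrough", ["b"]),
    ])),
])

frame = pd.DataFrame({"a": range(4), "b": range(4), "c": range(4)})
result = pipe.fit_transform(frame)
print(result)
```

One caveat worth keeping in mind: pipeline steps only transform X, so if a supervised estimator follows, y is not filtered to the same rows and the lengths will no longer match. The same row selection is also applied whenever the pipeline transforms new data.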
