How to do parameter tuning/cross-validation with Sklearn's pipeline?

I have just discovered Sklearn's pipeline feature, which I think will be useful for sentiment analysis. I have defined my pipeline in the following way:

Pipeline([('vect', CountVectorizer(tokenizer=LemmaTokenizer(),
                         stop_words='english',
                         strip_accents='unicode',
                         max_df=0.5)),
          ('clf', MultinomialNB())])

However, by defining it this way, I am not allowing for parameter tuning. Let's say I want to try the max_df values [0.3, 0.4, 0.5, 0.6, 0.7] and the n-gram ranges [(1,1), (1,2), (2,2)], and use cross-validation to find the best combination. Is there a way to specify this in or outside the pipeline so that it considers all possible combinations? If so, how would this be done?

Thank you so much for your guidance and help!

You can set the parameters of individual steps in a pipeline with the set_params method, passing each key as <stepname>__<paramname> (joined with a double underscore).
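
As a minimal sketch of that naming convention, the snippet below builds a simplified version of the pipeline (the custom LemmaTokenizer from the question is left out, which is an assumption made only to keep the example self-contained) and changes two parameters of the 'vect' step through the pipeline itself:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Simplified pipeline: same step names as in the question,
# but without the user-defined LemmaTokenizer.
pipe = Pipeline([
    ("vect", CountVectorizer(max_df=0.5)),
    ("clf", MultinomialNB()),
])

# Keys have the form <stepname>__<paramname>.
pipe.set_params(vect__max_df=0.7, vect__ngram_range=(1, 2))

# The change is visible on the underlying step.
vect = pipe.named_steps["vect"]
print(vect.max_df, vect.ngram_range)
```

Calling pipe.get_params() lists every key that can be set this way, which is a quick check that a name such as vect__max_df is spelled correctly.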

This can be combined with GridSearchCV to identify the combination of parameters that maximizes the score function over the given values:

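from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# LemmaTokenizer is the custom tokenizer class from the question.
p = Pipeline([('vect', CountVectorizer(tokenizer=LemmaTokenizer(),
                         stop_words='english',
                         strip_accents='unicode',
                         max_df=0.5)),
          ('clf', MultinomialNB())])
# Grid keys follow the <stepname>__<paramname> convention.
g = GridSearchCV(p,
        param_grid={
              'vect__max_df': [0.3, 0.4, 0.5, 0.6, 0.7],
              'vect__ngram_range': [(1, 1), (1, 2), (2, 2)]})
g.fit(X, y)  # X: raw documents, y: their labels
g.best_estimator_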
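
For readers who want to run the idea end to end, here is a small self-contained sketch. The toy corpus, the labels, the omission of LemmaTokenizer, and the choice of cv=3 are all assumptions made for illustration; they are not part of the original question or answer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: six positive and six negative reviews.
docs = [
    "great movie with wonderful acting",
    "loved the brilliant plot and soundtrack",
    "fantastic direction and superb cast",
    "an enjoyable heartwarming story",
    "excellent pacing and charming characters",
    "delightful visuals and clever dialogue",
    "terrible movie with awful acting",
    "hated the boring plot and noise",
    "dreadful direction and weak cast",
    "a tedious forgettable story",
    "poor pacing and annoying characters",
    "ugly visuals and clumsy dialogue",
]
labels = [1] * 6 + [0] * 6

pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# Every combination of these values is evaluated: 5 x 3 = 15 candidates.
param_grid = {
    "vect__max_df": [0.3, 0.4, 0.5, 0.6, 0.7],
    "vect__ngram_range": [(1, 1), (1, 2), (2, 2)],
}

# cv=3 uses stratified 3-fold cross-validation for a classifier.
search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(docs, labels)

print(search.best_params_)   # best combination found
print(search.best_score_)    # its mean cross-validated accuracy
```

After fitting, search.best_estimator_ is the whole pipeline refitted on all the data with the winning parameters, so it can be used directly via search.predict(new_docs). The per-candidate scores are available in search.cv_results_.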
