Using sklearn's "predict" function

Question

If I trained a model in sklearn using dummy variables for categorical values, what is the best practice for feeding a single row of features into this model to get a prediction? I am trying to get scores for every input data set. If I have fewer columns than the data set I used to train/fit the model, will it throw an error?

Just to clarify: I took a data set that had 5 columns and created over 118 dummy columns before I built my model. Now I have a single row of data with 5 columns that I would like to pass to the predict function. How can I do this?

Any help here would be greatly appreciated.

Answer 1

It is a mistake to expand features based on whatever values happen to be in the current table, because you cannot reproduce the same expansion on other data. If you want to create features this way, you should use a constructor that remembers the feature structure. Since you gave no example data, here is the main idea of how such a constructor can be built:

import pandas as pd

data = pd.DataFrame([['Missouri', 'center', 'Jan', 55, 11],
                     ['Kansas', 'center', 'Mar', 54, 31],
                     ['Georgia', 'east', 'Jan', 37, 18]],
                     columns=('state', 'pos', 'month', 'High Temp', 'Low Temp'))


test =  pd.DataFrame([['Missouri', 'center', 'Feb', 44, 23], 
                      ['Missouri', 'center', 'Mar', 55, 33]],
                      columns=('state', 'pos', 'month', 'High Temp', 'Low Temp'))  


class DummyColumns():
    def __init__(self, data):
        # Columns constructor
        self.empty = pd.DataFrame(columns=(list(data.columns) +
                                           list(data.state.unique()) +
                                           list(data.pos.unique()) +
                                           ['Winter', 'Not winter']))
    def __call__(self, data):
        # Start from a zero-filled frame with the remembered column layout
        self.df = pd.DataFrame(data=0, columns=self.empty.columns, index=data.index)
        # Copy the original (non-dummy) columns over by name
        self.df[data.columns] = data
        for row in data.itertuples():
            self.df.loc[row.Index, row.state] = 1
            self.df.loc[row.Index, row.pos] = 1
            if row.month in ['Dec', 'Jan', 'Feb']:
                self.df.loc[row.Index, 'Winter'] = 1
            else:
                self.df.loc[row.Index, 'Not winter'] = 1
        return self.df       

add_dummy = DummyColumns(data)
dummy_test = add_dummy(test)
print(dummy_test)

      state     pos month  High Temp  Low Temp  Missouri  Kansas  Georgia  \
0  Missouri  center   Feb         44        23         1       0        0   
1  Missouri  center   Mar         55        33         1       0        0   

   center  east  Winter  Not winter  
0       1     0       1           0  
1       1     0       0           1
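
As an alternative to a hand-written constructor, the same "remember the structure" idea can be sketched with sklearn's own `OneHotEncoder` inside a `Pipeline`. This is a different technique from the class above, not the answer author's method. The labels in `y_train` and the choice of `DecisionTreeClassifier` are assumptions made up for illustration, since the question gives no target or model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

train = pd.DataFrame(
    [['Missouri', 'center', 'Jan', 55, 11],
     ['Kansas', 'center', 'Mar', 54, 31],
     ['Georgia', 'east', 'Jan', 37, 18]],
    columns=['state', 'pos', 'month', 'High Temp', 'Low Temp'])

# Hypothetical labels, invented only so the model has something to fit
y_train = [0, 1, 0]

categorical = ['state', 'pos', 'month']

# The encoder learns the categories at fit time and reuses them at predict time;
# handle_unknown='ignore' encodes unseen categories as all zeros
encoder = ColumnTransformer(
    [('onehot', OneHotEncoder(handle_unknown='ignore'), categorical)],
    remainder='passthrough')  # numeric columns pass through unchanged

model = Pipeline([('encode', encoder),
                  ('clf', DecisionTreeClassifier(random_state=0))])
model.fit(train, y_train)

# A single new row with the same 5 raw columns; 'Feb' was not seen in training
new_row = pd.DataFrame([['Missouri', 'center', 'Feb', 44, 23]],
                       columns=train.columns)

prediction = model.predict(new_row)

# Width of the encoded row, to confirm it matches what the model was fitted on
n_features = model.named_steps['encode'].transform(new_row).shape[1]
print(prediction, n_features)
```

Because the encoding lives inside the fitted pipeline, the new row is passed in its raw 5-column form and expanded to the training layout automatically. The row still needs the same raw columns as the training frame. If one is missing, the pipeline would be expected to raise an error rather than predict.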

Answer 1 (accepted, 2016-06-01 12:01:34)