Using sklearn's 'predict' function
If I trained a model in sklearn using dummy variables for categorical values, what is best practice for feeding a single row of features into this model to get the prediction result? I am trying to get scores for every input data set. If I have fewer columns than the data set I used to train/fit the model, will it throw an error?

Just to clarify: I took a data set that had 5 columns and created over 118 dummy columns before I built my model. Now I have a single row of data with 5 columns that I would like to use in the predict function. How can I do this?

Any help here would be greatly appreciated.
It's a mistake to build the feature columns from whatever values happen to be in the current table, because you can't reproduce the same columns on other data (such as a single new row). If you want to create features this way, you should use a constructor that remembers the structure of the features seen during training. Since you gave no example data, here is the main idea of how you can make such a constructor:
import pandas as pd

data = pd.DataFrame([['Missouri', 'center', 'Jan', 55, 11],
                     ['Kansas', 'center', 'Mar', 54, 31],
                     ['Georgia', 'east', 'Jan', 37, 18]],
                    columns=('state', 'pos', 'month', 'High Temp', 'Low Temp'))

test = pd.DataFrame([['Missouri', 'center', 'Feb', 44, 23],
                     ['Missouri', 'center', 'Mar', 55, 33]],
                    columns=('state', 'pos', 'month', 'High Temp', 'Low Temp'))


class DummyColumns:
    def __init__(self, data):
        # Remember the full column layout seen at training time:
        # original columns + one column per state/pos value + season flags
        self.empty = pd.DataFrame(columns=(list(data.columns) +
                                           list(data.state.unique()) +
                                           list(data.pos.unique()) +
                                           ['Winter', 'Not winter']))

    def __call__(self, data):
        # Keep the original columns and add every remembered dummy column, filled with zeros
        self.df = data.reindex(columns=self.empty.columns, fill_value=0)
        for row in data.itertuples():
            # Only set dummies that existed at training time; unseen values stay all-zero
            if row.state in self.df.columns:
                self.df.loc[row.Index, row.state] = 1
            if row.pos in self.df.columns:
                self.df.loc[row.Index, row.pos] = 1
            if row.month in ['Dec', 'Jan', 'Feb']:
                self.df.loc[row.Index, 'Winter'] = 1
            else:
                self.df.loc[row.Index, 'Not winter'] = 1
        return self.df


add_dummy = DummyColumns(data)
dummy_test = add_dummy(test)
print(dummy_test)
      state     pos  month  High Temp  Low Temp  Missouri  Kansas  Georgia  \
0  Missouri  center    Feb         44        23         1       0        0
1  Missouri  center    Mar         55        33         1       0        0

   center  east  Winter  Not winter
0       1     0       1           0
1       1     0       0           1
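If you can retrain the model, a common alternative to hand-built dummy columns is to let scikit-learn remember the encoding itself. The following is a minimal sketch, assuming scikit-learn 1.0 or later. The target `y`, the extra training row, and the `LogisticRegression` classifier are made up for illustration because the asker's actual model and labels are unknown. A `ColumnTransformer` with `OneHotEncoder(handle_unknown='ignore')` is fitted inside a `Pipeline`, so a single row with only the 5 raw columns can be passed straight to `predict`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

cols = ['state', 'pos', 'month', 'High Temp', 'Low Temp']

# Training data with the same 5 raw columns as the example above (one extra row added)
train = pd.DataFrame([['Missouri', 'center', 'Jan', 55, 11],
                      ['Kansas', 'center', 'Mar', 54, 31],
                      ['Georgia', 'east', 'Jan', 37, 18],
                      ['Kansas', 'east', 'Feb', 40, 20]],
                     columns=cols)
# Hypothetical binary target, invented only so the model has something to fit
y = [1, 0, 1, 0]

categorical = ['state', 'pos', 'month']
preprocess = ColumnTransformer(
    [('onehot', OneHotEncoder(handle_unknown='ignore'), categorical)],
    remainder='passthrough')  # numeric columns pass through unchanged

# Encoder and classifier are fitted together, so the dummy layout is stored in the model
model = Pipeline([('prep', preprocess),
                  ('clf', LogisticRegression(max_iter=1000))])
model.fit(train, y)

# A single new row with only the 5 raw columns; 'Texas' was never seen in training
row = pd.DataFrame([['Texas', 'center', 'Feb', 44, 23]], columns=cols)
pred = model.predict(row)         # unseen category is encoded as all zeros
proba = model.predict_proba(row)  # class scores for the single row
print(pred, proba)
```

Because the encoder is part of the fitted pipeline, the dummy columns are rebuilt identically at prediction time. An unseen category becomes an all-zero encoding instead of causing a shape mismatch. If the model was already trained on `pd.get_dummies` output, a lighter fix is to save the training column list and align each new frame with `pd.get_dummies(row).reindex(columns=train_columns, fill_value=0)` before calling `predict`.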