
How do I fix a "Number of features" error when trying to test against my random forest model?

I have a trained model and want to find out which class a new piece of data belongs to. I've tried a few things, but I've run into a problem.

import pickle

# Load the previously trained classifier from disk
with open('text_classifier', 'rb') as training_model:
    model = pickle.load(training_model)
y_pred2 = model.predict(X_test)

This code works.

But this does not:

from sklearn.feature_extraction.text import TfidfVectorizer

new_test_data = ["spor toto süper lig 30. hafta medipol bu akşam ev göztepe "
                 "ile saat 20.30'da başla mücadele suat arslanboğa arslanboğa yardımcı "
                 "serka ok ve ismail şencan"]
# A brand-new vectorizer is fitted on the single new document here
tfidfconverter = TfidfVectorizer()
new_test_data = tfidfconverter.fit_transform(new_test_data).toarray()
model.predict(new_test_data)

I get an error like this:

Number of features of the model must match the input. Model n_features is 9671 and input n_features is 25
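
The two numbers in this message are column counts of TF-IDF matrices: 9671 is the vocabulary size the training vectorizer learned, and 25 is the vocabulary a second vectorizer learned from the single new document. A minimal sketch of that mismatch, using a small made-up corpus (the documents below are placeholders, not the original data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the training documents.
train_docs = [
    "spor toto lig hafta bu ev saat",
    "ekonomi haber borsa dolar piyasa",
    "lig hafta takim saat spor",
    "borsa ekonomi dolar haber faiz",
]
new_doc = ["spor lig hafta saat yeni kelime"]

# Vectorizer fitted on the training corpus: its vocabulary defines the columns.
train_vectorizer = TfidfVectorizer()
X_train = train_vectorizer.fit_transform(train_docs)

# Fitting a second vectorizer on the new text alone builds a different vocabulary.
fresh_vectorizer = TfidfVectorizer()
X_new_refit = fresh_vectorizer.fit_transform(new_doc)

# Calling transform() on the training vectorizer keeps the training columns.
X_new_reused = train_vectorizer.transform(new_doc)

print("training columns:", X_train.shape[1])
print("refit on new text:", X_new_refit.shape[1])
print("reused vectorizer:", X_new_reused.shape[1])
```

Words in the new text that the training vectorizer never saw are simply ignored by `transform()`, so the column count stays equal to the training one.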

The code block I'm training with:

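from sklearn.datasets import load_files
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

data = load_files(r"...\docs", encoding="utf-8")
X, y = data.data, data.target
# Turn the raw documents into a TF-IDF matrix
tfidfconverter = TfidfVectorizer(min_df=3, max_df=0.7)
X = tfidfconverter.fit_transform(X).toarray()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
classifier = RandomForestClassifier(n_estimators=1000, random_state=0)
classifier.fit(X_train, y_train)
y_pred2 = classifier.predict(X_test)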

I believe you need to specify which parameters in your data you actually want to use as features when training the model. It looks like your training model is using the row entries as the features instead of the columns. This can be fixed by reading in the data, converting it to CSV, and then reading it in again. That step should be unnecessary if you already know how the data is structured; basically, you just need to know the names of the columns. You will need the pandas module for this method. Here is some code:

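    data = load_files(r"...\docs",encoding="utf-8")
    # to_csv is a pandas DataFrame method; the Bunch returned by load_files
    # must be converted to a DataFrame before this call will work
    data.to_csv('train_data.csv', encoding='utf-8', index=False)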

Then read the training data back in from the CSV:

    import pandas as pd

    train_data = pd.read_csv('train_data.csv')

Now, when you call train_test_split, you should specify what you want to use as the features in the data. These are generally the columns of a data table, since they are the metrics being collected for analysis. I define functions to split the data and build the model with the features specified because I think that is easier to understand, but you can also call the library functions directly.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def split_dataset(dataset, train_percentage, feature_headers, target_header):
        # Split the chosen feature columns and the target column into train/test sets
        train_x, test_x, train_y, test_y = train_test_split(
            dataset[feature_headers], dataset[target_header],
            train_size=train_percentage)
        return train_x, test_x, train_y, test_y

    def random_forest_classifier(features, target):
        # Fit a random forest on the given feature columns
        model = RandomForestClassifier(
            n_estimators=500, oob_score=True, n_jobs=-1,
            random_state=1, min_impurity_decrease=.01)
        model.fit(features, target)
        return model

Now you are ready to call the functions using your data.

    # columns[0:25] selects the first 25 columns as features; the last column is the target
    train_x, test_x, train_y, test_y = split_dataset(
        train_data, 0.80, train_data.columns[0:25], train_data.columns[-1])

    trained_model = random_forest_classifier(train_x, train_y)

You should now be able to predict against your trained model using the 25 features.
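
For the raw-text setup in the question, a different route (not the column-based approach above) is to keep the fitted vectorizer and the classifier together, so new text always goes through the vocabulary learned at training time. A minimal sketch, assuming scikit-learn's `Pipeline` and a tiny made-up dataset; the documents, labels, step names and `n_estimators=10` are placeholders for illustration:

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the real documents and labels would come from load_files.
docs = [
    "spor toto lig hafta bu ev saat",
    "ekonomi haber borsa dolar piyasa",
    "lig hafta takim saat spor",
    "borsa ekonomi dolar haber faiz",
]
labels = [0, 1, 0, 1]

# The pipeline fits the vectorizer and the forest together on the training text.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
])
text_clf.fit(docs, labels)

# Pickle the whole pipeline (to bytes here; writing to a file works the same way).
blob = pickle.dumps(text_clf)
restored = pickle.loads(blob)

# New text is passed in raw; the stored vectorizer applies transform() internally.
new_test_data = ["spor toto lig hafta saat yeni kelime"]
prediction = restored.predict(new_test_data)
print(prediction)
```

Because the vectorizer is stored inside the pickled object, the prediction step never refits it, and the input always has the same number of columns the forest was trained on.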
