How to predict an individual value using SKlearn?

I am very new to machine learning, and I would like to get a percentage returned for an individual array that I pass to the prediction model I have created.

I'm not sure how to go about getting the match percentage. I thought it was metrics.accuracy_score(Ytest, y_pred), but when I try that it gives me the following error:
**ValueError: Found input variables with inconsistent numbers of samples: [4, 1]**

I have no idea if this is the correct way to go about this.

import numpy as np                  #linear algebra
import pandas as pd                 # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt     #For Visualisation
import seaborn as sns               #For better Visualisation
from bs4 import BeautifulSoup       #For Text Parsing
import mysql.connector
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import GaussianNB
import docx2txt
import re
import csv
from sklearn import metrics

class Machine:

    TrainData       = ''


    def __init__(self):


        self.TrainData          = self.GetTrain()

        Data                    = self.ProcessData()

        x                       = Data[0]
        y                       = Data[1]

        x, x_test, y, y_test    = train_test_split(x,y, stratify = y, test_size = 0.25, random_state = 42)

        self.Predict(x,y, '',x_test , y_test )

    def Predict(self,X,Y,Data, Xtext, Ytest):

        model = GaussianNB()
        model.fit(Xtext, Ytest)

        y_pred = model.predict([[1.0, 2.00613, 2, 5]])

        print("Accuracy:",metrics.accuracy_score(Ytest, y_pred))
        


    def ProcessData(self):

            X = []
            Y = []
            i = 0
            for I in self.TrainData:

                Y.append(I[4])
                X.append(I)

                i = i + 1

            i = 0
            for j in X:

                X[i][0] = float(X[i][0])
                X[i][1] = float(X[i][1])
                X[i][2] = int(X[i][2])
                X[i][3] = int(X[i][3])
                del X[i][4]

                i = i + 1

            return X,Y


    def GetTrain(self):
        file        = open('docs/training/TI_Training.csv')
        csvreader   = csv.reader(file)

        header      = []
        header      = next(csvreader)

        rows        = []

        for row in csvreader:
            rows.append(row)

        file.close()

        return rows



Machine()

The error is pretty clear: Ytest has 4 samples, and y_pred only has one. accuracy_score compares the two arrays element by element, so they need the same number of samples. I suspect you instead want to do

y_pred = model.predict(Xtext)

in your Predict function.
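To make the distinction concrete, here is a minimal sketch that separates the two things the question mixes together: accuracy is a score over a whole test set, while the "percentage" for one input is probably closer to what predict_proba returns. The data below is synthetic and only stands in for the CSV from the question (an assumption, since the real file isn't shown), and the variable names are illustrative. Note also that Predict in the question fits the model on the arguments named Xtext and Ytest, which are the held-out split; the sketch fits on the training split instead.

```python
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the CSV data (assumption): 4 numeric features, 2 classes.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(40, 4)),
    rng.normal(loc=3.0, scale=1.0, size=(40, 4)),
])
y = np.array([0] * 40 + [1] * 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=42
)

# Fit on the training split only; the test split is kept for evaluation.
model = GaussianNB()
model.fit(X_train, y_train)

# Accuracy: predict for every test row so both arrays have the same length.
y_pred = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Single sample: one row with the same 4 features as the training data.
sample = [[1.0, 2.00613, 2, 5]]
predicted_label = model.predict(sample)[0]
probabilities = model.predict_proba(sample)[0]  # one probability per class

print("Predicted class:", predicted_label)
for label, p in zip(model.classes_, probabilities):
    print(f"P(class={label}) = {p:.1%}")
```

The probabilities are listed in the order given by model.classes_ and sum to 1. Keep in mind that Naive Bayes probabilities tend to be pushed toward 0 or 1, so treat them as a rough confidence rather than a calibrated percentage.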
