Python SKLearn: 'Bad input shape' error when predicting a sequence

I have an Excel file that stores a sequence in each column (reading from the top cell to the bottom cell), and the trend of each sequence is similar to that of the previous column. So I'd like to predict the sequence for the nth column in this dataset.

A sample of my data set:

[image: sample data]

Each column holds a sequence of values, and the sequences progress as we move to the right, so I want to predict, e.g., the values in the Z column.

Here's my code so far:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Read the Excel file in rows
df = pd.read_excel(open('vec_sol2.xlsx', 'rb'),
                header=None, sheet_name='Sheet1')
print(type(df))
length = len(df.columns)
# Get the sequence for each row

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

print("y_train shape: ", y_train.shape)

pred_model = LogisticRegression()
pred_model.fit(x_train, y_train)
print(pred_model)

I'll explain the logic as much as possible:

  • x_train and x_test will just be the index / column number that is associated with a sequence.
  • y_train is an array of sequences.
  • There is a total of 51 columns, so splitting it with 25% being test data results in 37 train sequences and 13 test sequences.

I've managed to get the shape of each variable while debugging. They are:

  • x_train : (37, 1)
  • x_test : (13, 1)
  • y_train : (37, 51)
  • y_test : (13, 51)
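
These shapes can be reproduced without the Excel file. The sketch below uses a random 50 x 51 DataFrame as a stand-in for the spreadsheet; that layout is an assumption inferred from the shapes listed above, not taken from the real data. It shows that train_test_split divides the rows of df, so every target sample ends up being a full row of 51 values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed stand-in for vec_sol2.xlsx: 50 rows x 51 columns of random numbers.
df = pd.DataFrame(np.random.default_rng(0).random((50, 51)))
length = len(df.columns)  # 51

# Same call as in the question: one index per row of df, as a (50, 1) array.
index = np.reshape(range(0, length - 1), (-1, 1))
x_train, x_test, y_train, y_test = train_test_split(
    index, df, test_size=0.25, random_state=0)

for name, arr in [("x_train", x_train), ("x_test", x_test),
                  ("y_train", y_train), ("y_test", y_test)]:
    print(name, arr.shape)
```

Note that the split is made over the 50 rows (37 + 13), not over the 51 columns, so each "sequence" handed to the model here is a row of the sheet rather than a column.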

But right now, running the program gives me this error:

ValueError: bad input shape (37, 51)

What is my mistake here?

I don't understand why you are using this:

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

You already have the data in df. Extract X and y from it, and then split them into train and test sets.

Try this:

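X = df.iloc[:, :-1]  # every column except the last one is a feature
y = df.iloc[:, -1]   # last column as a 1-D target (avoids a column-vector warning)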

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
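
As a rough end-to-end sketch of this suggestion, the snippet below runs the same steps on made-up data. The DataFrame contents, the 50 x 51 size, and the choice of binary labels in the last column are all assumptions for illustration. LogisticRegression is a classifier, so this only applies if the last column holds discrete class labels; for continuous values a regressor would be needed instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 feature columns plus one label column (51 in total).
rng = np.random.default_rng(0)
labels = np.arange(50) % 2                 # alternating 0/1 class labels
features = rng.normal(size=(50, 50))
features[:, 0] += 3 * labels               # make the first feature informative
df = pd.DataFrame(np.column_stack([features, labels]))

X = df.iloc[:, :-1]                        # all columns but the last
y = df.iloc[:, -1].astype(int)             # last column as a 1-D Series of labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

pred_model = LogisticRegression(max_iter=1000)
pred_model.fit(X_train, y_train)           # y_train is 1-D, so no shape error
predictions = pred_model.predict(X_test)
print(X_train.shape, y_train.shape, predictions.shape)
```

The key difference from the question's code is that y is one-dimensional, with one label per row, which is the target shape the classifier accepts.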

Otherwise, the shapes you shared show that you are trying to predict a 51-column output from a single feature, which is odd if you think about it.
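
To make the shape problem concrete, here is a small sketch on synthetic data with the same shapes as in the question, (37, 1) for the features and (37, 51) for the target. The first part shows LogisticRegression rejecting the two-dimensional target with a ValueError (the exact message depends on the scikit-learn version). The second part is not from this answer: it is one possible alternative, assuming the values are numeric and that predicting a whole sequence from an index really is the goal, using LinearRegression, which accepts a target of shape (n_samples, n_targets).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic arrays with the shapes reported in the question (values are made up).
rng = np.random.default_rng(0)
x = np.arange(37).reshape(-1, 1)           # one index per sequence, shape (37, 1)
y_2d = rng.integers(0, 3, size=(37, 51))   # one 51-value sequence per index

# A classifier such as LogisticRegression expects a 1-D target.
try:
    LogisticRegression().fit(x, y_2d)
    raised = False
except ValueError as exc:
    raised = True
    print("LogisticRegression rejected y:", exc)

# Assumed alternative: a regressor that supports multi-output targets.
multi_model = LinearRegression().fit(x, y_2d.astype(float))
next_sequence = multi_model.predict([[37]])  # prediction for the next index
print(next_sequence.shape)
```

Whether a linear fit per position is a sensible model for the real sequences is a separate question; the sketch only shows which target shapes each estimator accepts.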
