Python SKLearn: 'Bad input shape' error when predicting a sequence
I have an Excel file that stores a sequence in each column (read from the top cell to the bottom cell), and the trend of each sequence is similar to that of the previous column. So I'd like to predict the sequence for the nth column in this dataset.
A sample of my data set:
You can see that each column holds a sequence of values, and the sequences progress gradually as we move to the right, so I want to predict, e.g., the values in the Z column.
Here's my code so far:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Read the Excel file in rows
df = pd.read_excel(open('vec_sol2.xlsx', 'rb'),
                   header=None, sheet_name='Sheet1')
print(type(df))
length = len(df.columns)
# Get the sequence for each row
x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)
print("y_train shape: ", y_train.shape)
pred_model = LogisticRegression()
pred_model.fit(x_train, y_train)
print(pred_model)
I'll explain the logic as much as possible: x_train and x_test will just be the index / column number that is associated with a sequence. y_train is an array of sequences. I've managed to get the shape of each variable when debugging; they are:
x_train: (37, 1)
x_test: (13, 1)
y_train: (37, 51)
y_test: (13, 51)
But right now, running the program gives me this error:
ValueError: bad input shape (37, 51)
What is my mistake here?
I don't understand why you are using this:
x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)
You already have the data in df. Extract X and y from it, and then split it into training and test sets.
Try this:
X = df.iloc[:,:-1]
y = df.iloc[:, -1:]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
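For reference, the split above can be sketched end to end. This is only a rough illustration under a few assumptions that are not in the original post: the Excel file is replaced by a synthetic 50 x 51 DataFrame, the values are treated as continuous (so LinearRegression stands in for LogisticRegression, which expects class labels), and the target is taken as a 1-D Series with df.iloc[:, -1] rather than a one-column DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spreadsheet (assumption): 50 rows, 51 columns,
# where each column drifts slightly away from the previous one.
rng = np.random.default_rng(0)
base = np.linspace(0.0, 1.0, 50)
data = np.column_stack(
    [base + 0.1 * k + rng.normal(0.0, 0.01, 50) for k in range(51)])
df = pd.DataFrame(data)

X = df.iloc[:, :-1]  # every column except the last: shape (50, 50)
y = df.iloc[:, -1]   # the last column as a 1-D Series: shape (50,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# LinearRegression is an assumption here; it suits continuous targets.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("y_pred shape:", y_pred.shape)
```

With this layout every row is one sample, the first 50 columns are its features, and the last column is the single value to predict, so the target passed to fit is one-dimensional.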
As it stands, the shapes you shared show that you are trying to predict a 51-column output from a single feature, which is weird if you think about it.
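The shape problem can be reproduced in isolation. The sketch below uses made-up arrays with the same shapes as in the question (they are not the asker's data) to show that LogisticRegression rejects a two-dimensional target with 51 columns but accepts a one-dimensional one. The exact wording of the error message depends on the scikit-learn version, so it is captured rather than hard-coded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data with the shapes reported in the question (assumption).
X = np.arange(37).reshape(-1, 1)       # one feature per sample: (37, 1)
y_2d = np.zeros((37, 51), dtype=int)   # 51 targets per sample: (37, 51)
y_2d[::2] = 1

error_message = None
try:
    LogisticRegression().fit(X, y_2d)
except ValueError as exc:
    # The message text differs between scikit-learn versions.
    error_message = str(exc)
print("fit with 2-D y failed:", error_message)

# A single 1-D column of class labels is accepted.
y_1d = np.arange(37) % 2               # two class labels: shape (37,)
clf = LogisticRegression().fit(X, y_1d)
print("prediction shape:", clf.predict(X).shape)
```

In other words, the classifier wants one label per sample, which is why the target has to be reduced to a single column before calling fit.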