Python SKLearn: "bad input shape" error when predicting a sequence

Question

I have an Excel file that stores a sequence in each column (read from the top cell to the bottom cell), and the trend of each sequence is similar to that of the previous column. I'd like to predict the sequence for the nth column in this dataset.

A sample of my data set:

Each column holds a set of values (a sequence), and the values progress as we move to the right, so I want to predict, for example, the values in column Z.

Here's my code so far:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Read the Excel file in rows
df = pd.read_excel(open('vec_sol2.xlsx', 'rb'),
                header=None, sheet_name='Sheet1')
print(type(df))
length = len(df.columns)
# Get the sequence for each row

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

print("y_train shape: ", y_train.shape)

pred_model = LogisticRegression()
pred_model.fit(x_train, y_train)
print(pred_model)

I'll explain the logic as much as possible:

x_train and x_test will just be the index / column number that is associated with a sequence.
y_train is an array of sequences.
There is a total of 51 columns, so splitting it with 25% as test data results in 37 training sequences and 13 test sequences.

I've managed to get the shape of each variable while debugging:

x_train: (37, 1)
x_test: (13, 1)
y_train: (37, 51)
y_test: (13, 51)
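
These shapes can be reproduced without the spreadsheet. The following is a minimal sketch, assuming a random 50 x 51 array stands in for the contents of vec_sol2.xlsx (the array and the names `data` and `index` are illustrative assumptions, not the asker's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the spreadsheet: 50 rows, 51 columns.
rng = np.random.default_rng(0)
data = rng.random((50, 51))

length = data.shape[1]                      # 51 columns
index = np.arange(length - 1).reshape(-1, 1)  # 50 index values, shape (50, 1)

# Same split parameters as in the question.
x_train, x_test, y_train, y_test = train_test_split(
    index, data, test_size=0.25, random_state=0)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

Note that train_test_split splits along the first axis, so each training target here is a row of 51 values rather than a single value.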

But right now, running the program gives me this error:

ValueError: bad input shape (37, 51)

What is my mistake here?

Answer 1

I don't understand why you are using this:

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

You already have the data in df. Extract X and y from it, and then split them into training and test sets.

Try this:

X = df.iloc[:,:-1]
y = df.iloc[:, -1:]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
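
To see the suggestion run end to end, here is a minimal sketch under stated assumptions: a synthetic DataFrame of integer class labels replaces the Excel file (LogisticRegression is a classifier, so discrete targets are assumed), and the last column is selected with `-1` rather than `-1:` so that y is a one-dimensional Series. Neither choice comes from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 rows, 51 columns of class labels 0, 1 or 2.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(50, 51)))

X = df.iloc[:, :-1]   # every column except the last: shape (50, 50)
y = df.iloc[:, -1]    # last column as a 1-D Series: shape (50,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# max_iter is raised only to give the solver room to converge on random data.
pred_model = LogisticRegression(max_iter=1000)
pred_model.fit(X_train, y_train)

predictions = pred_model.predict(X_test)
print(X_train.shape, y_train.shape, predictions.shape)
```

With `df.iloc[:, -1:]` the target would instead be a single-column DataFrame of shape (50, 1), which scikit-learn generally accepts but flattens with a warning.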

Otherwise, the shapes you shared show that you are trying to produce a 51-column output from a single feature, which is odd if you think about it.
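
The shape problem can be demonstrated in isolation. This is a minimal sketch, assuming a synthetic 37 x 51 label matrix in place of the asker's y_train (the arrays `X` and `Y` are illustrative): fitting against the full 2-D target raises a ValueError, while fitting against a single column succeeds. The exact wording of the error message depends on the installed scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature per sample, as in the question: shape (37, 1).
X = np.arange(37).reshape(-1, 1)

# Hypothetical 2-D target with 51 columns of 0/1 labels: shape (37, 51).
Y = (np.arange(37)[:, None] + np.arange(51)) % 2

# A 51-column target is rejected by LogisticRegression.fit.
try:
    LogisticRegression().fit(X, Y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("2-D target:", error_message)

# A single column of the same matrix is a valid 1-D target.
model = LogisticRegression().fit(X, Y[:, 0])
print("1-D target predictions shape:", model.predict(X).shape)
```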

Solution 1
0 2018-11-05 13:25:48