
Why am I getting the error: 'ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required' when running my code?

I found some sample code in the scikit-learn documentation for running a simple regression model, and adapted it as follows:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

# Define the path to the file
path = r"C:\Users\H\Desktop\Files\Sampledata.xlsx"

# Read the first sheet into a DataFrame and sum the values per week
df = pd.read_excel(path, sheet_name=0)
df = df.groupby(['Week']).sum()
df = df.reset_index()

# Define X (feature) and y (target)
X = df[['Week']]
y = df['Payment Amount Total']

# Split the data into training/testing sets
X_train = X[:-20]
X_test = X[-20:]

# Split the targets into training/testing sets
y_train = y[:-20]
y_test = y[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = regr.predict(X_test)

# The coefficients
print("Coefficients: \n", regr.coef_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(y_test, y_pred))

# Plot outputs
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

Everything is okay until I get to the following line:

regr.fit(X_train, y_train)

When I run this line I get the following error:

ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.

I'm not sure why I am getting this error.

This is what I get when I try to print the dataframes:

print(y_train)
print(y_test)
print(X_train)
print(X_test)
0      86546.38
1      83756.20
2      84961.71
3      70918.25
4      88554.72
5      90256.68
6      87202.41
7      75824.41
8      89538.01
9      88480.46
10     94556.43
11     91835.87
12     93248.96
13     94887.17
14     98587.10
15     96398.35
16     94210.16
17    100156.39
18     97870.76
19    103892.86
Name: Payment Amount Total, dtype: float64
Series([], Name: Payment Amount Total, dtype: float64)
0      86546.38
1      83756.20
2      84961.71
3      70918.25
4      88554.72
5      90256.68
6      87202.41
7      75824.41
8      89538.01
9      88480.46
10     94556.43
11     91835.87
12     93248.96
13     94887.17
14     98587.10
15     96398.35
16     94210.16
17    100156.39
18     97870.76
19    103892.86
Name: Payment Amount Total, dtype: float64
Empty DataFrame
Columns: [Week]
Index: []


       Week
    0      3
    1      4
    2      5
    3      6
    4      7
    5      8
    6      9
    7     10
    8     11
    9     12
    10    13
    11    14
    12    15
    13    16
    14    17
    15    18
    16    19
    17    20
    18    21
    19    22

It should be noted that X actually prints values:

 Week
0      3
1      4
2      5
3      6
4      7
5      8
6      9
7     10
8     11
9     12
10    13
11    14
12    15
13    16
14    17
15    18
16    19
17    20
18    21
19    22
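
Judging from the output above, X appears to contain exactly 20 rows (index 0 to 19). If that is the case, X[:-20] selects "everything except the last 20 rows", which leaves nothing, while X[-20:] selects all 20 rows. A minimal sketch of that behaviour, using a made-up 20-row DataFrame as a stand-in for the grouped data (the values are illustrative, not the original file):

```python
import pandas as pd

# Hypothetical stand-in for the grouped data: 20 weekly rows (weeks 3..22).
df = pd.DataFrame({"Week": range(3, 23)})
X = df[["Week"]]

X_train = X[:-20]  # all rows except the last 20 -> nothing left
X_test = X[-20:]   # the last 20 rows -> the whole DataFrame

print(X.shape)        # shape of the full feature frame
print(X_train.shape)  # shape of the training slice
print(X_test.shape)   # shape of the test slice
```

An empty X_train of shape (0, 1) is exactly what the ValueError message reports.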

I managed to solve the issue by doing the following before performing the train/test split:

# Features should be converted into a NumPy array

X = df['Week'].values

# It should then be reshaped into a single column

X = X.reshape(-1, 1)
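
As a rough illustration of what the reshape does, the sketch below turns a 1-D array into a 2-D single-column array, which is the layout scikit-learn estimators expect for features. Reshaping does not change the number of rows, so with only 20 samples a fixed slice of 20 would still leave the training set empty. For that reason the sketch holds out a fraction of the rows instead. The array contents and the 20% hold-out are assumptions made for the example and do not come from the original post.

```python
import numpy as np

# Hypothetical stand-in for df['Week'].values: 20 weekly values (3..22).
weeks = np.arange(3, 23)
print(weeks.shape)  # 1-D array

# reshape(-1, 1) keeps the row count and adds a single column dimension.
X = weeks.reshape(-1, 1)
print(X.shape)  # 2-D column array

# Hold out a fraction of the rows rather than a fixed count of 20,
# so the training set cannot end up empty on a small dataset.
n_test = max(1, int(round(0.2 * len(X))))
X_train, X_test = X[:-n_test], X[-n_test:]
print(X_train.shape, X_test.shape)
```

The same n_test would be used to slice y so that features and targets stay aligned.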
