I found some sample code in the scikit-learn documentation for fitting a simple linear regression model, and adapted it to my own data. I have the following code:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
#Define the path for the file
path=r"C:\Users\H\Desktop\Files\Sampledata.xlsx"
#Read the file into a dataframe, then sum the rows for each week
df=pd.read_excel(path, sheet_name = 0)
df=df.groupby(['Week']).sum()
df = df.reset_index()
#Define x and y
X=df[['Week']]
y=df['Payment Amount Total']
# Split the data into training/testing sets
X_train = X[:-20]
X_test = X[-20:]
# Split the targets into training/testing sets
y_train = y[:-20]
y_test = y[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(X_train, y_train)
# Make predictions using the testing set
y_pred = regr.predict(X_test)
# The coefficients
print("Coefficients: \n", regr.coef_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(y_test, y_pred))
# Plot outputs
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Everything is okay until I get to the following line:
regr.fit(X_train, y_train)
When I run this line I get the following error:
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.
I'm not sure why I am getting this error.
This is what I get when I try to print the dataframes:
print(y_train)
print(y_test)
print(X_train)
print(X_test)
0 86546.38
1 83756.20
2 84961.71
3 70918.25
4 88554.72
5 90256.68
6 87202.41
7 75824.41
8 89538.01
9 88480.46
10 94556.43
11 91835.87
12 93248.96
13 94887.17
14 98587.10
15 96398.35
16 94210.16
17 100156.39
18 97870.76
19 103892.86
Name: Payment Amount Total, dtype: float64
Series([], Name: Payment Amount Total, dtype: float64)
0 86546.38
1 83756.20
2 84961.71
3 70918.25
4 88554.72
5 90256.68
6 87202.41
7 75824.41
8 89538.01
9 88480.46
10 94556.43
11 91835.87
12 93248.96
13 94887.17
14 98587.10
15 96398.35
16 94210.16
17 100156.39
18 97870.76
19 103892.86
Name: Payment Amount Total, dtype: float64
Empty DataFrame
Columns: [Week]
Index: []
Week
0 3
1 4
2 5
3 6
4 7
5 8
6 9
7 10
8 11
9 12
10 13
11 14
12 15
13 16
14 17
15 18
16 19
17 20
18 21
19 22
Note that printing X itself does show values:
Week
0 3
1 4
2 5
3 6
4 7
5 8
6 9
7 10
8 11
9 12
10 13
11 14
12 15
13 16
14 17
15 18
16 19
17 20
18 21
19 22
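For what it's worth, the printouts above show X with only 20 rows (index 0 to 19), and the split holds out the last 20 rows for testing, which would leave nothing for training. Here is a minimal sketch that reproduces the empty training set under that assumption. The frame below is a hypothetical stand-in for the grouped data, with made-up amounts, and `n_test` is just a name introduced for the illustration:

```python
import pandas as pd

# Hypothetical stand-in for the grouped frame: 20 weekly rows (weeks 3..22),
# matching the shape of the printout above. The amounts are made up.
df = pd.DataFrame({
    "Week": range(3, 23),
    "Payment Amount Total": [1000.0 + 10 * i for i in range(20)],
})

X = df[["Week"]]
n_test = 20  # same hold-out size as in the original split

X_train = X[:-n_test]  # everything except the last 20 rows
X_test = X[-n_test:]   # the last 20 rows

# With exactly 20 rows, the training slice has nothing left in it.
print(len(X), X_train.shape, X_test.shape)
```

If the real data does have more than 20 weeks before grouping, it may be worth checking `len(df)` right after the `groupby` to see how many rows actually survive.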
I managed to solve the issue by doing the following before the train/test split:
#features should be converted into a numpy array
X=df['Week'].values
#It should then be reshaped
X=X.reshape(-1,1)
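As a rough sketch of how the reshaped array could feed into the rest of the script: `reshape(-1, 1)` only turns the 1-D array into a single column and does not change the number of rows, so this example also assumes the hold-out size is chosen relative to the data (here 25%, via `train_test_split` with `shuffle=False` to keep the weeks in order). The data is made up and the 25% figure is an arbitrary choice, not something from my original code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 weeks with a made-up, exactly linear target.
weeks = np.arange(3, 23)
y = 1000.0 + 10.0 * weeks

# Features as a 2-D array: one column, one row per week -> shape (20, 1).
X = weeks.reshape(-1, 1)

# Hold out the last 25% of rows; shuffle=False keeps chronological order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

# Fit on the earlier weeks and predict the held-out ones.
regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

print(X_train.shape, X_test.shape)
```

Using a fraction rather than a fixed count like 20 means the training set cannot end up empty just because the file has fewer rows than expected.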