I am currently trying to create a multiple linear regression with test and training data for estimating house prices (by using two regressors called "Quadratmeter" and "Gewinn").
I want to insert some test data into the model and compare the predicted y-values with the actual ones. Therefore, I used a for-loop to display them side-by-side.
This is the whole code used:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = pd.read_csv("...")
head = df.head()
pd.set_option('display.expand_frame_repr', False)
print(head)
X = df[["Gewinn", "Quadratmeter"]]
y = df[["Preis in Mio"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
model = LinearRegression()
model.fit(X_train, y_train)
print(model.intercept_)
print(model.coef_)
# complete model: Preis = 6.48370247 + Gewinn * 6.39855984e-06 + Quadratmeter * 3.89642288e-03 + e
y_test_pred = model.predict(X_test)
for i in range(0, len(y_test_pred)):
    print(y_test_pred[i][0] + "-" + y_test[i][0])
Unfortunately, running the program raises the following error (caused by the for loop):
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')
Unfortunately, I don't know how to solve this issue. Could anyone provide a hint?
Any help is appreciated.
If you want to compare y_test_pred and y_test, I'd suggest creating a DataFrame containing both values. That makes the comparison easier, and you can also save the values and run calculations on them.
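# predict() returns shape (n, 1) here because y is a DataFrame, so flatten it to 1-D
results_df = pd.DataFrame({'predictions': y_test_pred.ravel(), 'actual_values': y_test['Preis in Mio'].to_numpy()})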
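As for the error itself: y_test_pred[i][0] is a NumPy float, and adding it to the string "-" makes NumPy look for a string version of 'add', which it does not have. Separately, y_test[i] on a DataFrame looks up a column named i rather than row i. Here is a minimal sketch of both the DataFrame comparison and a working loop. The numbers, the index values and the "error" column are made up for illustration, and the hard-coded 2-D array only stands in for what model.predict(X_test) would return.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the objects in the question (values are made up).
# y_test is a one-column DataFrame whose index was shuffled by train_test_split.
y_test = pd.DataFrame({"Preis in Mio": [7.2, 8.1, 6.9]}, index=[12, 3, 27])
# Stand-in for model.predict(X_test): a 2-D array of shape (n, 1).
y_test_pred = np.array([[7.0], [8.4], [6.5]])

# Side-by-side table: flatten the predictions and reuse the test index.
results_df = pd.DataFrame(
    {
        "predictions": y_test_pred.ravel(),
        "actual_values": y_test["Preis in Mio"].to_numpy(),
    },
    index=y_test.index,
)
# Illustrative extra column: difference between prediction and actual value.
results_df["error"] = results_df["predictions"] - results_df["actual_values"]
print(results_df)

# Loop variant: format the numbers into a string instead of adding them to one,
# and use .iloc for positional access to the rows of y_test.
lines = []
for i in range(len(y_test_pred)):
    lines.append(f"{float(y_test_pred[i][0])} - {float(y_test.iloc[i, 0])}")
print("\n".join(lines))
```

With the real data, you would drop the two stand-in assignments and keep the rest. From there, something like results_df["error"].abs().mean() gives a quick mean absolute error.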