簡體   English   中英

用熊貓數據框查找均方根誤差

[英]Finding Root Mean Squared Error with Pandas dataframe

我正在嘗試從熊貓數據框中計算均方根誤差。 我已經檢查了先前有關堆積溢出的鏈接,例如python中的均方根錯誤和scikit學習文檔http://scikit-learn.org/stable/modules/genic/sklearn.metrics.mean_squared_error.html我希望有人這將使我了解我在做什么錯。 這是數據集 這是我的代碼。

import pandas as pd
import numpy as np
sales = pd.read_csv("home_data.csv")

from sklearn.cross_validation import train_test_split
train_data,test_data = train_test_split(sales,train_size=0.8)

from sklearn.linear_model import LinearRegression
X = train_data[['sqft_living']]
y=train_data.price
#build the linear regression object
lm=LinearRegression()
# Train the model using the training sets
lm.fit(X,y)
#print the y intercept
print(lm.intercept_)
#print the coefficents
print(lm.coef_)

lm.predict(300)



from math import sqrt
from sklearn.metrics import mean_squared_error
y_true=train_data.price.loc[0:5,]
test_data=test_data[['price']].reset_index()
y_pred=test_data.price.loc[0:5,]
predicted =y_pred.as_matrix()
actual= y_true.as_matrix()
mean_squared_error(actual, predicted)

編輯

所以這對我有用。 我不得不將平方英尺的測試數據集值從行轉換為列。

from sklearn.linear_model import LinearRegression
X = train_data[['sqft_living']]
y=train_data.price
#build the linear regression object
lm=LinearRegression()
# Train the model using the training sets
lm.fit(X,y)

新密碼

test_X = test_data.sqft_living.values
print(test_X)
print(np.shape(test_X))
print(len(test_X))
test_X = np.reshape(test_X, [4323, 1])
print(test_X)
from sklearn.metrics import mean_squared_error
from sklearn.metrics import explained_variance_score
MSE = mean_squared_error(y_true = test_data.price.values, y_pred = lm.predict(test_X))
MSE
MSE**(0.5)

您正在將測試集標簽與訓練集標簽進行比較。 我相信您真正想要做的是將測試集標簽與預測的測試集標簽進行比較。

例如:

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.cross_validation import train_test_split

sales = pd.read_csv("home_data.csv")
train_data, test_data = train_test_split(sales,train_size=0.8)

# Train the model
X = train_data[['sqft_living']]
y = train_data.price
lm = LinearRegression()
lm.fit(X, y)

# Predict on the test data
X_test = test_data[['sqft_living']]
y_test = test_data.price
y_pred = lm.predict(X_test)

# Compute the root-mean-square
rms = np.sqrt(mean_squared_error(y_test, y_pred))
print(rms)
# 260435.511036

請注意,scikit-learn通常可以處理Pandas DataFrames和Series輸入,而無需顯式轉換為numpy數組。 您問題中的代碼片段中的錯誤與以下事實有關:傳遞給mean_squared_error()的兩個數組的大小不同。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM