Support vector regression from sklearn gives flat prediction
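I'm running support vector regression on this dataset and getting a flat prediction. I think the problem may lie in how I set up the SVR, either in how I pass in the variables or in how I use the kernel. The code for the MWE is long, so I've commented it heavily and marked the relevant part.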
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scikit_posthocs as sp
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVR
# Read in data: https://archive.ics.uci.edu/ml/datasets/Forest+Fires
df = pd.read_csv('forestfires.csv')
df['transformed_area'] = np.log10(df['area'] + 1)  # transform area, the target variable
df_ohe = pd.get_dummies(df, drop_first=True)  # one-hot encode categorical variables
labels = np.array(df_ohe['transformed_area'])  # separate the labels from the features
df_ohe = df_ohe.drop(['transformed_area', 'area'], axis=1)
# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(df_ohe,labels,test_size = 0.30, random_state = 42)
# I think it goes wrong after this point
# Scale training and testing sets
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(df_ohe)
rescaled_X_train = scaler.transform(X_train)
rescaled_X_test = scaler.transform(X_test)
# Put the feature names back
rescaled_X_train = pd.DataFrame(rescaled_X_train,columns=list(X_train))
rescaled_X_test = pd.DataFrame(rescaled_X_test,columns=list(X_train))
# Instantiate model
svr = SVR()
# Train the model on training data
svr.fit(rescaled_X_train, Y_train)
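When I plot the predicted values of transformed_area against the actual values, I get plots in which the prediction is constant for all of the X data.
I'm instantiating and training the model, and passing in the data, the same way I usually do for sklearn models. Is there anything else SVR needs?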
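It looks like you may be making predictions on the unscaled inputs, when you should be predicting on the scaled inputs (which is what your model was trained on). (Next time, include the prediction and plotting code, since that would make this much easier to spot.)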
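The constant value can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the poster's data: it uses hypothetical synthetic features drawn uniformly from 0 to 800 (a stand-in for large raw columns) and SVR's defaults (RBF kernel, gamma='scale'). Since gamma is chosen relative to the scaled training data, unscaled inputs sit so far from every support vector that each kernel term is effectively zero, and the prediction reduces to the model's intercept.

```python
# Sketch on synthetic data (not forestfires.csv): an RBF-kernel SVR trained on
# MinMax-scaled features returns a near-constant prediction when it is given
# unscaled features at predict time.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical features on a large raw scale (0..800); the target depends
# linearly on the first two features.
X_raw = rng.uniform(0, 800, size=(300, 4))
y = X_raw[:, 0] / 800 + 0.5 * X_raw[:, 1] / 800

scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_raw)
X_scaled = scaler.transform(X_raw)

svr = SVR()  # defaults: kernel='rbf', gamma='scale' (computed from X_scaled)
svr.fit(X_scaled, y)

pred_scaled = svr.predict(X_scaled)  # same preprocessing as in training
pred_raw = svr.predict(X_raw)        # mistake: unscaled inputs

print("spread with scaled inputs:  ", np.ptp(pred_scaled))
print("spread with unscaled inputs:", np.ptp(pred_raw))
# With raw inputs every point is far from every support vector, so each RBF
# term exp(-gamma * ||x - sv||^2) is effectively 0 and the prediction
# collapses to the intercept.
print("intercept:", svr.intercept_[0], "first raw predictions:", pred_raw[:3])
```

Whatever the intercept happens to be for a given training set is the flat line that shows up in the plot.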
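When I run your code and generate predictions with svr.predict(rescaled_X_train) or svr.predict(rescaled_X_test), I get values ranging between -0.07 and 1.33, which does not match your plots. When I instead predict without scaling, with svr.predict(X_train) or svr.predict(X_test), I do get essentially constant predictions close to 0.5.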
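One way to rule out this kind of mismatch is to wrap the scaler and the model in a single scikit-learn Pipeline, so that predict() always applies the same scaling as fit(). This is a minimal sketch using make_pipeline; the data is synthetic and only stands in for the one-hot-encoded features, and the split mirrors the question's train_test_split call.

```python
# Sketch: bundle the scaler and the SVR so the same scaling is applied
# automatically in both fit() and predict().
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Hypothetical synthetic stand-in for the features and labels.
rng = np.random.default_rng(42)
X = rng.uniform(0, 800, size=(400, 4))
y = X[:, 0] / 800 + 0.5 * X[:, 1] / 800

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

model = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVR())
model.fit(X_train, Y_train)  # the scaler is fitted on the training split only

# Raw inputs are passed in directly; the pipeline rescales them first.
pred_test = model.predict(X_test)
print("test predictions range:", pred_test.min(), pred_test.max())
print("R^2 on test split:", model.score(X_test, Y_test))
```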