Predicting price when each season has a different model
I have a dataset with many columns.
Four variables are used for prediction:
- season (sum, aut, win, spr)
- express_shipment (True, False)
- shipping_distance (in km)
- first_time_customer (True, False)
These four variables are used to calculate shipping_price under the following rule: each season has its own separate model that uses the variables above.
My approach was to convert True to 1 and False to 0 in the two Boolean columns, and to convert the season to an integer representation (1, 2, 3, 4).
The problem is that my predictions are wildly inaccurate. Here is the code I am using:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# `data` is the DataFrame described above, with the encoded columns already added.
modeling = data.loc[:, ["shipping_distance", "season_int", "new_cust_int", "express_shipment", "shipping_charge"]]
x = modeling.iloc[:, :-1]   # features: every column except the last
y = modeling.iloc[:, -1:]   # target: shipping_charge
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
Can anyone explain the correct approach to this problem, or how to solve it?
Here you use label encoding for "season_int" (1, 2, 3, 4) together with linear regression. That imposes an intrinsic order on the seasons: a linear model has to assume the price changes by the same amount from each season to the next. You could try one-hot encoding for "season_int" instead:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
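As a rough illustration of the suggestion above, here is a minimal sketch that one-hot encodes only "season_int" and compares the result with the integer encoding. The data is synthetic and the pricing rule (a season-specific base price plus fixed effects for the other variables) is purely an assumption made for the example; only the column names come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the asker's data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "shipping_distance": rng.uniform(1, 500, n),
    "season_int": rng.integers(1, 5, n),       # 1..4
    "new_cust_int": rng.integers(0, 2, n),     # 0 or 1
    "express_shipment": rng.integers(0, 2, n), # 0 or 1
})
# Hypothetical pricing rule: the seasonal base price has no natural order.
base = data["season_int"].map({1: 20.0, 2: 5.0, 3: 30.0, 4: 10.0})
data["shipping_charge"] = (
    base
    + 0.1 * data["shipping_distance"]
    + 8.0 * data["express_shipment"]
    - 3.0 * data["new_cust_int"]
    + rng.normal(0, 1, n)
)

X = data.drop(columns="shipping_charge")
y = data["shipping_charge"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# One-hot encode only season_int; pass the other columns through unchanged.
encode = ColumnTransformer(
    [("season", OneHotEncoder(handle_unknown="ignore"), ["season_int"])],
    remainder="passthrough",
)
onehot_model = make_pipeline(encode, LinearRegression()).fit(X_train, y_train)

# Baseline: the same regression with season left as an integer.
ordinal_model = LinearRegression().fit(X_train, y_train)

r2_onehot = onehot_model.score(X_test, y_test)
r2_ordinal = ordinal_model.score(X_test, y_test)
print(f"R^2 with integer season: {r2_ordinal:.3f}")
print(f"R^2 with one-hot season: {r2_onehot:.3f}")
```

Note that one-hot encoding only gives each season its own intercept; the coefficients for the other variables are still shared. If the seasons really differ in those coefficients too, interaction terms or a separate fit per season would be needed.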
Possible answers:
- Try a non-linear model, for example sklearn.ensemble.RandomForestRegressor.
- It seems you may want a time series model (is that the case?): https://www.statsmodels.org/stable/examples/index.html#time-series-analysis
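The first suggestion could look roughly like the sketch below. It again uses synthetic data, here with a hypothetical rule in which every season has its own base price and its own per-km rate; those numbers, and the hyperparameters, are assumptions for illustration rather than anything taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
n = 2000
data = pd.DataFrame({
    "shipping_distance": rng.uniform(1, 500, n),
    "season_int": rng.integers(1, 5, n),       # 1..4
    "new_cust_int": rng.integers(0, 2, n),     # 0 or 1
    "express_shipment": rng.integers(0, 2, n), # 0 or 1
})
# Hypothetical rule: each season has its own base price and per-km rate.
base = data["season_int"].map({1: 20.0, 2: 5.0, 3: 30.0, 4: 10.0})
rate = data["season_int"].map({1: 0.05, 2: 0.20, 3: 0.10, 4: 0.30})
data["shipping_charge"] = (
    base
    + rate * data["shipping_distance"]
    + 8.0 * data["express_shipment"]
    - 3.0 * data["new_cust_int"]
    + rng.normal(0, 1, n)
)

X = data.drop(columns="shipping_charge")
y = data["shipping_charge"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Trees can split on season_int directly, so the integer encoding is usable here.
forest = RandomForestRegressor(n_estimators=200, random_state=1)
forest.fit(X_train, y_train)

r2_forest = forest.score(X_test, y_test)
print(f"Random forest R^2 on held-out data: {r2_forest:.3f}")
```

Because each tree can first split on the season and then on distance, a forest can approximate season-specific relationships without any explicit encoding step, at the cost of being harder to interpret than a linear model.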