Predicting price when each season has a different model

Question

I have a dataset with many columns.

Four variables are used for prediction:
- season (sum, aut, win, spr)
- express_shipment (True, False)
- shipping_distance (in km)
- first_time_customer (True, False)

These four variables are used to calculate shipping_price according to the following rule: for each season, there is a separate model that uses the variables above.

My approach was to convert True to 1 and False to 0 for the two Boolean columns, and to convert the season into an integer representation (1, 2, 3, 4).

The problem is that my predictions are wildly inaccurate. Here is the code I am using:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# `data` is the DataFrame described above; the last selected column is the target.
modeling = data.loc[:, ["shipping_distance", "season_int", "new_cust_int", "express_shipment", "shipping_charge"]]
x = modeling.iloc[:, :-1]  # feature columns
y = modeling.iloc[:, -1:]  # target column (shipping_charge)
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Can anyone explain what the correct approach to this problem is, and/or how to solve it?

Answer 1

Here you combine label encoding of "season_int" (1, 2, 3, 4) with linear regression. That means the model treats the seasons as having an intrinsic numeric order, which they do not. You could try one-hot encoding the season instead:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
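A minimal sketch of what one-hot encoding the season could look like, assuming the season is kept as a string column. The data below is synthetic and the price rule (a season-specific base fee plus shared effects) is an assumption made only for illustration, not something taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the asker's data (hypothetical values).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "season": rng.choice(["sum", "aut", "win", "spr"], size=n),
    "shipping_distance": rng.uniform(1, 500, size=n),
    "new_cust_int": rng.integers(0, 2, size=n),
    "express_shipment": rng.integers(0, 2, size=n),
})
# Assumed price rule: the base fee differs by season in no particular order.
base_fee = data["season"].map({"sum": 5.0, "aut": 20.0, "win": 8.0, "spr": 30.0})
data["shipping_charge"] = (
    base_fee
    + 0.1 * data["shipping_distance"]
    + 4.0 * data["express_shipment"]
    - 2.0 * data["new_cust_int"]
    + rng.normal(0, 1, size=n)
)

features = ["season", "shipping_distance", "new_cust_int", "express_shipment"]
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["shipping_charge"], random_state=1
)

# One-hot encode only the season column; pass the numeric columns through unchanged.
encode_season = ColumnTransformer(
    [("season", OneHotEncoder(handle_unknown="ignore"), ["season"])],
    remainder="passthrough",
)
model = make_pipeline(encode_season, LinearRegression())
model.fit(X_train, y_train)

r2_one_hot = model.score(X_test, y_test)
print(f"R^2 with one-hot encoded season: {r2_one_hot:.3f}")
```

Note that one-hot encoding gives each season its own intercept, while the other coefficients are still shared across seasons. If the effect of distance or express shipping also differs by season, fitting one model per season subset is another option.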

Answer 2

Possible answers:

- You are feeding categorical variables directly into a linear regression, which might be an issue.
- LinearRegression might not be the best model for your problem, since the relationship might not be linear. Try a non-linear model such as sklearn.ensemble.RandomForestRegressor.
- Your dataset might not be informative enough for the problem you are trying to solve. The variables might not be the best predictors of the price.
- You might not have enough data to train your model.
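As a rough illustration of the non-linear model suggestion, the sketch below compares LinearRegression with RandomForestRegressor on synthetic data. The data and the assumed rule (a per-km rate that depends on the season) are hypothetical; the column names follow the question's code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): the price per km depends on the season,
# so a single straight line across all rows cannot describe it.
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "shipping_distance": rng.uniform(1, 500, size=n),
    "season_int": rng.integers(1, 5, size=n),
    "new_cust_int": rng.integers(0, 2, size=n),
    "express_shipment": rng.integers(0, 2, size=n),
})
rate_per_km = data["season_int"].map({1: 0.05, 2: 0.30, 3: 0.10, 4: 0.50})
data["shipping_charge"] = (
    rate_per_km * data["shipping_distance"]
    + 5.0 * data["express_shipment"]
    + rng.normal(0, 1, size=n)
)

X = data.drop(columns="shipping_charge")
y = data["shipping_charge"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Same features for both models; only the model family changes.
linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

r2_linear = linear.score(X_test, y_test)
r2_forest = forest.score(X_test, y_test)
print(f"LinearRegression R^2:      {r2_linear:.3f}")
print(f"RandomForestRegressor R^2: {r2_forest:.3f}")
```

Tree-based models split on the season code rather than multiplying it by a coefficient, so the integer encoding is less of a problem for them. Whether this helps on the real data depends on how the actual prices were generated.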

Answer 3

It seems you want a time series model (is that right?): https://www.statsmodels.org/stable/examples/index.html#time-series-analysis


Answer 1
0 2020-10-15 03:41:49

Answer 2
0 2020-10-15 07:49:07

Answer 3
-1 2020-10-15 03:48:04
