
Predicting price when each season has a different model

I have a dataset with many columns.

There are 4 variables used for prediction:
  • season (sum, aut, win, spr)
  • express_shipment (True, False)
  • shipping_distance (in km)
  • first_time_customer (True, False)

These 4 variables are used to calculate shipping_price according to the following rule: for each season, there is a separate model that uses the variables above.
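Since the rule is "one model per season", one direct approach is to fit one LinearRegression per season and route each row to its season's model. The sketch below assumes noise-free synthetic data: the coefficients in `rules`, the integer mapping for the pooled comparison, and the helper `predict_per_season` are all hypothetical, chosen only to illustrate the idea. The error comparison is in-sample.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical, noise-free synthetic data; column names follow the question.
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "season": rng.choice(["sum", "aut", "win", "spr"], size=n),
    "shipping_distance": rng.uniform(1, 500, size=n),
    "express_shipment": rng.integers(0, 2, size=n),
    "new_cust_int": rng.integers(0, 2, size=n),
})
features = ["shipping_distance", "express_shipment", "new_cust_int"]

# Assumed pricing rule: (intercept, per-km rate, express surcharge, new-customer fee)
# differs by season, i.e. "a separate model for each season".
rules = {"sum": (5.0, 0.02, 3.0, 1.0), "aut": (6.0, 0.03, 4.0, 0.5),
         "win": (8.0, 0.05, 6.0, 2.0), "spr": (4.0, 0.01, 2.0, 0.0)}
params = np.array([rules[s] for s in data["season"]])
data["shipping_charge"] = params[:, 0] + (params[:, 1:] * data[features].to_numpy()).sum(axis=1)

# Fit one LinearRegression per season.
models = {
    season: LinearRegression().fit(group[features], group["shipping_charge"])
    for season, group in data.groupby("season")
}

def predict_per_season(df):
    """Route each row to the model trained for its season (hypothetical helper)."""
    out = pd.Series(np.nan, index=df.index)
    for season, group in df.groupby("season"):
        out.loc[group.index] = models[season].predict(group[features])
    return out

# For comparison: one pooled model with season as an integer code, as in the question.
season_codes = data["season"].map({"sum": 1, "aut": 2, "win": 3, "spr": 4})
pooled_X = data[features].assign(season_int=season_codes)
pooled = LinearRegression().fit(pooled_X, data["shipping_charge"])

per_season_mae = mean_absolute_error(data["shipping_charge"], predict_per_season(data))
pooled_mae = mean_absolute_error(data["shipping_charge"], pooled.predict(pooled_X))
print(f"per-season MAE: {per_season_mae:.4f}, pooled MAE: {pooled_mae:.4f}")
```

With real data you would fit the per-season models on a training split and evaluate on held-out rows; the pooled model cannot match this rule because it shares one distance rate and one surcharge across all seasons.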

My approach was to convert True to 1 and False to 0 for the 2 Boolean columns. I also converted the season into an integer representation (1, 2, 3, 4).

The problem is that my predictions are wildly inaccurate. Here is the code I am using:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
modeling = data.loc[:, ["shipping_distance", "season_int", "new_cust_int", "express_shipment", "shipping_charge"]]  # `data` is the full dataset
x = modeling.iloc[:, :-1]  # features: every column except the last
y = modeling.iloc[:, -1]   # target (shipping_charge) as a 1-D Series
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)  # keep the predictions so they can be compared with y_test

Can anyone explain what the correct approach to this problem is, and/or how to solve it?

Here you use label encoding for "season_int" (1, 2, 3, 4) together with linear regression. That means the model treats the seasons as having an intrinsic order, with each step from one season code to the next changing the predicted price by the same amount. You could try one-hot encoding for "season_int":

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
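A minimal sketch of one-hot encoding the season inside a scikit-learn pipeline, assuming the raw season labels are kept as strings. The synthetic data and the pricing rule (season only shifts the base price) are hypothetical; `drop="first"` removes one redundant dummy column because the model already has an intercept.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical synthetic rows; column names follow the question.
rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "season": rng.choice(["sum", "aut", "win", "spr"], size=n),
    "shipping_distance": rng.uniform(1, 500, size=n),
    "express_shipment": rng.integers(0, 2, size=n),
    "new_cust_int": rng.integers(0, 2, size=n),
})
# Assumed rule: the season only changes the base price, not the per-km rate.
base = X["season"].map({"sum": 5.0, "aut": 6.0, "win": 9.0, "spr": 4.0})
y = base + 0.02 * X["shipping_distance"] + 3.0 * X["express_shipment"] + 1.0 * X["new_cust_int"]

preprocess = ColumnTransformer(
    [("season", OneHotEncoder(drop="first"), ["season"])],
    remainder="passthrough",  # numeric and 0/1 columns pass through unchanged
)
model = make_pipeline(preprocess, LinearRegression())
model.fit(X, y)
pred = model.predict(X)
```

Note that one-hot encoding alone gives each season its own offset while still sharing the distance rate and surcharges across seasons. If those also change by season, as the question's rule suggests, fit a separate model per season or add season-by-feature interaction terms.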

Possible answers:

  • You are using categorical variables in a linear regression, which might be an issue. Here are possible solutions.
  • LinearRegression might not be the best model for your problem, since the relationship might not be linear. Try a non-linear model such as sklearn.ensemble.RandomForestRegressor.
  • Your dataset might not be informative enough for the problem you are trying to solve. The variables might not be the best ones for determining the price.
  • You don't have enough data to train your model.
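To illustrate the second suggestion, here is a sketch comparing the question's pooled LinearRegression with sklearn.ensemble.RandomForestRegressor on the same integer-encoded features. The synthetic data and the per-season pricing rule are assumptions made for illustration; results on the real dataset may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data, encoded the way the question describes
# (season as 1-4, booleans as 0/1). The pricing rule below is an assumption.
rng = np.random.default_rng(2)
n = 2000
X = pd.DataFrame({
    "shipping_distance": rng.uniform(1, 500, size=n),
    "season_int": rng.integers(1, 5, size=n),
    "new_cust_int": rng.integers(0, 2, size=n),
    "express_shipment": rng.integers(0, 2, size=n),
})
# Per-season distance rate and express surcharge, i.e. a different model per season.
rate = X["season_int"].map({1: 0.02, 2: 0.03, 3: 0.05, 4: 0.01})
surcharge = X["season_int"].map({1: 3.0, 2: 4.0, 3: 6.0, 4: 2.0})
y = 5.0 + rate * X["shipping_distance"] + surcharge * X["express_shipment"] + 1.0 * X["new_cust_int"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Baseline: the pooled linear model from the question.
linear = LinearRegression().fit(X_train, y_train)
linear_mae = mean_absolute_error(y_test, linear.predict(X_test))

# Trees can split on season_int at any threshold, so the integer code does not
# force a linear trend across seasons the way it does in a linear model.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
forest_mae = mean_absolute_error(y_test, forest.predict(X_test))

print(f"linear MAE: {linear_mae:.2f}, random forest MAE: {forest_mae:.2f}")
```

Tree ensembles also pick up interactions such as "the per-km rate depends on the season" without hand-built features, at the cost of a less interpretable model than per-season linear fits.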

It seems like you want a time series model [is that right?] https://www.statsmodels.org/stable/examples/index.html#time-series-analysis
