
Is Random Forest regression good for this kind of regression problem?

I am working on vehicle occupancy prediction and am very new to this. I have used random forest regression to predict the occupancy values.

Jupyter notebook_Random forest

I have around 48 million rows and used all of the data to predict occupancy. Because the population and occupancy values are large, I normalized them before predicting. I am fairly sure the model is not good, but how should I interpret the RMSE and MAE results? The plot also shows that the predictions are poor. Am I approaching vehicle occupancy prediction in the right way?

Kindly help me with the following:

  1. Is random forest regression a good method for this problem?
  2. How can I improve the model results?
  3. How should I interpret the results?

You are getting an RMSE of 0.002175863553610834, which is very close to zero (keep in mind that this is measured on the normalized target). So we can say that you have a good model, and I don't think it needs further improvement. If you still want to improve it, I think you should switch the algorithm to XGBoost and use regularization and early stopping to avoid overfitting.

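from xgboost import XGBRegressor

# In xgboost 1.6 and later, eval_metric and early_stopping_rounds are set on
# the estimator; they are no longer accepted by fit() from 2.0 onwards.
model = XGBRegressor(n_estimators=3000, learning_rate=0.01, reg_alpha=2, reg_lambda=1,
                     n_jobs=-1, random_state=34, verbosity=0,
                     eval_metric='rmse', early_stopping_rounds=5)

# Early stopping monitors the last entry in eval_set; ideally this is a
# validation split kept separate from the final test set.
evalset = [(X_train, y_train), (X_test, y_test)]
model.fit(X_train, y_train, eval_set=evalset)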
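Whether an RMSE of about 0.002 counts as small depends on the scale of the normalized target. Here is a minimal sketch of putting the error in context by dividing it by the target's standard deviation. The numbers are made up for illustration and are not the asker's data.

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical normalized occupancy values (illustration only).
y_true = [0.001, 0.004, 0.002, 0.006, 0.003]
# A "model" that always predicts roughly the mean of the target.
y_pred = [0.003, 0.003, 0.003, 0.003, 0.003]

error = rmse(y_true, y_pred)
spread = statistics.pstdev(y_true)  # population standard deviation of the target

# A ratio near 1 means the model is about as good as predicting a constant,
# even though the raw RMSE looks tiny.
relative_rmse = error / spread
print(f"RMSE={error:.4f}, target std={spread:.4f}, ratio={relative_rmse:.2f}")
```

A ratio well below 1 would suggest the model explains much of the variation in the target; a ratio near 1 would suggest it does not, however small the raw RMSE is.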

An XGBoost-based regressor has already been recommended to you, so you could also try a LightGBM-based regressor: https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html

  1. Is random forest regression a good method for this problem?

    -> The model is just a tool and can of course be used. However, no one can say whether it is suitable, because we have not studied the distribution of your data. I suggest also trying other models, such as logistic regression or support vector regression.

  2. How can I improve the model results?

    -> I have several suggestions for improvement:
       1. Do not standardize the y column without first checking whether it contains extreme values.
       2. When calculating RMSE and MAE, use the original (unscaled) y values.
       3. Understand the business logic in depth and add new features.
       4. Read up on data processing and feature engineering, for example in blog posts.
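Suggestion 2 can be sketched as follows, assuming the target was scaled with min-max normalization (the question only says "normalized", so this is an assumption). The range of 0 to 60 and the sample values are hypothetical.

```python
import math

def minmax_inverse(values, lo, hi):
    """Undo min-max scaling: map values in [0, 1] back to [lo, hi]."""
    return [v * (hi - lo) + lo for v in values]

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical: occupancy originally ranged from 0 to 60 passengers.
y_min, y_max = 0.0, 60.0
y_true_scaled = [0.10, 0.50, 0.25, 0.80]
y_pred_scaled = [0.15, 0.45, 0.30, 0.70]

# Convert both the targets and the predictions back to the original units.
y_true = minmax_inverse(y_true_scaled, y_min, y_max)
y_pred = minmax_inverse(y_pred_scaled, y_min, y_max)

rmse_original = rmse(y_true, y_pred)
mae_original = mae(y_true, y_pred)
print(f"RMSE: {rmse_original:.2f} passengers, MAE: {mae_original:.2f} passengers")
```

Errors expressed in the original units (here, passengers) are much easier to judge than errors on a 0-to-1 scale.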

  3. How should I interpret the results?

    -> Bad results do not necessarily mean the model has no value. You need to check whether the model is better than the existing methods and whether it produces more economic value. For example, treat errors as losses and accurate predictions as gains.
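One way to make that comparison concrete is to score the model against a simple baseline. The sketch below uses a baseline that always predicts the training mean, standing in for whatever "existing method" applies in your case. All values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical occupancy counts in original units (illustration only).
y_train = [12.0, 30.0, 18.0, 45.0, 25.0]
y_test = [20.0, 40.0, 10.0, 30.0]
model_pred = [22.0, 35.0, 14.0, 29.0]  # stand-in for the model's test predictions

# Baseline: always predict the mean occupancy seen in training.
baseline_value = sum(y_train) / len(y_train)
baseline_pred = [baseline_value] * len(y_test)

rmse_model = rmse(y_test, model_pred)
rmse_baseline = rmse(y_test, baseline_pred)

# Fraction of the baseline error removed by the model (0 = no better, 1 = perfect).
improvement = 1 - rmse_model / rmse_baseline
print(f"model RMSE={rmse_model:.2f}, baseline RMSE={rmse_baseline:.2f}, "
      f"improvement={improvement:.0%}")
```

If the improvement is close to zero or negative, the model adds little over the baseline, regardless of how small its RMSE looks in isolation.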

I hope this helps.

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address.
