
Random Forest on Panel Data using Python

So I am having some trouble running a random forest regression on panel data.

The data currently looks like this:

[Image: screenshot of the data]

I want to run a random forest regression that predicts KwH for each ID over time, based on the variables I have. I have split my data into training and test samples using the following code:

from sklearn.model_selection import train_test_split
X = df[['hour', 'day', 'month', 'dayofweek', 'apparentTemperature',
       'summary', 'household_size', 'work_from_home', 'num_rooms',
       'int_in_renew', 'int_in_gen', 'conc_abt_cc', 'feel_abt_lifestyle',
       'smrt_meter_help', 'avg_gender', 'avg_age', 'house_type', 'sum_insul',
       'total_lb', 'total_fridges', 'bigg_apps', 'small_apps',
       'look_at_meter']]
y = df[['KwH']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

I then want to train my model and evaluate it against the test sample, but I am unsure how to do this. I have tried this code:

from sklearn.ensemble import RandomForestRegressor
rfc = RandomForestRegressor(n_estimators=200)
rfc.fit(X_train, y_train)

However, I get the following error message:

A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().

I'm not sure whether the error comes from the way my data is arranged or from the way I am running the random forest, so any help with this, and with evaluating the model against the test sample afterwards, would be greatly appreciated.

Thanks in advance.

Simply switching y = df[['KwH']] to y = df['KwH'] (or y = df.KwH) should solve this.
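A quick way to see the difference between the selections is to compare their types and shapes. The sketch below uses a small made-up DataFrame (the asker's real data is not available; only a KwH column is needed to show the point) and also shows the ravel() route that the message itself suggests.

```python
import pandas as pd

# Hypothetical toy DataFrame standing in for the asker's df (only KwH matters here)
df = pd.DataFrame({"KwH": [0.5, 1.2, 0.8, 2.1], "hour": [0, 1, 2, 3]})

y_frame = df[["KwH"]]  # double brackets -> one-column DataFrame
y_series = df["KwH"]   # single brackets -> Series
y_attr = df.KwH        # attribute access -> the same Series

print(type(y_frame).__name__, y_frame.shape)    # DataFrame (4, 1)
print(type(y_series).__name__, y_series.shape)  # Series (4,)

# The ravel() route from the message also yields a 1-D array
y_raveled = y_frame.to_numpy().ravel()
print(y_raveled.shape)  # (4,)
```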

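The message appears because, for a single-output regression, scikit-learn expects y to be one-dimensional with shape (n_samples,). Selecting a column with double brackets [[...]] returns a one-column DataFrame, i.e. a column vector of shape (n_samples, 1), whereas single brackets or attribute access return a one-dimensional Series.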
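The question also asks how to evaluate the fitted model on the test sample, which the answer does not cover. A minimal sketch follows, under loud assumptions: the data is synthetic (a hypothetical stand-in using only three of the asker's column names), `random_state` is added only for reproducibility, and MAE and R² are simply two common regression metrics chosen for illustration, not something the original thread specifies.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's panel data: several IDs observed at various hours.
# ID only mimics the panel layout; like in the question, it is not used as a feature.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "ID": rng.integers(1, 11, size=n),
    "hour": rng.integers(0, 24, size=n),
    "apparentTemperature": rng.normal(10, 5, size=n),
    "household_size": rng.integers(1, 6, size=n),
})
# Made-up target loosely driven by the features, plus noise
df["KwH"] = (
    0.05 * df["hour"]
    - 0.03 * df["apparentTemperature"]
    + 0.2 * df["household_size"]
    + rng.normal(0, 0.1, size=n)
)

X = df[["hour", "apparentTemperature", "household_size"]]
y = df["KwH"]  # single brackets: 1-D Series, so no column-vector message

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rfr = RandomForestRegressor(n_estimators=200, random_state=42)
rfr.fit(X_train, y_train)

# Predict on the held-out test sample and compare with the true values
y_pred = rfr.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)  # same value as rfr.score(X_test, y_test)
print(f"MAE: {mae:.3f}  R^2: {r2:.3f}")
```

`RandomForestRegressor.score` returns the R² on the data it is given, so `rfr.score(X_test, y_test)` is a shorter way to get the same number as `r2_score` here.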
