
Incorporating random effects into time series data for random forest regression

I have time series air pollution data (e.g., indoor PM2.5, CO2, temperature and outdoor PM2.5) from three residences, plus activity diaries recorded by the residents in binary format (e.g., cooking: 1 while the activity is taking place and 0 otherwise). I want to combine the data from all three locations into a single random forest model that predicts PM2.5, with the main goal of seeing which activities are most strongly predictive of PM2.5 levels.
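For context, "most strongly predictive" is usually measured with permutation importance: shuffle one predictor, leave the others alone, and record how much the prediction error grows. (R's randomForest reports a permutation-based %IncMSE when fit with `importance = TRUE`, computed on out-of-bag samples tree by tree.) Below is a minimal pure-Python sketch of the idea, not the randomForest implementation. It assumes a fitted model exposed as a hypothetical `predict(row)` function and rows stored as lists of feature values.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    `predict` maps one row (a list of feature values) to a prediction.
    Larger scores mean the model leans more heavily on that feature.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(permuted) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Hypothetical example: feature 0 is "cooking", feature 1 is an activity the model ignores.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0.0, 10.0, 0.0, 10.0]
scores = permutation_importance(lambda row: 10.0 * row[0], X, y)
print(scores)
```

Because the toy model never looks at feature 1, its score is exactly zero, while shuffling feature 0 can only increase the error.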

I can model each residence separately, but I am trying to work out how to combine all three in one model. I have considered applying random effects, treating each residence as a group, but I am unsure how to implement this in R and obtain output that could then be fed into the random forest.

Essentially, my question is: how can I include time series data from three residences, measured on the same variables (except for the external air pollution measurement, which is unique to each house), in one model while accounting for the variation between houses in each of the explanatory variables?
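One common starting point is to stack the three residences into a single long table with a residence identifier, so that each row carries its own house's outdoor PM2.5. Optionally, you can also center the continuous predictors within each house so the model sees deviations from that house's typical level rather than raw between-house differences. This is a fixed-effects-style preprocessing step, not a random-effects model. The sketch below assumes rows are dicts and uses hypothetical column names (`pm25`, `outdoor_pm25`, `cooking`). The residence id can then be passed to the random forest as a categorical feature.

```python
from statistics import mean


def pool_residences(houses):
    """Stack per-residence rows into one table, tagging each row with its residence id.

    `houses` maps a residence id to a list of dict rows (one row per time step).
    """
    return [{"house": house_id, **row} for house_id, rows in houses.items() for row in rows]


def center_within_house(rows, columns):
    """Return copies of the rows with each residence's own mean subtracted from `columns`.

    Columns not listed (here, the binary activity indicators) are left as-is.
    """
    house_means = {}
    for house in {r["house"] for r in rows}:
        house_rows = [r for r in rows if r["house"] == house]
        house_means[house] = {c: mean(r[c] for r in house_rows) for c in columns}
    return [{**r, **{c: r[c] - house_means[r["house"]][c] for c in columns}} for r in rows]


# Hypothetical data: two time steps from each of two residences.
houses = {
    "A": [{"pm25": 12.0, "outdoor_pm25": 8.0, "cooking": 1},
          {"pm25": 6.0, "outdoor_pm25": 6.0, "cooking": 0}],
    "B": [{"pm25": 30.0, "outdoor_pm25": 20.0, "cooking": 1},
          {"pm25": 20.0, "outdoor_pm25": 18.0, "cooking": 0}],
}
pooled = pool_residences(houses)
centered = center_within_house(pooled, ["outdoor_pm25"])
for row in centered:
    print(row)
```

Whether to center, and which columns, is a modeling choice. Centering removes each house's baseline, so the forest can only learn from within-house changes in those variables.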

The REEMtree R package combines the structure of a mixed effects model with tree-based estimation methods, so each residence can be treated as a group with its own random effect. The method is described in a paper published here: https://link.springer.com/article/10.1007/s10994-011-5258-3
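As a rough illustration of how such a method works, the RE-EM idea for a random intercept model y_ij = f(x_ij) + b_i + ε_ij (residence i, time step j) alternates two steps. First, it fits a tree to the response with the current residence effects b_i subtracted. Then it re-estimates each b_i from that residence's residuals. The pure-Python sketch below is a simplified stand-in, not the REEMtree code. It uses a one-split regression tree (`fit_stump`). Instead of estimating the variance components by (RE)ML, it assumes a fixed, hypothetical ratio `var_ratio` = σ²/τ², so each b_i is the shrunken mean residual Σ_j r_ij / (n_i + var_ratio). Passing a different learner as `fit` (for example, a random forest) gives the same alternation used by mixed-effects random forests.

```python
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree: the single threshold that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no usable split: predict the overall mean
        overall = mean(y)
        return lambda row: overall
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def re_em(X, y, groups, fit=fit_stump, var_ratio=1.0, tol=1e-8, max_iter=100):
    """Alternate between fitting `fit` and estimating one random intercept per group."""
    b = {g: 0.0 for g in set(groups)}
    for _ in range(max_iter):
        # Step 1: fit the fixed-effects learner to y with the group intercepts removed.
        adjusted = [v - b[g] for v, g in zip(y, groups)]
        predict = fit(X, adjusted)
        # Step 2: each intercept becomes the shrunken mean of its group's residuals.
        residuals = {g: [] for g in b}
        for row, v, g in zip(X, y, groups):
            residuals[g].append(v - predict(row))
        new_b = {g: sum(r) / (len(r) + var_ratio) for g, r in residuals.items()}
        converged = max(abs(new_b[g] - b[g]) for g in b) < tol
        b = new_b
        if converged:
            break
    return predict, b


# Hypothetical data: one "cooking" feature; residence A sits 10 units above residence B.
X = [[0], [1], [0], [1]] * 2
y = [5.0, 15.0, 5.0, 15.0, -5.0, 5.0, -5.0, 5.0]
groups = ["A"] * 4 + ["B"] * 4
predict, b = re_em(X, y, groups)
print(b, predict([0]), predict([1]))
```

In this toy example the tree recovers the cooking effect of 10, and with `var_ratio=1.0` the residence intercepts are shrunk to ±4 instead of the raw ±5 offsets. Setting `var_ratio=0.0` removes the shrinkage.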
