I'm working on forecasting the monthly average precipitation of a geographical region in India (the Assam and Meghalaya subdivision). As independent variables (predictors), I'm using monthly average air temperature and monthly average relative humidity, which I extracted from a netCDF4 file on the NOAA website and averaged spatially over this region.
For forecasting, I want to fit a linear regression with Precipitation as the dependent variable and "Air Temperature" and "Relative Humidity" as the independent variables, such that the predictors enter the regression with a time-lagged effect.
The linear regression equation should look like:
Please follow this link for the equation
Here, "Y" is Precipitation, "X" is Air Temperature and "Z" is Relative Humidity.
The sample "Training data" is as follows:
   ID       Time Precipitation Air_Temperature Relative_Humidity
1   1 1948-01-01           105        20.31194          81.64137
2   2 1948-02-01           397        21.21052          80.20120
3   3 1948-03-01           594        22.14363          81.94274
4   4 1948-04-01          2653        20.79417          78.89908
5   5 1948-05-01          7058        20.43589          82.99959
6   6 1948-06-01          5328        18.10059          77.91983
7   7 1948-07-01          4882        16.63936          76.25758
8   8 1948-08-01          3979        16.56065          76.89210
9   9 1948-09-01          2625        16.95542          76.80116
10 10 1948-10-01          2578        17.13323          75.62411
And a segment of "Test data" is as follows:
    ID       Time Precipitation Air_Temperature Relative_Humidity
1  663 2003-03-01           862        21.27210          79.77419
2  664 2003-04-01          1812        20.44042          79.42500
3  665 2003-05-01          1941        19.24267          79.57057
4  666 2003-06-01          4981        18.53784          80.67292
5  667 2003-07-01          4263        17.21581          79.97178
6  668 2003-08-01          2436        16.88686          81.37097
7  669 2003-09-01          2322        16.23134          77.63333
8  670 2003-10-01          2220        17.40589          81.14516
9  671 2003-11-01           131        19.01159          79.15000
10 672 2003-12-01           241        20.86234          79.05847
Any help would be highly appreciated. Thanks!
Responding to your clarification in the comments, here is one of many ways to produce lagged variables, using the lag function from dplyr (I am also adding a new row here for later forecasting):
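library(dplyr)   # %>%, mutate(), lag()
library(tibble)  # add_row()
# Append an empty row for November 1948 (Time is assumed to be character here),
# then lag both predictors by one month.
df %>%
  add_row(ID = 11, Time = "1948-11-01") %>%
  mutate(Air_Temperature_lagged = dplyr::lag(Air_Temperature, 1),
         Relative_Humidity_lagged = dplyr::lag(Relative_Humidity, 1)) -> df.withlags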
You can then fit a straightforward linear regression using lm, with Precipitation as your dependent variable and the lagged versions of the two other variables as the predictors:
precip.model <- lm(data = df.withlags, Precipitation ~ Air_Temperature_lagged + Relative_Humidity_lagged)
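Written out, this lm call fits a one-month-lag model, using the question's notation (Y is Precipitation, X is Air Temperature, Z is Relative Humidity). This is only the model implied by the formula above; whether it matches the equation linked in the question, which may contain further lags, is an assumption.

```latex
Y_t = \beta_0 + \beta_1 X_{t-1} + \beta_2 Z_{t-1} + \varepsilon_t
```

Here the beta terms are the coefficients estimated by lm and the epsilon term is the regression error.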
You could then apply the fitted coefficients to the most recent values of Air_Temperature and Relative_Humidity to forecast the precipitation for November 1948 using the predict function:
predict(precip.model, newdata = df.withlags)
1 2 3 4 5 6 7 8 9 10 11
NA 2929.566 3512.551 3236.421 3778.742 2586.012 3473.482 3615.884 3426.378 3534.965 3893.255
The model's prediction for November 1948 (row 11) is 3893.255.
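For reference, the same steps can be sketched in base R without dplyr, predicting only the appended row rather than the whole data frame. The data are the ten sample rows from the question; the names lag_by, future, df_lagged, fit and forecast_nov are made up for this illustration.

```r
# Sample training data copied from the question (first 10 months of 1948).
df <- data.frame(
  ID = 1:10,
  Time = as.Date(sprintf("1948-%02d-01", 1:10)),
  Precipitation = c(105, 397, 594, 2653, 7058, 5328, 4882, 3979, 2625, 2578),
  Air_Temperature = c(20.31194, 21.21052, 22.14363, 20.79417, 20.43589,
                      18.10059, 16.63936, 16.56065, 16.95542, 17.13323),
  Relative_Humidity = c(81.64137, 80.20120, 81.94274, 78.89908, 82.99959,
                        77.91983, 76.25758, 76.89210, 76.80116, 75.62411)
)

# Hypothetical helper: shift a vector down by k positions, padding with NA.
lag_by <- function(x, k = 1) c(rep(NA, k), head(x, -k))

# Append an empty row for the month to be forecast (November 1948).
future <- data.frame(ID = 11, Time = as.Date("1948-11-01"),
                     Precipitation = NA, Air_Temperature = NA,
                     Relative_Humidity = NA)
df_lagged <- rbind(df, future)
df_lagged$Air_Temperature_lagged <- lag_by(df_lagged$Air_Temperature)
df_lagged$Relative_Humidity_lagged <- lag_by(df_lagged$Relative_Humidity)

# lm() drops rows containing NA: row 1 has no lagged predictors and
# row 11 has no observed response, so 9 rows are used for fitting.
fit <- lm(Precipitation ~ Air_Temperature_lagged + Relative_Humidity_lagged,
          data = df_lagged)

# Predict only the appended row (November 1948).
forecast_nov <- predict(fit, newdata = df_lagged[nrow(df_lagged), ])
print(forecast_nov)
```

Restricting newdata to the last row returns just the forecast, instead of fitted values for every month.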
Note that this model only lets you forecast one time period into the future: the predictors are lagged by one month, and no predictor values are available beyond the last observed month.
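One possible way to look further ahead, assuming predictors from h months back still carry useful signal, is to lag them by h months instead of one, which allows a forecast for the h months after the last observation. The sketch below uses the sample rows from the question; forecast_with_lag, temp_lag and hum_lag are hypothetical names, and ten rows are far too few for a reliable fit.

```r
# Sample training data copied from the question.
train <- data.frame(
  Precipitation = c(105, 397, 594, 2653, 7058, 5328, 4882, 3979, 2625, 2578),
  Air_Temperature = c(20.31194, 21.21052, 22.14363, 20.79417, 20.43589,
                      18.10059, 16.63936, 16.56065, 16.95542, 17.13323),
  Relative_Humidity = c(81.64137, 80.20120, 81.94274, 78.89908, 82.99959,
                        77.91983, 76.25758, 76.89210, 76.80116, 75.62411)
)

# Hypothetical helper: fit a model on predictors lagged by h months and
# forecast the h months that follow the last observation.
forecast_with_lag <- function(data, h) {
  n <- nrow(data)
  # Pair each response with the predictors observed h months earlier.
  fit_data <- data.frame(
    Precipitation = data$Precipitation[(h + 1):n],
    temp_lag = data$Air_Temperature[1:(n - h)],
    hum_lag = data$Relative_Humidity[1:(n - h)]
  )
  fit <- lm(Precipitation ~ temp_lag + hum_lag, data = fit_data)
  # The last h observed predictor values drive the h future months.
  new_data <- data.frame(
    temp_lag = tail(data$Air_Temperature, h),
    hum_lag = tail(data$Relative_Humidity, h)
  )
  unname(predict(fit, newdata = new_data))
}

# Forecasts for November and December 1948 from a two-month-lag model.
two_ahead <- forecast_with_lag(train, h = 2)
print(two_ahead)
```

The trade-off is that each extra month of horizon removes one row from the fitting data and relies on an older, probably weaker, relationship between predictors and precipitation.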