
Linear regression using time-lagged predictors (independent variables) for forecasting

I'm working on forecasting the Monthly Average Precipitation of a geographical region in India (the Assam and Meghalaya subdivision). As independent variables (predictors), I'm using Monthly Average Air Temperature and Monthly Average Relative Humidity data, which I extracted from the netCDF4 files on the NOAA website and averaged spatially over this region.

To produce the forecast, I want to fit a linear regression with Precipitation as the dependent variable and "Air Temperature" and "Relative Humidity" as the independent variables, where the independent variables enter the regression with a time lag.

The Linear regression equation should look like:

Please follow this link for the equation

Here, "Y" is Precipitation, "X" is Air Temperature and "Z" is Relative Humidity.
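The linked equation is not reproduced on this page. A form consistent with the description above and with the one-month lag used in the answer further down might look like the following. The coefficient names and the choice of a single lag of one month are assumptions, not taken from the original link:

```latex
Y_t = \beta_0 + \beta_1 X_{t-1} + \beta_2 Z_{t-1} + \varepsilon_t
```

Here t indexes months, so precipitation in month t is explained by the air temperature and relative humidity of month t-1.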

The sample "Training data" is as follows:

   ID       Time Precipitation Air_Temperature Relative_Humidity
1   1 1948-01-01           105        20.31194          81.64137
2   2 1948-02-01           397        21.21052          80.20120
3   3 1948-03-01           594        22.14363          81.94274
4   4 1948-04-01          2653        20.79417          78.89908
5   5 1948-05-01          7058        20.43589          82.99959
6   6 1948-06-01          5328        18.10059          77.91983
7   7 1948-07-01          4882        16.63936          76.25758
8   8 1948-08-01          3979        16.56065          76.89210
9   9 1948-09-01          2625        16.95542          76.80116
10 10 1948-10-01          2578        17.13323          75.62411

And a segment of "Test data" is as follows:

        ID       Time Precipitation Air_Temperature Relative_Humidity
    1  663 2003-03-01           862        21.27210          79.77419
    2  664 2003-04-01          1812        20.44042          79.42500
    3  665 2003-05-01          1941        19.24267          79.57057
    4  666 2003-06-01          4981        18.53784          80.67292
    5  667 2003-07-01          4263        17.21581          79.97178
    6  668 2003-08-01          2436        16.88686          81.37097
    7  669 2003-09-01          2322        16.23134          77.63333
    8  670 2003-10-01          2220        17.40589          81.14516
    9  671 2003-11-01           131        19.01159          79.15000
    10 672 2003-12-01           241        20.86234          79.05847

Any help would be highly appreciated. Thanks!

Reacting to your clarification in the comments, here is one of many ways to produce lagged variables, using the lag function within dplyr (I am also adding a new row here for later forecasting):

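library(dplyr)

# Append an empty row for November 1948, then lag both predictors by one month.
# Time is assumed to be stored as text; use as.Date("1948-11-01") if it is a Date column.
df.withlags <- df %>%
   add_row(ID = 11, Time = "1948-11-01") %>%
   mutate(Air_Temperature_lagged = dplyr::lag(Air_Temperature, 1),
          Relative_Humidity_lagged = dplyr::lag(Relative_Humidity, 1))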
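To try the lagging step on the ten sample training rows from the question, here is a minimal base R sketch that does not need dplyr. The helper lag_by is a hypothetical name introduced for illustration, and storing Time as a Date is an assumption about the original data:

```r
# Sample training rows from the question (Time stored as Date, an assumption).
df <- data.frame(
  ID = 1:10,
  Time = seq(as.Date("1948-01-01"), by = "month", length.out = 10),
  Precipitation = c(105, 397, 594, 2653, 7058, 5328, 4882, 3979, 2625, 2578),
  Air_Temperature = c(20.31194, 21.21052, 22.14363, 20.79417, 20.43589,
                      18.10059, 16.63936, 16.56065, 16.95542, 17.13323),
  Relative_Humidity = c(81.64137, 80.20120, 81.94274, 78.89908, 82.99959,
                        77.91983, 76.25758, 76.89210, 76.80116, 75.62411)
)

# Hypothetical helper: shift a vector down by k positions, padding with NA.
lag_by <- function(x, k = 1) c(rep(NA, k), head(x, -k))

# Append an empty row for the month to be forecast (November 1948).
future <- data.frame(ID = 11, Time = as.Date("1948-11-01"),
                     Precipitation = NA, Air_Temperature = NA,
                     Relative_Humidity = NA)
df.withlags <- rbind(df, future)

# Row i now holds the predictor values observed in row i - 1.
df.withlags$Air_Temperature_lagged <- lag_by(df.withlags$Air_Temperature, 1)
df.withlags$Relative_Humidity_lagged <- lag_by(df.withlags$Relative_Humidity, 1)
```

The first row has no previous month, so its lagged values are NA, while row 11 carries October's temperature and humidity.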

You can then fit a straightforward linear regression using lm, with Precipitation as your dependent variable and the lagged versions of the two other variables as the predictors:

precip.model <- lm(Precipitation ~ Air_Temperature_lagged + Relative_Humidity_lagged,
                   data = df.withlags)

Row 11 carries October's Air_Temperature and Relative_Humidity as its lagged values, so the predict function applies the fitted coefficients to them and gives the precipitation forecast for November of 1948:

predict(precip.model, newdata = df.withlags)
  1        2        3        4        5        6        7        8        9       10       11 
  NA 2929.566 3512.551 3236.421 3778.742 2586.012 3473.482 3615.884 3426.378 3534.965 3893.255 

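The forecast for November 1948 is the value for row 11: 3893.255. Rows 2 to 10 are the fitted values for the training months, and row 1 is NA because it has no lagged predictors.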
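To get only the forecast, rather than fitted values for every row, you can pass just the forecast month's lagged predictors to predict. The sketch below is self-contained and rebuilds the lagged training set from the sample values in the question. The names train and november are introduced for illustration, and the prediction interval is an addition that the original answer does not compute:

```r
# Sample training values from the question (January to October 1948).
precip <- c(105, 397, 594, 2653, 7058, 5328, 4882, 3979, 2625, 2578)
temp   <- c(20.31194, 21.21052, 22.14363, 20.79417, 20.43589,
            18.10059, 16.63936, 16.56065, 16.95542, 17.13323)
humid  <- c(81.64137, 80.20120, 81.94274, 78.89908, 82.99959,
            77.91983, 76.25758, 76.89210, 76.80116, 75.62411)

# Pair each month's precipitation (Feb to Oct) with the previous month's
# predictors (Jan to Sep).
train <- data.frame(
  Precipitation = precip[-1],
  Air_Temperature_lagged = temp[-10],
  Relative_Humidity_lagged = humid[-10]
)
precip.model <- lm(Precipitation ~ Air_Temperature_lagged + Relative_Humidity_lagged,
                   data = train)

# October's predictors are the lagged inputs for November.
november <- data.frame(Air_Temperature_lagged = temp[10],
                       Relative_Humidity_lagged = humid[10])

# One-row matrix with columns fit, lwr, upr (95% prediction interval by default).
forecast <- predict(precip.model, newdata = november, interval = "prediction")
print(forecast)
```

With only nine usable training rows the interval will be wide, which is a useful reminder that the ten-row sample is for illustration and the full series should be used for a real fit.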

Note that this model only lets you forecast one time period into the future: with a lag of one month, forecasting month t + 2 would require predictor values for month t + 1, which have not been observed yet.
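One possible workaround, which is not part of the original answer, is a "direct" forecast: refit the regression with the predictors lagged by h months, so that the latest observed month can be used to forecast h months ahead. The function forecast_h below is a hypothetical helper written for this sketch, and the data are the ten sample rows from the question:

```r
# Sample training values from the question (January to October 1948).
precip <- c(105, 397, 594, 2653, 7058, 5328, 4882, 3979, 2625, 2578)
temp   <- c(20.31194, 21.21052, 22.14363, 20.79417, 20.43589,
            18.10059, 16.63936, 16.56065, 16.95542, 17.13323)
humid  <- c(81.64137, 80.20120, 81.94274, 78.89908, 82.99959,
            77.91983, 76.25758, 76.89210, 76.80116, 75.62411)

# Hypothetical helper: regress y on x and z lagged by h months, then
# forecast month n + h from the predictors observed in month n.
forecast_h <- function(y, x, z, h) {
  n <- length(y)
  train <- data.frame(y = y[(h + 1):n],
                      x_lag = x[1:(n - h)],
                      z_lag = z[1:(n - h)])
  fit <- lm(y ~ x_lag + z_lag, data = train)
  unname(predict(fit, newdata = data.frame(x_lag = x[n], z_lag = z[n])))
}

# Forecasts for November 1948, December 1948 and January 1949.
forecasts <- sapply(1:3, function(h) forecast_h(precip, temp, humid, h))
print(forecasts)
```

Each horizon gets its own model, and every extra month of lag removes one training row, so on a sample this small the longer horizons rest on very little data.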
