Prediction with lm

I have the following data frame:

lm        mean resids          sd resids    resid 1    resid 2    resid 3    intercept   beta

1         0.000000e+00         6.2806844 -3.6261548   7.2523096 -3.6261548   103.62615  24.989340
2        -2.960595e-16         8.7515899 -5.0527328  10.1054656 -5.0527328   141.96786  -1.047323
3        -2.960595e-16         5.9138984 -3.4143908   6.8287817 -3.4143908   206.29046 -26.448694
4         3.700743e-17         0.5110845  0.2950748  -0.5901495  0.2950748   240.89801 -35.806642
5         7.401487e-16         6.6260504  3.8255520  -7.6511040  3.8255520   187.03479 -23.444762
6         5.921189e-16         8.7217431  5.0355007 -10.0710014  5.0355007    41.43239   3.138396
7         0.000000e+00         5.5269434  3.1909823  -6.3819645  3.1909823  -119.90628  27.817845
8        -1.480297e-16         1.0204260 -0.5891432   1.1782864 -0.5891432  -180.33773  35.623363
9        -5.921189e-16         6.9488186 -4.0119023   8.0238046 -4.0119023   -64.72245  21.820226
10       -8.881784e-16         8.6621512 -5.0010953  10.0021906 -5.0010953   191.65339  -5.218767

Each row represents a linear model estimated over a rolling window of length 3. I used rollapply on a separate data frame with lm(y~t) to extract the intercepts and slope coefficients into a new data frame, which I then combined with the residuals from the same models and the mean and standard deviation of those residuals.

Because the window length is 3, each model has three residuals, shown in resid 1, resid 2 and resid 3. Their mean and standard deviation are in mean resids and sd resids.

I want to predict the next observation, the one at time k+1, using the intercept and beta. Here k is the last time index in the model's window, which equals the window length, 3, for the first model.

Recall that lm1 takes observations 1,2,3 to estimate the intercept and the beta, and lm2 takes 2,3,4, lm3 takes 3,4,5, etc. The function for the prediction should be:

predict_lm1 = intercept_lm1 + beta_lm1*(k+1) 

where k+1 = 4. For lm2:

predict_lm2 = intercept_lm2 + beta_lm2*(k+1)

where k+1 = 5.

Clearly, k increases by 1 every time I move down one row in the dataset. This is because the explanatory variable is time, t , which is a sequence increasing by one per observation.
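To make the intended calculation concrete, here is a minimal sketch of the first two predictions, using the coefficients from rows 1 and 2 of the table above. The variable names are only illustrative, and it assumes t is the global observation index (1, 2, 3, ...) as described.

```r
# Coefficients copied from rows 1 and 2 of the table above.
intercept_lm1 <- 103.62615
beta_lm1      <- 24.989340
intercept_lm2 <- 141.96786
beta_lm2      <- -1.047323

# lm1 was fitted on t = 1, 2, 3, so the next observation is at t = 4.
predict_lm1 <- intercept_lm1 + beta_lm1 * 4
# lm2 was fitted on t = 2, 3, 4, so the next observation is at t = 5.
predict_lm2 <- intercept_lm2 + beta_lm2 * 5

round(c(predict_lm1, predict_lm2), 2)
# [1] 203.58 136.73
```

These two values serve as a reference for checking any loop or vectorized version against.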

Should I use a for loop or an apply function here?

How can I write a function that iterates down the rows and calculates each prediction from the information in that row?

Thanks.

EDIT:

I managed to find a possible solution by writing the following:

n=nrow(dataset)
for(i in n){
predictions = dataset$Intercept + dataset$beta*(k+1)
}

However, k does not increase by 1 per iteration, so k+1 is always 4. How can I make sure k increases by 1 for each row?

EDIT 2

I managed to add 1 to k by writing the following:

n=nrow(dataset)
for(i in n){
x = 0
x[i] = k + 1
preds = dataset$`(Intercept)` + dataset$t*(x[i])
}

However, the first prediction is overestimated: it should be 203 but comes out as 228, which suggests the explanatory variable is 1 too high for that row. The second prediction is correct. I am not sure what I am doing wrong. Any advice?

EDIT 3

I managed to find a solution as follows:

n=nrow(dataset)

for(i in n){
 x = k + 1
 preds = dataset$`(Intercept)` + dataset$t*(x)
 x = x + 1
 }

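Your loop is not iterating: n is a single number, so for(i in n) runs the body only once, with i equal to n. Loop over 1:n (or seq_len(n)) and index each column with [i]: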

dataset <- read.table(text="lm        meanresids          sdresids    resid1    resid2    resid3    intercept   beta
1         0.000000e+00         6.2806844 -3.6261548   7.2523096 -3.6261548   103.62615  24.989340
2        -2.960595e-16         8.7515899 -5.0527328  10.1054656 -5.0527328   141.96786  -1.047323
3        -2.960595e-16         5.9138984 -3.4143908   6.8287817 -3.4143908   206.29046 -26.448694
4         3.700743e-17         0.5110845  0.2950748  -0.5901495  0.2950748   240.89801 -35.806642
5         7.401487e-16         6.6260504  3.8255520  -7.6511040  3.8255520   187.03479 -23.444762
6         5.921189e-16         8.7217431  5.0355007 -10.0710014  5.0355007    41.43239   3.138396
7         0.000000e+00         5.5269434  3.1909823  -6.3819645  3.1909823  -119.90628  27.817845
8        -1.480297e-16         1.0204260 -0.5891432   1.1782864 -0.5891432  -180.33773  35.623363
9        -5.921189e-16         6.9488186 -4.0119023   8.0238046 -4.0119023   -64.72245  21.820226
10       -8.881784e-16         8.6621512 -5.0010953  10.0021906 -5.0010953   191.65339  -5.218767", header = TRUE)

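n <- nrow(dataset)
window <- 3                      # window length stated in the question
predictions <- numeric(n)        # preallocate instead of growing with rbind
for (i in seq_len(n)) {
  k <- window + i - 1            # last time index in row i's window, per the question (3 for lm1, 4 for lm2, ...)
  predictions[i] <- dataset$intercept[i] + dataset$beta[i] * (k + 1)
}
predictions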
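Since the question asks whether a loop or an apply function is needed, it may be worth noting that neither is strictly required: arithmetic on data frame columns in R is vectorized. The sketch below assumes, as the question describes, a window length of 3 and a time index that increases by 1 per row. The names window and t_next are illustrative, and only the two columns needed are copied from the table.

```r
# Only the two columns needed for prediction, copied from the question's table.
dataset <- data.frame(
  intercept = c(103.62615, 141.96786, 206.29046, 240.89801, 187.03479,
                41.43239, -119.90628, -180.33773, -64.72245, 191.65339),
  beta      = c(24.989340, -1.047323, -26.448694, -35.806642, -23.444762,
                3.138396, 27.817845, 35.623363, 21.820226, -5.218767)
)

window <- 3                                 # assumed window length from the question
t_next <- seq_len(nrow(dataset)) + window   # 4, 5, ..., 13: the time each row predicts

# Vectorized: beta and t_next are multiplied element by element,
# giving one prediction per row without an explicit loop.
dataset$prediction <- dataset$intercept + dataset$beta * t_next
dataset
```

The first two predictions come out at about 203.58 and 136.73, matching the values the question expects for lm1 and lm2.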
