Time Series and Linear Regression

This may seem like an unusual question, but I really need your help. I'm completely new to time-series analysis but have a sufficient understanding of OLS regression. First, I managed to convert an object to a zoo object with quarterly frequency.

  1. Question: The zoo object I created is not a proper dataset object, and for some reason I cannot use the $ operator to access variables within it. Is there any way to transform the zoo object back into a dataset object, or does that make no sense?
  2. Question: I used na.approx(as.ts(z)) to fill in the missing quarter rows. However, I got values of 0.25 and 0.5 for a dummy variable, which obviously does not make sense. Is there any way to change that?
  3. Question: The zoo object dropped the quarter variable, so I no longer have a time dimension in it. Are there any commands to fix that?
  4. Question: This is probably the most important issue. I watched a few videos on how to use regression with time series, but I genuinely do not understand the difference between using the standard lm function on variables that have high autocorrelation (time series) and on variables that don't (OLS). I know that OLS assumes the output variables are uncorrelated. But let's say I want to calculate the regression coefficients for GDPGROWTH and the average consensual voting behaviour in parliament ("AverageCONS"). Both variables have high autocorrelation, and I have a data frame in which a date is assigned to each observation of these variables.

What I don't understand is how to get a "time-series" regression using the standard lm command, especially because when I try the following command:

lm(z$GDPGROWTH~z$APPROVALGOV)

I can't access the variables in the zoo object (or time-series object) because the $ operator does not work:

Error in z$GDPGrowth : $ operator is invalid for atomic vectors

So I have to resort to using the variables from the normal dataset object. But then that wouldn't incorporate any time dimension, right?

Generally speaking, I'm extremely confused about time series and how regression analyses incorporate the time dimension. The result I want is a regression between GDP growth and the average consensual voting behaviour ("GDPGrowth" and "AverageCONS"). I know that both variables are autocorrelated over time. However, I don't know how to do a proper time-series regression.

I'll post dput output and other reproducible code to make everything a little easier for you. Grateful for any help!

> dput(z)
structure(c(1.2, -0.2, -0.15, -0.1, 0.4, 0.333333333333333, 0.266666666666667, 
0.2, 0.5, 0.8, 1.1, 1.4, 1.3, 2, 2.7, 0.8, 0.9, 1, 0.8, 0.6, 
-0.6, -0.0666666666666667, 0.466666666666667, 1, 1.6, 2.2, 1.9, 
1.6, 1.7, 1.8, 1.5, 1.2, 0.8, 0.4, 0.8, 1.2, 2.1, 0.5, 1.15, 
1.8, 0.65, -0.5, 0.4, 1.3, 0.3, -0.7, -0.5, -0.3, -0.15, 0, 0.6, 
-0.1, 1.4, 0.3, 0.7, 0.2, -0.3, 0.8, 0.2, 0, -0.8, 1.4, 1.05, 
0.7, -0.5, 1.3, 1, 0.7, 0.15, -0.4, -0.4, -0.4, 1.2, 0, 0.4, 
0.8, 1.4, 0.8, 0, -0.3, 2, 0, -0.2, -0.2, -0.5, 0, 0.5, -0.2, 
-1.5, -0.35, 0.8, 0.3, -0.2, 0.5, 0.2, -0.1, 0, 0, 1.17870603993396, 
0.589353019966981, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.61936244127144, 
2.01801455396906, 0.416666666666667, 0, 2.16353957620116, 4.32707915240231, 
5.98174400514746, 7.6364088578926, 0.257076257076257, 0.171384171384171, 
0.0856920856920857, 0, 11.2103879729705, 22.4207759459411, 15.2347455885415, 
8.04871523114194, 11.3521305960255, 14.6555459609091, 15.4403121270985, 
16.2250782932878, 8.6979606817534, 1.17084307021898, 3.97895588789713, 
6.78706870557528, 0, 0, 4.87695592673415, 9.7539118534683, 4.87695592673415, 
0, 0, 0, 0, 0, 0.0201047586252623, 0.0402095172505245, 0.0201047586252623, 
0, 0, 0.0636265006342972, 0, 0.171974252305606, 0, 5.57623701563216, 
11.1524740312643, 2.68040672020172, 6.2111801242236, 3.24760735460988, 
28.2976799963101, 0, 3.7866135488981, 7.5732270977962, 0, 0, 
0.747061391598759, 1.49412278319752, 35.8503062293569, 70.2064896755162, 
52.105350122636, 34.0042105697558, 18.5823772614653, 18.0896275972026, 
13.25206168539, 8.41449577357745, 10, 0, 0, 34.7491138493683, 
8.36236933797909, 39.6563615833003, 74.4262295081967, 22.3611248302746, 
10, 16.455880420063, 22.911760840126, 0, 0.0666722800439236, 
0.0333361400219618, 0, 50.3843726943174, 0, 0, 0.864549845643277, 
1.72909969128655, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.75, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 7.88127469343306, 7.67671239967451, 7.50492078119126, 
7.33312916270801, 7.26681550104175, 7.37045715962304, 7.47409881820433, 
7.57774047678561, 7.54440688341081, 7.511073290036, 7.47773969666119, 
7.44440610328638, 7.51950710910292, 7.47703037780537, 7.43455364650782, 
7.21735906430465, 7.07654252150171, 6.93572597869877, 7.13426907080346, 
7.33281216290814, 7.34110311728404, 7.50058012443653, 7.66005713158903, 
7.81953413874152, 7.80295976635758, 7.78638539397364, 7.67221491798127, 
7.5580444419889, 7.13119115282123, 6.70433786365357, 6.73565901224693, 
6.76698016084029, 6.8801068632748, 6.9932335657093, 7.44235686960608, 
7.89148017350285, 8.24705859843768, 8.20161269191644, 7.1659119101912, 
6.13021112846596, 6.7880423795211, 7.44587363057623, 7.46757903053749, 
7.48928443049876, 7.07702561011456, 6.66476678973035, 6.57551271762131, 
6.48625864551226, 6.31270117005075, 6.13914369458924, 6.05634679973895, 
5.9702909369734, 6.19216550005443, 6.87967122943963, 7.49940214266322, 
7.29465702788788, 7.08991191311255, 7.3351806925688, 7.46762039999888, 
7.22336518577119, 6.75192112299076, 6.61614229895973, 6.50505157543993, 
6.39396085192013, 6.09682321355397, 5.99711627005931, 5.97786567725074, 
5.95861508444216, 6.13656719089965, 6.31451929735713, 6.19389219496854, 
6.07326509257996, 7.58677238161551, 7.24041796080827, 6.96794618067372, 
6.69547440053917, 7.72437292977251, 7.61985191697131, 7.85861327446016, 
7.78974162557168, 8.00182694049075, 7.82019060836613, 7.58946475073855, 
7.89751118735182, 7.14978411180804, 7.30379876152907, 7.4578134112501, 
6.24455242517448, 5.8823776113788, 5.98170501192132, 6.08103241246385, 
5.73879250743035, 5.68128370028589, 5.83312282222293, 6.1665241256249, 
6.49992542902688, 6.45920800878159), .Dim = c(97L, 4L), .Dimnames = list(
    NULL, c("GDPGrowth", "AverageCONS", "BRGOVMEHR", "ApprovalGOV"
    )), .Tsp = c(2482, 2506, 4), class = c("mts", "ts", "matrix"
))

"FINAL" is the overall aggregated data frame on which I would like to run the time-series analysis.

> dput(FINAL)
structure(list(Quarter.y = c(1981.1, 1981.2, 1981.4, 1982.1, 
1982.4, 1983.4, 1984.1, 1984.3, 1984.4, 1985.2, 1985.4, 1986.1, 
1986.4, 1987.2, 1987.4, 1988.2, 1988.4, 1989.2, 1989.4, 1990.1, 
1990.2, 1990.4, 1991.2, 1991.4, 1992.2, 1992.4, 1993.2, 1993.3, 
1993.4, 1994.1, 1994.2, 1994.3, 1995.1, 1995.2, 1995.3, 1995.4, 
1996.1, 1996.2, 1996.4, 1997.1, 1997.2, 1997.4, 1998.2, 1998.4, 
1999.1, 1999.2, 1999.4, 2000.1, 2000.2, 2000.3, 2000.4, 2001.1, 
2001.2, 2001.3, 2001.4, 2002.1, 2002.3, 2002.4, 2003.1, 2003.3, 
2003.4, 2004.1, 2004.2, 2004.4, 2005.1), GDPGrowth = c(1.2, -0.2, 
-0.1, 0.4, 0.2, 1.4, 1.3, 2.7, 0.8, 1, 0.6, -0.6, 1, 2.2, 1.6, 
1.8, 1.2, 0.4, 1.2, 2.1, 0.5, 1.8, -0.5, 1.3, -0.7, -0.3, 0, 
0.6, -0.1, 1.4, 0.3, 0.7, -0.3, 0.8, 0.2, 0, -0.8, 1.4, 0.7, 
-0.5, 1.3, 0.7, -0.4, -0.4, 1.2, 0, 0.8, 1.4, 0.8, 0, -0.3, 2, 
0, -0.2, -0.2, -0.5, 0.5, -0.2, -1.5, 0.8, 0.3, -0.2, 0.5, -0.1, 
0), AverageCONS = c(0, 1.17870603993396, 0, 0, 0, 0, 3.61936244127144, 
0.416666666666667, 0, 4.32707915240231, 7.6364088578926, 0.257076257076257, 
0, 22.4207759459411, 8.04871523114194, 14.6555459609091, 16.2250782932878, 
1.17084307021898, 6.78706870557528, 0, 0, 9.7539118534683, 0, 
0, 0, 0.0402095172505245, 0, 0, 0.0636265006342972, 0, 0.171974252305606, 
0, 11.1524740312643, 2.68040672020172, 6.2111801242236, 3.24760735460988, 
28.2976799963101, 0, 7.5732270977962, 0, 0, 1.49412278319752, 
70.2064896755162, 34.0042105697558, 18.5823772614653, 18.0896275972026, 
8.41449577357745, 10, 0, 0, 34.7491138493683, 8.36236933797909, 
39.6563615833003, 74.4262295081967, 22.3611248302746, 10, 22.911760840126, 
0, 0.0666722800439236, 0, 50.3843726943174, 0, 0, 1.72909969128655, 
0), BRGOVMEHR = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), ApprovalGOV = c(7.88127469343306, 7.67671239967451, 
7.33312916270801, 7.26681550104175, 7.57774047678561, 7.44440610328638, 
7.51950710910292, 7.43455364650782, 7.21735906430465, 6.93572597869877, 
7.33281216290814, 7.34110311728404, 7.81953413874152, 7.78638539397364, 
7.5580444419889, 6.70433786365357, 6.76698016084029, 6.9932335657093, 
7.89148017350285, 8.24705859843768, 8.20161269191644, 6.13021112846596, 
7.44587363057623, 7.48928443049876, 6.66476678973035, 6.48625864551226, 
6.13914369458924, 6.05634679973895, 5.9702909369734, 6.19216550005443, 
6.87967122943963, 7.49940214266322, 7.08991191311255, 7.3351806925688, 
7.46762039999888, 7.22336518577119, 6.75192112299076, 6.61614229895973, 
6.39396085192013, 6.09682321355397, 5.99711627005931, 5.95861508444216, 
6.31451929735713, 6.07326509257996, 7.58677238161551, 7.24041796080827, 
6.69547440053917, 7.72437292977251, 7.61985191697131, 7.85861327446016, 
7.78974162557168, 8.00182694049075, 7.82019060836613, 7.58946475073855, 
7.89751118735182, 7.14978411180804, 7.4578134112501, 6.24455242517448, 
5.8823776113788, 6.08103241246385, 5.73879250743035, 5.68128370028589, 
5.83312282222293, 6.49992542902688, 6.45920800878159)), row.names = c(NA, 
-65L), class = c("tbl_df", "tbl", "data.frame"))
  1. The reason you can't use $ is that the z object shown in the question is not a zoo object. It is a ts object. You can use class(z), str(z) and dput(z) to determine what you have. $ works on zoo objects but not on ts objects. Convert it to zoo and then $ will work.

     library(zoo)
     zz <- zoo(z, as.yearqtr(time(z)))
     zz$GDPGrowth
     ##     2482 Q1     2482 Q2     2482 Q3     2482 Q4     2483 Q1     2483 Q2
     ##  1.20000000 -0.20000000 -0.15000000 -0.10000000  0.40000000  0.33333333
     ##     2483 Q3     2483 Q4     2484 Q1     2484 Q2     2484 Q3     2484 Q4
     ##  0.26666667  0.20000000  0.50000000  0.80000000  1.10000000  1.40000000
     ## # ... snip ...

    The times in your object are far in the future, but unless we know how you created them we cannot know how that happened. Possibly you were working with Date objects and made an error when converting them to ts.

  2. You have quarterly data, and 0, 0.25, 0.5 and 0.75 are how ts objects represent the 4 quarters internally. If the question instead refers to not wanting to apply na.approx to certain columns, then, if ix is a vector of the column names or numbers to convert, zz[, ix] <- na.approx(zz[, ix]) applies na.approx only to those columns.

  3. ts and zoo represent the index via tsp and index attributes respectively so they are still there. time(z) and time(zz) will retrieve the index.

  4. If you want to do statistical tests, compute confidence intervals, etc. then you need to take the correlations into account; however, if you just want to get point estimates you don't need to concern yourself with that. The dyn package (also the dynlm package) can be used to facilitate running lm with zoo objects.

     library(dyn)
     fm <- dyn$lm(GDPGrowth ~ ApprovalGOV, zz)
     fm
     ## Call:
     ## lm(formula = dyn(GDPGrowth ~ ApprovalGOV), data = zz)
     ##
     ## Coefficients:
     ## (Intercept)  ApprovalGOV
     ##     -1.9717       0.3575

    Either of these also works; they make use of with.zoo and fortify.zoo respectively.

     with(zz, lm(GDPGrowth ~ ApprovalGOV))
     lm(GDPGrowth ~ ApprovalGOV, fortify.zoo(zz))

    To plot the points and draw in a regression line:

     plot(formula(fm), zz)
     abline(fm)
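To illustrate point 4, here is a minimal sketch of why the correlations matter for inference but not for obtaining a point estimate. It uses only base R and simulated data: the series, the seed and the AR(1) error structure are assumptions made for illustration and are not the asker's data. Using arima() with xreg is one possible way to model autocorrelated errors, not necessarily the approach the answer has in mind.

```r
# Simulated example (assumption): y depends on x, and the errors follow an AR(1) process.
set.seed(1)
n <- 100
x <- as.numeric(arima.sim(list(ar = 0.7), n = n))
e <- as.numeric(arima.sim(list(ar = 0.7), n = n))
y <- 1 + 0.5 * x + e

# Plain OLS: still gives a point estimate of the slope.
fit_ols <- lm(y ~ x)

# Ljung-Box test for autocorrelation in the OLS residuals.
lb <- Box.test(resid(fit_ols), lag = 4, type = "Ljung-Box")

# Regression with AR(1) errors: same regressor, but the error
# autocorrelation is modelled explicitly.
fit_ar1 <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x))

# Compare the slope estimates and their standard errors.
se_ols <- coef(summary(fit_ols))["x", "Std. Error"]
se_ar1 <- sqrt(diag(vcov(fit_ar1)))["x"]
print(c(ols = unname(coef(fit_ols)["x"]), ar1 = unname(coef(fit_ar1)["x"])))
print(c(ols = se_ols, ar1 = unname(se_ar1)))
print(lb$p.value)
```

If the Ljung-Box p-value is small, the standard errors reported by lm should not be taken at face value, whereas the slope estimate itself remains usable as a point estimate.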

Other points are:

  • R is case sensitive, so GDPGrowth is not the same as GDPGROWTH.

  • Do not use random code snippets that you have found on the net without first reading the help files for each function used, so that you know whether it makes sense for your problem. Also read all the vignettes (pdf or html documents) for each package that you are using. In particular, the zoo package has 5 vignettes and a reference manual.
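Returning to points 2 and 3, the following is a hedged sketch of how the quarterly index and the gap filling could be handled with zoo. It uses three rows copied from FINAL and rests on two assumptions that the question does not confirm: that Quarter.y encodes year.quarter (so 1982.4 means Q4 of 1982), and that the dummy BRGOVMEHR should keep its last observed value until the next observation (na.locf) rather than being interpolated.

```r
library(zoo)

# Three rows taken from FINAL (1982 Q4, 1983 Q4, 1984 Q1).
final <- data.frame(
  Quarter.y = c(1982.4, 1983.4, 1984.1),
  GDPGrowth = c(0.2, 1.4, 1.3),
  BRGOVMEHR = c(0, 1, 1)
)

# Assumption: the digit after the decimal point is the quarter number.
yr  <- floor(final$Quarter.y)
qtr <- round((final$Quarter.y - yr) * 10)
idx <- as.yearqtr(yr + (qtr - 1) / 4)

zz <- zoo(as.matrix(final[, c("GDPGrowth", "BRGOVMEHR")]), idx)

# Merge with an empty series on a complete quarterly grid to create NA rows.
grid <- as.yearqtr(seq(as.numeric(start(zz)), as.numeric(end(zz)), by = 0.25))
zz_full <- merge(zz, zoo(, grid))

# Interpolate the continuous variable, carry the dummy forward.
filled <- merge(
  GDPGrowth = na.approx(zz_full$GDPGrowth),
  BRGOVMEHR = na.locf(zz_full$BRGOVMEHR)
)
print(filled)
```

With this approach the dummy only takes the values 0 and 1, and time(filled) returns the quarterly index.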
