
R: Volatility function that interprets NAs

I am looking for help with getting a volatility function to work with my dataframe. In the function below, I'm just trying to compute daily log returns from the prices of each security (each column in my data is a different security's prices over time), and then calculate an annualized vol.

volcalc= function (x) {
  returns=log(x)-log(lag(x))
  vol=sd(returns)*sqrt(252)
  return(vol)
}

Then I run it with the call below, but it returns a numeric vector of length ncol containing only NAs.

testlag=apply(dataexample,2,volcalc)

My dataframe has NAs galore (it includes all assets over the entire time period, even if they weren't around at the time), and one clear problem is that my function does not handle the NAs. But when I tried adding na.rm=TRUE in various places in the function, it did not work at all.

Below is an example dataset, where the columns x and y are different securities, with each row representing a day.

structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
NA, NA), y = c(3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, NA, NA, NA, NA
)), .Names = c("x", "y"), row.names = c(NA, 12L), class = "data.frame")

My question is: How do I either incorporate the NAs in the function or get around this problem in a different way by rewriting the function? Thank you for your help!

An alternative is to keep your data and replace every NA with the closest previous non-NA value by running the 'na.locf' function (last observation carried forward) from the zoo package BEFORE you apply the 'volcalc' function. Your original function has to be changed in any case, because the 'lag' function introduces at least one NA (with a lag of 1), as mentioned by Akrun.

library(zoo) # provides na.locf
df.noNA <- na.locf(df) # df: original data frame with NAs
apply(df.noNA, 2, volcalc) # using Akrun's corrected volcalc function
#       x        y 
#3.155899 1.592084 

Which option you prefer depends very much on the proportion of NAs in your data and on what you consider the 'true' volatility, because the two approaches return different values.
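To see where the difference comes from, here is a minimal base R sketch on the x column of the example data. The cummax index trick is only a stand-in for zoo's na.locf (an assumption for illustration, and it assumes the first price is not NA); the names r_drop and r_locf are hypothetical.

```r
# Example prices for security x from the question
x <- c(1:10, NA, NA)

# Option 1: drop the NAs, then take daily log returns
r_drop <- diff(log(x[!is.na(x)]))

# Option 2: carry the last observed price forward.
# Base R stand-in for zoo::na.locf; assumes x[1] is not NA.
idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
r_locf <- diff(log(x[idx]))

length(r_drop)           # 9 returns
length(r_locf)           # 11 returns; the last two are exactly 0
sd(r_drop) * sqrt(252)   # about 3.0126
sd(r_locf) * sqrt(252)   # about 3.1559
```

Carrying prices forward adds a zero return for every filled day, and those zeros enter the standard deviation like any other observation, which is why the two results diverge more as the share of NAs grows.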

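We can remove the NA elements with x[!is.na(x)], but lag(x) will still return NA as the first element (assuming a lag that shifts values, such as dplyr::lag; base stats::lag only shifts the time index). That NA can be dropped by passing na.rm=TRUE to sd.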

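library(dplyr) # assumption: lag() below is dplyr::lag, which shifts values

volcalc <- function(x) {
  x <- x[!is.na(x)]                              # drop missing prices
  returns <- log(x) - log(lag(x))                # daily log returns; first element is NA
  vol <- sd(returns, na.rm = TRUE) * sqrt(252)   # annualize with 252 trading days
  return(vol)
}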

apply(dataexample, 2, volcalc)
#    x        y  
#3.012588 1.030484 
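If you would rather not depend on which lag is on the search path, a variant using base R's diff gives the same returns without the leading NA. This is only a sketch: the name volcalc_diff, the periods argument, and the guard for very short series are my own additions rather than part of the answer above. Note that dropping NAs in the middle of a series makes the next return span the gap.

```r
# Example data from the question
dataexample <- data.frame(x = c(1:10, NA, NA),
                          y = c(3:10, NA, NA, NA, NA))

# Annualized volatility of daily log returns, skipping missing prices.
# periods = 252 follows the trading-day convention used in the thread.
volcalc_diff <- function(x, periods = 252) {
  x <- x[!is.na(x)]                      # keep only observed prices
  if (length(x) < 3) return(NA_real_)    # sd needs at least two returns
  returns <- diff(log(x))                # log(x[t]) - log(x[t-1]), no leading NA
  sd(returns) * sqrt(periods)
}

vols <- sapply(dataexample, volcalc_diff)
round(vols, 6)
#        x        y
# 3.012588 1.030484
```

sapply iterates over the data frame's columns directly, so the columns are not coerced to a matrix first as they are with apply.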
