
How to generalize this algorithm (sign pattern match counter)?

I have this code in R:

corr = function(x, y) {
    sx = sign(x)
    sy = sign(y)

    cond_a = sx == sy && sx > 0 && sy >0
    cond_b = sx < sy && sx < 0 && sy >0
    cond_c = sx > sy && sx > 0 && sy <0
    cond_d = sx == sy && sx < 0 && sy < 0
    cond_e = sx == 0 || sy == 0

    if(cond_a) return('a')
    else if(cond_b) return('b')
    else if(cond_c) return('c')
    else if(cond_d) return('d')
    else if(cond_e) return('e')
}

It is meant to be used with the mapply function in R to count all the sign patterns present in a time series. In this case the pattern has a length of 2 and the possible tuples are: (+,+) (+,-) (-,+) (-,-)

I use the corr function this way:

> with(dt['AAPL'], table(mapply(corr, Return[-1], Return[-length(Return)])) /length(Return)*100)

         a          b          c          d          e 
24.6129416 25.4466058 25.4863041 24.0174672  0.3969829 

> dt["AAPL",list(date, Return)]
      symbol       date     Return
   1:   AAPL 2014-08-29 -0.3499903
   2:   AAPL 2014-08-28  0.6496702
   3:   AAPL 2014-08-27  1.0987923
   4:   AAPL 2014-08-26 -0.5235654
   5:   AAPL 2014-08-25 -0.2456037

I would like to generalize the corr function to n arguments. This means that for every n I would have to write down the conditions corresponding to all the possible n-tuples. Currently the best thing I can think of is a Python script that writes the code string using loops, but there must be a way to do this properly. Do you have an idea about how I could avoid writing out all these tedious conditions by hand? Maybe I could use expand.grid, but how would I do the matching then?

This is a slightly different approach, but it may give you what you're looking for and it allows you to use any size of n-tuple. The basic approach is to take the sign of each return in every sequential set of n returns and convert it into an n-tuple of 1's and 0's, where 0 = negative return and 1 = positive return. Then calculate the decimal value of each n-tuple taken as a binary number. These numbers are different for each distinct n-tuple. Using a zoo time series for these calculations gives access to several useful functions, including diff() to calculate returns and rollapply() to calculate the n-tuples and their sums; get.hist.quote() from the tseries package retrieves the stock prices as a zoo series. The code below does these calculations, converts the sum of the sign changes back to n-tuples of binary digits and collects the results in a data frame.

library(zoo)
library(tseries)
n <- 3     # set size of n-tuple
#
#  get stock prices and compute % returns
#
dtz <- get.hist.quote("AAPL","2014-01-01","2014-10-01", quote="Close")
dtz <-  merge(dtz, (diff(dtz, arithmetic=FALSE ) - 1)*100)
names(dtz)  <-  c("prices","returns")
#
#  calculate the sum of the sign changes
#
dtz <- merge(dtz, rollapply( data=(sign(dtz$returns)+1)/2, width=n, 
                       FUN=function(x, y) sum(x*y), y = 2^(0:(n-1)), align="right" ))
dtz <- fortify.zoo(dtz)
names(dtz)  <-  c("date","prices","returns", "sum_sgn_chg")
#
#  convert the sum of the sign changes back to an n-tuple of binary digits
#
for( i in 1:nrow(dtz) ) 
    dtz$sign_chg[i] <- paste(((as.numeric(dtz$sum_sgn_chg[i]) %/%(2^(0:(n-1)))) %%2), collapse="")
#
#  report first part of result
#
head(dtz, 10)
#
#  report count of changes by month and type
#
table(format(dtz$date,"%Y %m"), dtz$sign_chg)

An example of possible output is a table showing the count of changes by type for each month.

         000 001 010 011 100 101 110 111 NANANA
 2014 01   1   3   3   2   3   2   2   2      3
 2014 02   1   2   4   2   2   3   2   3      0
 2014 03   2   3   0   4   4   1   4   3      0
 2014 04   2   3   2   3   3   2   3   3      0
 2014 05   2   2   1   3   1   2   3   7      0
 2014 06   3   4   3   2   4   1   1   3      0
 2014 07   2   1   2   4   2   5   5   1      0
 2014 08   2   2   1   3   1   2   2   8      0
 2014 09   0   4   2   3   4   2   4   2      0
 2014 10   0   0   1   0   0   0   0   0      0

This shows that in January 2014 there was one set of three days with the pattern 000 (three down returns), three sets with the pattern 001 (two down returns followed by one positive return), and so forth. The NANANA column counts the rows at the start of the series for which a full window of n returns is not yet available. Most months seem to have a fairly random distribution, but May and August show 7 and 8 sets of three consecutive positive returns, reflecting the fact that these were strong months for AAPL.
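The binary encoding described in this answer can be checked on a small vector without downloading any prices. The following is only a sketch in base R: the returns are made-up values, `embed()` stands in for rollapply(), and the helper name `decode` is hypothetical.

```r
# Toy returns (made-up values, not real AAPL data)
returns <- c(-0.3, 0.6, 1.1, -0.5, -0.2, 0.4)
n <- 3

# 1 = up day, 0 = down day
bits <- (sign(returns) + 1) / 2

# embed() gives one window per row with the newest value first,
# so reverse the columns to get oldest-to-newest order
windows <- embed(bits, n)[, n:1, drop = FALSE]

# Weight the oldest day by 2^0 and the newest by 2^(n-1)
codes <- as.vector(windows %*% 2^(0:(n - 1)))

# Decode each integer back into its n binary digits (oldest day first)
decode <- function(code, n) paste((code %/% 2^(0:(n - 1))) %% 2, collapse = "")
patterns <- vapply(codes, decode, character(1), n = n)

print(data.frame(code = codes, pattern = patterns))
```

Each row of the printed data frame pairs the integer code of a window with its decoded digit string, so the first window (down, up, up) should come out as code 6 and pattern 011.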

I think you're better off using rollapply(...) in the zoo package for this. Since you seem to be using quantmod anyway (which loads xts and zoo ), here is a solution that does not use all those nested if(...) statements.

library(quantmod)
AAPL    <- getSymbols("AAPL",auto.assign=FALSE)
AAPL    <- AAPL["2007-08::2009-03"]    # AAPL during the crash...
Returns <- dailyReturn(AAPL)

get.patterns <- function(ret,n) {
  f <- function(x) {  # identifies which row of `patterns` matches sign(x)
    which(apply(patterns,1,function(row)all(row==sign(x))))
  }
  returns  <- na.omit(ret)
  patterns <- expand.grid(rep(list(c(-1,1)),n))
  labels   <- apply(patterns,1,function(row) paste0("(",paste(row,collapse=","),")"))
  result   <- rollapply(returns,width=n,f,align="left")
  data.frame(100*table(labels[result])/(length(returns)-(n-1)))
}
get.patterns(Returns,n=2)
#      Var1     Freq
# 1 (-1,-1) 22.67303
# 2  (-1,1) 26.49165
# 3  (1,-1) 26.73031
# 4   (1,1) 23.15036

get.patterns(Returns,n=3)
#         Var1      Freq
# 1 (-1,-1,-1)  9.090909
# 2  (-1,-1,1) 13.397129
# 3  (-1,1,-1) 14.593301
# 4   (-1,1,1) 11.722488
# 5  (1,-1,-1) 13.636364
# 6   (1,-1,1) 13.157895
# 7   (1,1,-1) 12.200957
# 8    (1,1,1) 10.765550

The basic idea is to create a patterns matrix with 2^n rows and n columns, where each row represents one of the possible patterns (e.g., (1,1), (-1,1), etc.). Then pass the daily returns to this function n-wise using rollapply(...) and identify which row in patterns matches sign(x) exactly. Then use this vector of row numbers as an index into labels, which contains a character representation of the patterns, and finally use table(...) as you did.

This is general for an n-day pattern, but it ignores situations where any return is exactly zero, so the $Freq columns do not add up to 100. As you can see, this doesn't happen very often.
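If the zero-return windows should be counted rather than dropped, one possible variation is to treat 0 as a third sign and match windows against all 3^n patterns. The sketch below uses base R only; the function name `sign_pattern_freq` and the toy returns are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: frequency (in %) of every n-day sign pattern,
# keeping 0 as a third possible sign so the frequencies sum to 100.
sign_pattern_freq <- function(returns, n) {
  s <- sign(returns[!is.na(returns)])
  # One window per row, oldest day first
  windows <- embed(s, n)[, n:1, drop = FALSE]
  # Turn each row into a text key such as "-1,1"
  key <- function(m) apply(m, 1, paste, collapse = ",")
  # All 3^n candidate patterns, built with expand.grid as in the answer
  all_patterns <- as.matrix(expand.grid(rep(list(c(-1, 0, 1)), n)))
  # factor() with fixed levels keeps patterns that never occur (count 0)
  counts <- table(factor(key(windows), levels = key(all_patterns)))
  100 * counts / nrow(windows)
}

toy <- c(-0.3, 0.6, 0, -0.5, -0.2, 0.4)   # made-up returns
freq <- sign_pattern_freq(toy, 2)
print(freq[freq > 0])
```

Matching on pasted keys avoids the row-by-row comparison against `patterns`, at the cost of a table that grows as 3^n instead of 2^n.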

It's interesting that even during the crash it was (very slightly) more likely to have two up days in succession, than two down days. If you look at plot(Cl(AAPL)) during this period, you can see that it was a pretty wild ride.

