简体   繁体   中英

Understanding R convolution code with sapply()

I am trying to break apart the R code in this post :

x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)


convolve.binomial <- function(p) {
  # p is a vector of probabilities of Bernoulli distributions.
  # The convolution of these distributions is returned as a vector
  # `z` where z[i] is the probability of i-1, i=1, 2, ..., length(p)+1.
  n <- length(p) + 1
  z <- c(1, rep(0, n-1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q})
  z
}
convolve.binomial(x)
 [1] 5.826141e-05 1.068804e-03 8.233357e-03 3.565983e-02 9.775029e-02
 [6] 1.804516e-01 2.323855e-01 2.127628e-01 1.394564e-01 6.519699e-02
[11] 2.141555e-02 4.799630e-03 6.979119e-04 6.038947e-05 2.647052e-06
[16] 4.091095e-08

I tried debugging in RStudio, but it still opaque.

The issue is with the line: sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q}) .

I guess that within the context of the call convolve.binomial(x) p = q = x . At least I get identical results if I pull the lines outside the function and run sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x}) :

x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)
n <- length(x) + 1
z <- c(1, rep(0, n-1))
#  [1] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x})
z # Is extracted by calling it and contains the correct result

My questions are:

  1. What is the purpose of the ;q} ending within sapply() ?
  2. How does it relate to the <<- symbol, meant to make z accessible outside of the "implicit" loop that is sapply() ?

Below you can see my problem "hacking" this line of code:

(x_complem = 1 - x)
sapply(x, function(x) {z <<- x_complem * z + x * (c(0, z[-n])); x})
z # Returns 16 values and warnings

z_offset = c(0, z[-n])
#  [1] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * z_offset; x})
z # Returns different values.

If you want to see the intermediate values of z as the function proceeds then insert either a cat or a print command in the code below:

sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); cat(z,"\n"); x})
#--------
0.83 0.17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0.4482 0.4736 0.0782 0 0 0 0 0 0 0 0 0 0 0 0 0 
0.170316 0.457852 0.323348 0.048484 0 0 0 0 0 0 0 0 0 0 0 0 
0.1566907 0.4348491 0.3341083 0.07047312 0.00387872 0 0 0 0 0 0 0 0 0 0 0 
0.09401443 0.3235858 0.3744046 0.1759272 0.03051648 0.001551488 0 0 0 0 0 0 0 0 0 0 
0.02256346 0.1491116 0.3357823 0.3267701 0.1410286 0.02356488 0.001179131 0 0 0 0 0 0 0 0 0 
snipped rest of output

I think this makes it clearer that what is happening is that each intermediate step represents a set of probabilities for a series of events. Each row sums to 1.0 and represents the probabilities of individual count survivals when there might be a smaller number of binomial parameters. The final result displays the probabilities of particular sums of counts after the full sequence has been assembled.

Another interesting feature is that this result is invariant under random re-ordering of the probabilities in x (as it should be for the original question). Examine the intermediate results from

plot(x)
lines(seq(length(z)), z)
z2 <- convolve.binomial(sample(x) )
lines(seq(length(z)), z2, col="red" )
z3 <- convolve.binomial(sample(x) )
lines(seq(length(z)), z3, col="blue" )
  1. What is the purpose of the ;q} ending within sapply() ?

The function within sapply return q , but actually it's not needed. The following function will work just the same.

convolve.binomial <- function(p) {
  n <- length(p) + 1
  z <- c(1, rep(0, n-1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n]))})
  z
}
  1. How does it relate to the <<- symbol, meant to make z accessible outside of the "implicit" loop that is sapply() ?

In R, if you search up the documentation for the <<- operator using ?'<<-' it says that

The operators <<- and >>- are normally only used in function, and cause a search to be made through parent environments for an existing definition of the variable to be assigned. If such as variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment.

In the function convolve.binomial the value z is defined local to the function. So z <<- actually redefines z in the convolve.binomial function.

So to summarize, the z <<- in the sapply call changes the z variable already defined in convolve.binomial and we eventually return this z . The ;q} ending is not needed within sapply() .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM