Syntax (and/or functions) for applying an op over elements of one vector, using as arg elements of a 2nd vector

I am trying to find the right expression for creating a result vector by applying an operation over one vector, using, in a vectorised way, the elements of a second vector. The use case is that I have a vector of raw values and a vector of breakpoints. What I want is an expression that, for each breakpoint, gives the sum of a logical comparison between that breakpoint and the values in the values vector. In other words:

Given:

rawfoo <- c(30, 4, 22, 77, 1,169, 10)
breaksfoo <- c(10,50, 80)
resultfoo <- data.frame(breaks=breaksfoo, nmatching=numeric(length(breaksfoo)))

I want to write a single expression that delivers the column values for resultfoo$nmatching, that is, for each value in breaksfoo, sum(rawfoo > breaksfoo[i]):

resultfoo
  breaks nmatching
1     10         3
2     50         2
3     80         1

I have been trying various forms of apply and having problems with how to express the function. Perhaps I am barking up the wrong tree? I can supply multiple demonstrations of failure if required. (But my guess is that this question is so simple it doesn't need error messages to disambiguate it ;-)

You can do it in three steps:

  1. Write a function that, given a break, returns a list of two elements: the break itself and the result of sum(break > rawfoo).

  2. Then use sapply to apply this function to breaksfoo.

  3. Finally, transform the result of sapply, which is a matrix, into the data frame you need.

The following code does all three steps in one statement:

 as.data.frame(t(sapply(breaksfoo, 
                        function(x) list(breaks = x, nmatching = sum(x > rawfoo)))))

returns

  breaks nmatching
1     10         2
2     50         5
3     80         6
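For comparison, here is a minimal sketch of the same idea that keeps both columns as plain atomic vectors (as far as I can tell, the one-liner above produces list columns, because each call returns a list). It assumes rawfoo and breaksfoo as defined in the question; count_above is a hypothetical helper name, and the comparison is written the way the question states it, sum(rawfoo > b), rather than sum(b > rawfoo).

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# Hypothetical helper: how many raw values are strictly greater than one break
count_above <- function(b, raw) sum(raw > b)

# vapply checks that every call returns exactly one integer
resultfoo <- data.frame(
  breaks = breaksfoo,
  nmatching = vapply(breaksfoo, count_above, integer(1), raw = rawfoo)
)

# Same counts without an explicit loop: compare every raw value against every break
nmatching_outer <- colSums(outer(rawfoo, breaksfoo, ">"))
```

With the question's data both forms give 4, 2, 1. That differs from the 3, 2, 1 table in the question, which counts values per bucket rather than values above each break; the bucket interpretation is what the findInterval and cut answers compute.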

Combining findInterval with table might get you what you're looking for.

#finds which interval rawfoo is in
x <- findInterval(rawfoo,breaksfoo)
#[1] 1 0 1 2 0 3 1
#tabulates the information
table(x)
#0 1 2 3 
#2 3 1 1 
#cuts off the last element
head(table(x),-1)
#0 1 2 
#2 3 1 
resultfoo$nmatching <- head(table(x),-1)

This is almost what you want, except that 10 is being placed in the second bucket because findInterval's intervals are inclusive on the lower end, while your example puts it in the first bucket because you want a strict inequality. You can add a corrective vector that will reassign to the right bucket:

y <- table(rawfoo)[as.character(breaksfoo)]
y[is.na(y)] <- 0
y <- y - c(0,head(y,-1))
resultfoo$nmatching <- resultfoo$nmatching + y

To make this easier to do, you can wrap it into a function.

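fnfoo <- function(raw,breaks) {
  #counts per interval, using the function arguments rather than the globals
  x <- head(table(findInterval(raw,breaks)),-1)
  #corrective vector for values that sit exactly on a break
  y <- table(raw)[as.character(breaks)]
  y[is.na(y)] <- 0
  x + y - c(0,head(y,-1))
}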
resultfoo$nmatching <- fnfoo(rawfoo,breaksfoo)
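
As a possible simplification, newer versions of R (3.3.0 and later, as far as I know) give findInterval a left.open argument that makes the intervals open on the left and closed on the right, so no corrective vector is needed. The sketch below assumes that argument is available and that the breaks are sorted in increasing order; count_buckets is a hypothetical name. It uses tabulate instead of table so that empty buckets still show up as 0.

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# Hypothetical helper: count raw values in (-Inf, b1], (b1, b2], ..., (b[n-1], bn]
count_buckets <- function(raw, breaks) {
  # left.open = TRUE: a value equal to a break stays in the bucket ending at that break
  idx <- findInterval(raw, breaks, left.open = TRUE)
  # idx runs from 0 to length(breaks); shift by one so the bins start at 1.
  # tabulate ignores bins above nbins, which drops values beyond the last break.
  tabulate(idx + 1L, nbins = length(breaks))
}

resultfoo <- data.frame(breaks = breaksfoo)
resultfoo$nmatching <- count_buckets(rawfoo, breaksfoo)
```

Unlike head(table(x),-1), this does not depend on every interval actually containing a value.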

EDIT: I was browsing another question and realized that cut works better here.

data.frame(table(cut(rawfoo,c(-Inf,breaksfoo),right=TRUE)))
#        Var1 Freq
# 1 (-Inf,10]    3
# 2   (10,50]    2
# 3   (50,80]    1
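
To drop those counts straight into resultfoo, something like the following sketch should work. It assumes breaksfoo is sorted in increasing order. Values above the last break fall outside every interval, so cut returns NA for them and table leaves them out by default.

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# right = TRUE gives intervals (-Inf,10], (10,50], (50,80]; 169 becomes NA
buckets <- cut(rawfoo, breaks = c(-Inf, breaksfoo), right = TRUE)

resultfoo <- data.frame(
  breaks = breaksfoo,
  # table() keeps empty factor levels, so there is one count per break
  nmatching = as.vector(table(buckets))
)
```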
