I am trying to find the right expression for creating a result vector by applying an operation over a vector, using, in a vectorised way, the elements of a second vector. The use case is that I have a vector of raw values and a vector of breakpoints. What I want is an expression that gives, for each breakpoint, the sum of a logical comparison between that breakpoint and the values in the raw vector. In other words:
Given:
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)
resultfoo <- data.frame(breaks = breaksfoo, nmatching = numeric(length(breaksfoo)))
I want to write a single expression that delivers the column values for resultfoo$nmatching, which is, for each value in breaksfoo, sum(rawfoo > breaksfoo[i]):
resultfoo
  breaks nmatching
1     10         3
2     50         2
3     80         1
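For comparison, the literal expression sum(rawfoo > breaksfoo[i]) can be evaluated for every break at once by building a logical matrix with outer(), with no apply loop. A minimal sketch using the vectors above: this literal rule gives 4 for the break 10 (30, 22, 77 and 169 all exceed it), whereas the table above shows 3, which is the number of values in (-Inf, 10]. Those per-interval counts can be recovered by differencing cumulative counts of values <= each break.

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# gt[i, j] is TRUE when rawfoo[i] > breaksfoo[j]
gt <- outer(rawfoo, breaksfoo, ">")
n_above <- colSums(gt)   # literal sum(rawfoo > breaksfoo[i]) for every i
n_above
# [1] 4 2 1

# cumulative counts of values <= each break (3 5 6); their differences give
# the number of values in (previous break, breaksfoo[i]]
n_upto <- colSums(outer(rawfoo, breaksfoo, "<="))
n_in_bin <- diff(c(0, n_upto))
n_in_bin
# [1] 3 2 1
```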
I have been trying various forms of apply and am having problems with how to express the function. Perhaps I am barking up the wrong tree? I can supply several demonstrations of failure if required. (But my guess is that this question is so simple it doesn't need error messages to disambiguate it ;-)
You can do it in three steps:
1. Write a function that, given a break x, returns a list of two elements: the break itself and the result of sum(x > rawfoo).
2. Then use sapply to apply this function to breaksfoo.
3. Finally, transpose the result of sapply, which is a matrix, and convert it to get the data frame you need.
The following code does all three steps in one statement:
as.data.frame(t(sapply(breaksfoo,
function(x) list(breaks = x, nmatching = sum(x > rawfoo)))))
returns:
  breaks nmatching
1     10         2
2     50         5
3     80         6
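Because each call returns a list, sapply() produces a list matrix here, so the columns of the resulting data frame are lists rather than numeric vectors. A minimal alternative sketch that keeps the same counting rule, sum(x > rawfoo) (the number of values strictly below each break), but computes the counts with vapply() and passes them to data.frame(), so both columns are plain atomic vectors:

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# vapply guarantees exactly one integer per break, so nmatching is a plain integer column
nmatching <- vapply(breaksfoo, function(x) sum(x > rawfoo), integer(1))
res <- data.frame(breaks = breaksfoo, nmatching = nmatching)
res
#   breaks nmatching
# 1     10         2
# 2     50         5
# 3     80         6
```

Declaring the return type in vapply() also makes the call fail loudly if the function ever returns something other than a single integer.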
Combining findInterval with table might get you what you're looking for.
#finds which interval rawfoo is in
x <- findInterval(rawfoo,breaksfoo)
#[1] 1 0 1 2 0 3 1
#tabulates the information
table(x)
#0 1 2 3
#2 3 1 1
#cuts off the last element
head(table(x),-1)
#0 1 2
#2 3 1
resultfoo$nmatching <- head(table(x),-1)
This is almost what you want, except that 10 is counted in the second bucket: findInterval's intervals are closed on the left (inclusive at the lower end), while your example puts 10 in the first bucket because you want a strict inequality. You can add a corrective vector that moves values lying exactly on a break into the correct bucket:
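Alternatively, findInterval() has a left.open argument (in R 3.3.0 and later, to my knowledge) that makes its intervals closed on the right, so a value equal to a break lands in the bucket below it and no correction is needed. A minimal sketch, assuming a reasonably recent R; tabulate() with an explicit nbins also keeps empty buckets that table() would drop:

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# with left.open = TRUE, code 0 is (-Inf, 10], code 1 is (10, 50], and so on
x <- findInterval(rawfoo, breaksfoo, left.open = TRUE)
x
# [1] 1 0 1 2 0 3 0

# tabulate() counts the integers 1..nbins, so shift the codes by 1;
# the last bin (values above the last break) is then dropped
counts <- tabulate(x + 1, nbins = length(breaksfoo) + 1)
nmatching <- head(counts, -1)
nmatching
# [1] 3 2 1
```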
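# count how many rawfoo values sit exactly on each break (NA where none do)
y <- table(rawfoo)[as.character(breaksfoo)]
y[is.na(y)] <- 0
# move each such value from the bucket above its break into the bucket below it
y <- y - c(0,head(y,-1))
resultfoo$nmatching <- resultfoo$nmatching + y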
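The corrective vector only needs the number of rawfoo values that sit exactly on each break. As a minimal sketch, that count can also be taken with outer() and ==, which yields 0 (not NA) for breaks with no exact match and so avoids the NA clean-up; the uncorrected counts are recomputed with tabulate() so the snippet stands alone:

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)

# uncorrected bucket counts, same as head(table(findInterval(rawfoo, breaksfoo)), -1): 2 3 1
counts <- head(tabulate(findInterval(rawfoo, breaksfoo) + 1,
                        nbins = length(breaksfoo) + 1), -1)

# number of rawfoo values exactly equal to each break: 1 0 0
on_break <- colSums(outer(rawfoo, breaksfoo, "=="))

# add each on-break value to the bucket below its break and remove it from the bucket above
correction <- on_break - c(0, head(on_break, -1))
counts + correction
# [1] 3 2 1
```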
To make this easier to reuse, you can wrap it in a function:
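fnfoo <- function(raw, breaks) {
  # bucket counts for (-Inf, breaks[1]), [breaks[1], breaks[2]), ...; factor() keeps empty buckets
  x <- head(table(factor(findInterval(raw, breaks), levels = 0:length(breaks))), -1)
  # number of raw values sitting exactly on each break
  y <- table(raw)[as.character(breaks)]
  y[is.na(y)] <- 0
  # move those values down one bucket so the intervals are closed on the right
  x + y - c(0, head(y, -1))
}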
resultfoo$nmatching <- fnfoo(rawfoo,breaksfoo)
EDIT: I was browsing another question and realized that cut works better here.
data.frame(table(cut(rawfoo,c(-Inf,breaksfoo),right=TRUE)))
# Var1 Freq
# 1 (-Inf,10] 3
# 2 (10,50] 2
# 3 (50,80] 1
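To fill the nmatching column of resultfoo from the question directly, the counts can be taken out of the table. A minimal sketch, assuming the question's objects: values above the last break (169 here) get NA from cut() and are not counted by table(), which matches the expected output. Because cut() returns a factor, table() also reports intervals that contain no values, so the column length always matches breaksfoo.

```r
rawfoo <- c(30, 4, 22, 77, 1, 169, 10)
breaksfoo <- c(10, 50, 80)
resultfoo <- data.frame(breaks = breaksfoo, nmatching = numeric(length(breaksfoo)))

# right = TRUE (the default) gives the intervals (-Inf,10], (10,50], (50,80]
bins <- cut(rawfoo, c(-Inf, breaksfoo), right = TRUE)
resultfoo$nmatching <- as.vector(table(bins))
resultfoo
#   breaks nmatching
# 1     10         3
# 2     50         2
# 3     80         1
```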