Let's say I have a vector that contains values between 1 and 12 (continuous) representing age, and also two token values, 97 and 99, that indicate "did not answer" and "missing" respectively, so something like:
v <- sample(c(sample(1:12, 95, replace = TRUE), 99, 99, 97, 99, 99))
I want to convert this numeric vector to a factor, where I discretize the continuous values between 1 and 12 into three equal-interval bins (i.e. [1,4), [4,8), [8,12]), so that in the end I have a factor with 5 levels: three for the bins and two for 97 and 99. I am trying to find the best/most efficient way to do this very generally in R.
Update
To put it in more concrete terms, I want a function numeric2factor that accepts a vector of values vec, a vector of tokens tokens, the range of the continuous values specified by start and end, and a discretization function discrFunc. numeric2factor converts vec = v from the example above into a factor.
Assume that end is less than the lowest token value (for example, end = 12 and the lowest token value is something like 97), so there is no overlap between continuous and categorical values.
discrFunc returns the cut-points (according to some discretization method) of just the continuous values from vec.
Here is a start; you might need to adapt it to your specific needs:
set.seed(1);v <- sample(c(sample(1:12, 95, replace = TRUE), 99, 99, 97, 99, 99))
table(v)
# 1 2 3 4 5 6 7 8 9 10 11 12 97 99
# 5 6 9 7 13 10 4 8 8 11 10 4 1 4
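numeric2factor <- function(x, start, end, bins){
  res <- character(length = length(x))
  # TRUE for the continuous values, FALSE for the tokens
  ix1 <- x >= start & x <= end
  # equal-width breaks; the lower bound is shifted by 1 because cut() intervals are left-open
  res[ ix1 ] <- as.character(cut(x[ ix1 ], seq(min(x[ ix1 ]) - 1, max(x[ ix1 ]),
                                               length.out = bins + 1)))
  # tokens are kept as they are (coerced to character)
  res[ !ix1 ] <- x[ !ix1 ]
  as.factor(res)
}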
table(numeric2factor(v, min(v), 12, 3))
# (0,4] (4,8] (8,12] 97 99
# 27 35 33 1 4
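If you want something closer to the signature described in the question (tokens, start, end and a discretization function), the following is a minimal sketch rather than a definitive implementation. The helper equalWidth and its bins argument are hypothetical names introduced here for illustration, and the sketch assumes discrFunc receives the continuous values plus start and end and returns a vector of cut-points. It uses cut() with right = FALSE and include.lowest = TRUE so the intervals are left-closed, with the last one closed on both sides, as in the question.

```r
# Hypothetical default discretizer (an assumption, not from the question):
# returns equal-width cut-points spanning [start, end].
equalWidth <- function(x, start, end, bins = 3) {
  seq(start, end, length.out = bins + 1)
}

numeric2factor <- function(vec, tokens, start, end, discrFunc = equalWidth, ...) {
  isToken <- vec %in% tokens
  # continuous values: not a token, not NA, and inside [start, end]
  inRange <- !isToken & !is.na(vec) & vec >= start & vec <= end

  # cut-points are computed from the continuous values only
  breaks <- discrFunc(vec[inRange], start, end, ...)
  binned <- cut(vec[inRange], breaks = breaks,
                right = FALSE, include.lowest = TRUE)

  # anything that is neither a token nor in range stays NA
  res <- rep(NA_character_, length(vec))
  res[inRange] <- as.character(binned)
  res[isToken] <- as.character(vec[isToken])

  # bins first (in order), then the tokens in the order given
  factor(res, levels = c(levels(binned), as.character(tokens)))
}

set.seed(1)
v <- sample(c(sample(1:12, 95, replace = TRUE), 99, 99, 97, 99, 99))
f <- numeric2factor(v, tokens = c(97, 99), start = 1, end = 12)
table(f)
```

Setting the levels explicitly keeps the bins in increasing order ahead of the token levels, instead of relying on the alphabetical sorting that as.factor() applies to character vectors. Any other function that returns cut-points, for example one based on quantile(), could be passed as discrFunc.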