Converting a vector with continuous and categorical values into a vector of factors

Let's say I have a vector that contains values between 1 and 12 (continuous) representing age, plus two token values, 97 and 99, that indicate "did not answer" and "missing" respectively, so something like:

v <- sample(c(sample(1:12, 95, replace = TRUE), 99, 99, 97, 99, 99))

I want to convert this numeric vector to a vector of factors, where I discretize the continuous values between 1 and 12 into three equal-interval bins (i.e. [1,4), [4,8), [8,12]), so that in the end I have a vector of factors with 5 levels: three for the bins and two for 97 and 99. I am trying to find the best/most efficient way to do this very generally in R.

Update

To put it in more concrete terms, I want a function numeric2factor that accepts a vector of values vec, a vector of tokens tokens, the range of the continuous values specified by start and end, and a discretization function discrFunc. numeric2factor should convert vec = v from the example above into a vector of factors.

Assume that end is less than the lowest token value (for example, end = 12 and the lowest token value is something like 97), so there is no overlap between continuous and categorical values.

discrFunc returns the cut-points (according to some discretization method) computed from just the continuous values in vec.

Here is a start; you might need to adapt it to your specific needs:

set.seed(1)
v <- sample(c(sample(1:12, 95, replace = TRUE), 99, 99, 97, 99, 99))
table(v)
# 1  2  3  4  5  6  7  8  9 10 11 12 97 99 
# 5  6  9  7 13 10  4  8  8 11 10  4  1  4 

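numeric2factor <- function(x, start, end, bins){
  res <- character(length = length(x))
  # TRUE for continuous values, FALSE for tokens such as 97 and 99
  ix1 <- x >= start & x <= end
  # bin the continuous values into `bins` equal-width intervals;
  # the lowest break is min - 1 because cut() intervals are open on the left
  res[ ix1 ] <- as.character(cut(x[ ix1 ], seq(min(x[ ix1 ]) - 1, max(x[ ix1 ]),
                                               length.out = bins + 1)))
  # token values are kept as-is and become their own levels
  res[ !ix1 ] <- x[ !ix1 ]
  as.factor(res)
}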

table(numeric2factor(v, min(v), 12, 3))
# (0,4]  (4,8] (8,12]     97     99 
#    27     35     33      1      4
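
The function above takes a bin count, whereas the question asks for a signature with tokens and discrFunc. A minimal sketch of that more general form might look like the following. The helper equal_width_breaks, the argument handling, and the choice of left-closed intervals (right = FALSE with include.lowest = TRUE) are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical discretization function: equal-width cut-points over the
# range of the continuous values (an illustrative helper, not base R).
equal_width_breaks <- function(x, bins = 3) {
  seq(min(x), max(x), length.out = bins + 1)
}

# Sketch of the signature described in the question.
# discrFunc receives only the continuous values and returns sorted cut-points.
numeric2factor <- function(vec, tokens, start, end,
                           discrFunc = equal_width_breaks) {
  is_token <- vec %in% tokens
  is_cont  <- !is_token & !is.na(vec) & vec >= start & vec <= end

  # Left-closed bins; include.lowest = TRUE closes the last bin on the right.
  breaks <- discrFunc(vec[is_cont])
  binned <- cut(vec[is_cont], breaks = breaks,
                right = FALSE, include.lowest = TRUE)

  # Anything that is neither a token nor inside [start, end] stays NA.
  res <- rep(NA_character_, length(vec))
  res[is_cont]  <- as.character(binned)
  res[is_token] <- as.character(vec[is_token])

  # Bin levels first (in order), then the tokens.
  factor(res, levels = c(levels(binned), as.character(sort(unique(tokens)))))
}

# Usage with fixed cut-points matching the bins requested in the question.
x <- c(1, 3, 4, 7, 8, 12, 97, 99, 99)
f <- numeric2factor(x, tokens = c(97, 99), start = 1, end = 12,
                    discrFunc = function(x) c(1, 4, 8, 12))
levels(f)
# "[1,4)"  "[4,8)"  "[8,12]" "97"     "99"
```

Setting the levels explicitly keeps the bins in numeric order ahead of the tokens, instead of relying on the alphabetical ordering that as.factor applies to character vectors.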
