
Is there a better way to create quantile "dummies" / factors in R?

I'd like to assign factors representing quantiles, and I need them to be numeric. That's why I wrote the following function, which is basically the answer to my problem:

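qdum <- function(v, q) {
  # Upper quantile boundaries at probabilities 1/q, 2/q, ..., 1
  qd <- quantile(v, (1:q) / q)
  # Column a holds the data, column b the quantile number (0 = unassigned)
  v <- data.frame(a = v, b = 0)
  for (i in 1:q) {
    if (i == 1) {
      # <= rather than <, so values equal to the first boundary are not left at 0
      v$b[v$a <= qd[1]] <- 1
    } else {
      v$b[v$a > qd[i - 1] & v$a <= qd[i]] <- i
    }
  }
  # Return the boundaries together with the data and its quantile numbers
  list(qd, v)
}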

You may laugh now :). The returned list contains a variable that can be used to assign every observation to its corresponding quantile. My question now is: is there a better way (more "native" or "core") to do it? I know about quantcut (from the gtools package), but at least with the parameters I tried, I ended up only with those unhandy (at least to me) thresholds.

Any feedback that helps me get better is appreciated!

With base R, use quantile to figure out the splits and then cut to convert the numeric variable to a discrete one:

qcut <- function(x, n) {
  cut(x, quantile(x, seq(0, 1, length = n + 1)), labels = seq_len(n),
    include.lowest = TRUE)
}
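As a quick check of how qcut behaves, here is a minimal sketch on made-up data. The vector x and the name q_num are illustrative assumptions, not part of the original answer, and length.out is simply the spelled-out form of the length argument used above.

```r
# qcut as in the answer above (argument name length.out written in full)
qcut <- function(x, n) {
  cut(x, quantile(x, seq(0, 1, length.out = n + 1)), labels = seq_len(n),
      include.lowest = TRUE)
}

x <- 1:20            # hypothetical data without ties
f <- qcut(x, 4)      # factor with levels "1" to "4"
print(table(f))      # number of observations per quartile

# The result is a factor; to get plain numbers, convert via the labels
q_num <- as.integer(as.character(f))
print(q_num)
```

One caveat: if the data contain many ties, quantile can return duplicated break points, in which case cut stops with an error about non-unique breaks.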

or if you just want the number:

qcut2 <- function(x, n) {
  # all.inside = TRUE keeps the maximum in bin n instead of n + 1
  findInterval(x, quantile(x, seq(0, 1, length = n + 1)), all.inside = TRUE)
}
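The two helpers do not always agree: cut uses right-closed intervals by default, while findInterval uses left-closed ones, so a value that falls exactly on an interior break point can land in different bins. The sketch below illustrates this on made-up data; x, a, b and b_open are illustrative names, and using left.open = TRUE to line the two up is a suggestion of mine rather than something from the original answer.

```r
# Both helpers as in the answer above (length.out written in full)
qcut <- function(x, n) {
  cut(x, quantile(x, seq(0, 1, length.out = n + 1)), labels = seq_len(n),
      include.lowest = TRUE)
}
qcut2 <- function(x, n) {
  findInterval(x, quantile(x, seq(0, 1, length.out = n + 1)), all.inside = TRUE)
}

x <- 1:9                                  # hypothetical data; the median is exactly 5
a <- as.integer(as.character(qcut(x, 2))) # cut: bins [1, 5] and (5, 9]
b <- qcut2(x, 2)                          # findInterval: bins [1, 5) and [5, 9]
print(rbind(x, a, b))                     # the two rows differ at x == 5

# Assumption: left.open = TRUE makes findInterval use (lower, upper] bins like cut
b_open <- findInterval(x, quantile(x, seq(0, 1, length.out = 3)),
                       all.inside = TRUE, left.open = TRUE)
print(b_open)
```

Which convention is preferable depends on how boundary values should be counted; on continuous data without ties the difference rarely matters.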

I'm not sure what quantcut is, but I would do the following:

qdum <- function(v, q) {
 library(Hmisc)
 quantilenum <- cut2(v, g=q)
 levels(quantilenum) <- 1:q
 cbind(v, quantilenum)
}
