Using cut2 from the Hmisc package for large N

Question

It seems that for large N (say 2e6), the cut2 function in the Hmisc package throws an error:

 y = cut2(rnorm(2000000,0,1),m=sqrt(2000000))

 Error in if (cj == upper) next : missing value where TRUE/FALSE needed
 In addition: Warning message:
 In (1:g) * nnm : NAs produced by integer overflow

I'm trying to split my data into quantile bins with m points in each bin, and also to record the endpoints of each bin. cut2 does this, but it breaks down for large N. Are there better alternatives?
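
A note on the warning: `NAs produced by integer overflow` in `(1:g) * nnm` suggests that both operands are 32-bit integers whose product exceeds `.Machine$integer.max`. The sketch below reproduces that effect in isolation. The values of `g` and `nnm` are assumptions chosen to match the call above (about 2e6 / sqrt(2e6) groups and 2e6 non-missing values), not values read out of cut2 itself.

```r
# Integer arithmetic in R is 32-bit; an overflowing product becomes NA
# (with the warning "NAs produced by integer overflow").
g   <- 1414L      # assumed number of groups, roughly 2e6 / sqrt(2e6)
nnm <- 2000000L   # assumed count of non-missing values

overflowed <- suppressWarnings((1:g) * nnm)   # integer * integer
safe       <- (1:g) * as.numeric(nnm)         # promoted to double, no overflow

anyNA(overflowed)      # TRUE: the larger products do not fit in an integer
anyNA(safe)            # FALSE
.Machine$integer.max   # largest representable integer
```

An `NA` produced this way would then reach a comparison such as `if (cj == upper)`, which is consistent with the "missing value where TRUE/FALSE needed" error.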

Answer 1

Is this what you want?

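cut3 = function(x, m) {
    # Probabilities spaced so that each bin holds about m points
    p = seq(0, 1, by = m / length(x))
    # Empirical quantiles at those probabilities serve as the bin endpoints
    q = quantile(x, probs = p, names = FALSE)
    # Bin x on the quantile breaks; the factor levels record each bin's endpoints
    cut(x, breaks = q)
}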

Testing it out:

x = rnorm(2e6)
m = sqrt(2e6)
qq = cut3(x, m)
summary(as.numeric(table(qq)))
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 1414    1414    1414    1414    1414    1415 
head(qq)
# [1] (0.4757,0.4779] (-1.021,-1.018] (0.4325,0.4344] (1.376,1.381]   (-2.156,-2.138] (0.1215,0.1233]
# 1414 Levels: (-4.964,-3.196] (-3.196,-2.981] (-2.981,-2.86] (-2.86,-2.766] (-2.766,-2.696] (-2.696,-2.637] ... (3.145,3.607]
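
One caveat with the quantile-and-cut approach: `cut()` builds intervals of the form `(a, b]` by default, so the minimum of `x` falls outside the first bin and becomes `NA`. Likewise, when `length(x) / m` is not a whole number, `seq(0, 1, by = ...)` stops short of 1 and the values above the last break are also `NA`. The following is a minimal sketch of a variant that covers both ends. The name `cut3b` and the `1e-8` tolerance are illustrative assumptions, and the final bin may hold fewer than `m` points.

```r
cut3b = function(x, m) {
    p = seq(0, 1, by = m / length(x))
    n = length(p)
    # Make sure the last probability is exactly 1 so the maximum is covered
    if (1 - p[n] > 1e-8) p = c(p, 1) else p[n] = 1
    # unique() guards against duplicated breaks when the data contain ties
    q = unique(quantile(x, probs = p, names = FALSE))
    # include.lowest = TRUE closes the first interval so the minimum is kept
    cut(x, breaks = q, include.lowest = TRUE)
}

set.seed(1)
x = rnorm(2e6)
qq = cut3b(x, sqrt(2e6))
sum(is.na(qq))   # no observation is left unassigned
nlevels(qq)      # number of bins
```

If every bin must contain exactly `m` points, `m` has to divide `length(x)`; otherwise the remainder ends up in the last bin.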

Answer 1, dated 2016-09-14 22:57:20