Using cut2 in Hmisc package for large N
It seems that for large N (e.g. 2e6), the cut2 function in the Hmisc package throws an error:
library(Hmisc)
y = cut2(rnorm(2000000, 0, 1), m = sqrt(2000000))
Error in if (cj == upper) next : missing value where TRUE/FALSE needed
In addition: Warning message:
In (1:g) * nnm : NAs produced by integer overflow
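The warning names the likely cause: in R, multiplying two integer vectors gives an integer result, and R integers are 32-bit, capped at .Machine$integer.max (2147483647). The sketch below reproduces only the expression quoted in the warning. It assumes (our reading, not checked against the Hmisc source) that nnm is the integer count of non-missing values and that g is the number of groups, about N / m.

```r
# Hypothetical reproduction of the `(1:g) * nnm` expression from the warning.
# Assumption: nnm is the integer count of non-missing values, g is about n / m.
x   <- rnorm(2000000)
nnm <- sum(!is.na(x))                 # sum() of a logical vector returns an integer
g   <- floor(nnm / sqrt(2000000))     # 1414 groups

int_prod <- suppressWarnings((1:g) * nnm)  # integer * integer: overflow becomes NA
dbl_prod <- (1:g) * as.numeric(nnm)        # double arithmetic does not overflow here

.Machine$integer.max          # 2147483647
which(is.na(int_prod))[1]     # 1074, since 1074 * 2e6 = 2.148e9 exceeds the limit
anyNA(dbl_prod)               # FALSE
```

Converting the count to double before multiplying is enough to keep every product finite.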
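I'm trying to split the data into quantile groups with m points in each group, and record the endpoints of each group. cut2 does this, but it doesn't handle large N well. Is there a better alternative?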
Is this what you want?
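cut3 = function(x, m) {
  # probabilities spaced m / length(x) apart, so each interval holds about m points
  p = seq(0, 1, by = m / length(x))
  # the sample quantiles at those probabilities become the break points
  q = quantile(x, probs = p, names = FALSE)
  # note: cut() defaults to include.lowest = FALSE, so min(x) falls outside the
  # first interval, and seq() stops short of 1 unless length(x) / m is a whole
  # number; such points come back as NA
  cut(x, breaks = q)
}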
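cut3 builds its probabilities with seq(..., by = m / length(x)), which stops short of 1 unless length(x) / m is a whole number, and calls cut() with the default include.lowest = FALSE, so the minimum lands in no interval; both cases come back as NA. Below is a minimal sketch of one way to close those gaps and also return the interval endpoints the question asks for. cut_equal_count is a hypothetical name, and using g = floor(n / m) groups is an assumption chosen to match the answer, not something taken from Hmisc.

```r
# cut_equal_count: hypothetical helper, not part of base R or Hmisc.
# Splits x into groups of roughly m points each and returns both the
# grouping factor and the break points (the interval endpoints).
# Assumes x has at least two distinct non-missing values.
cut_equal_count <- function(x, m) {
  n <- sum(!is.na(x))
  g <- max(1, floor(n / m))                   # number of groups
  p <- seq(0, 1, length.out = g + 1)          # starts at exactly 0, ends at exactly 1
  q <- quantile(x, probs = p, names = FALSE, na.rm = TRUE)
  q <- unique(q)                              # tied data can repeat a quantile
  groups <- cut(x, breaks = q, include.lowest = TRUE)  # keep min(x) in the first interval
  list(groups = groups, breaks = q)
}

set.seed(1)
x   <- rnorm(2e6)
res <- cut_equal_count(x, sqrt(2e6))
sum(is.na(res$groups))       # 0: every point is assigned to an interval
range(table(res$groups))     # counts per group differ by only a point or two
head(res$breaks)             # endpoints of the first few intervals
```

Using length.out instead of by makes the last probability exactly 1, and unique() guards against tied values, which would otherwise make cut() stop with "'breaks' are not unique".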
Test it:
x = rnorm(2e6)
m = sqrt(2e6)
qq = cut3(x, m)
summary(as.numeric(table(qq)))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1414 1414 1414 1414 1414 1415
head(qq)
# [1] (0.4757,0.4779] (-1.021,-1.018] (0.4325,0.4344] (1.376,1.381] (-2.156,-2.138] (0.1215,0.1233]
# 1414 Levels: (-4.964,-3.196] (-3.196,-2.981] (-2.981,-2.86] (-2.86,-2.766] (-2.766,-2.696] (-2.696,-2.637] ... (3.145,3.607]
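summary(as.numeric(table(qq))) only counts points that landed in some interval, because table() drops NA by default, so it cannot show points that cut3 left unassigned. A quick check on a smaller sample, repeating cut3's three steps inline (the sample size and seed are arbitrary choices for illustration):

```r
set.seed(1)
x <- rnorm(1e5)
m <- sqrt(length(x))

# The same three steps as cut3
p  <- seq(0, 1, by = m / length(x))
q  <- quantile(x, probs = p, names = FALSE)
qq <- cut(x, breaks = q)

max(p)                    # below 1 here, so the largest values get no interval
is.na(qq[which.min(x)])   # TRUE: the first interval (q[1], q[2]] excludes q[1] = min(x)
is.na(qq[which.max(x)])   # TRUE: max(x) lies above the last break
sum(is.na(qq))            # total number of points left out of every interval
```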