Difference between ntile and cut and then quantile() function in R
R: Interchanging "Quantile" and "Ntile" Functions?
I am working with the R programming language.
I have the following data set:
set.seed(123)
library(dplyr)
var1 = rnorm(10000, 100,100)
var2 = rnorm(10000, 100,100)
var3 = rnorm(10000, 100,100)
var4 = rnorm(10000, 100,100)
var5 <- factor(sample(c("A","B", "C", "D", "E"), 1000, replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2)))
my_data = data.frame( var1, var2, var3, var4, var5)
I was able to run the following code (which uses ntile):
test = my_data %>%
  group_by(var5) %>%
  mutate(group = ntile(var1, 4)) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1)) %>%
  mutate(range = paste(min, max, sep = "-")) %>%
  ungroup()
I am now trying to replace the ntile function with the quantile function:
test = my_data %>%
  group_by(var5) %>%
  mutate(group = quantile(var1, c(0, 0.25, 0.5, 0.75, 1))) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1)) %>%
  mutate(range = paste(min, max, sep = "-")) %>%
  ungroup()
But I get the following error:
Error in `mutate()`:
! Problem while computing `group = quantile(var1, c(0, 0.25, 0.5, 0.75, 1))`.
x `group` must be size 2170 or 1, not 5.
i The error occurred in group 1: var5 = A.
Run `rlang::last_error()` to see where the error occurred.
Could someone tell me how to fix this?
Thanks!
quantile returns a vector with the same length as probs, not one value per row:
> quantile(rnorm(25), probs = c(0, 0.25, 0.5, 0.75, 1))
        0%        25%        50%        75%       100%
-2.2104715 -1.3785488 -0.3379010  0.5721671  2.0572593
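mutate, on the other hand, requires the new column to have the same length as the group it is computed on (or length 1). Here the quantiles can instead be used as break points for cut, which assigns every row to a bin: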
test2 <- my_data %>%
  group_by(var5) %>%
  mutate(group = cut(var1, breaks = c(-Inf,
                     quantile(var1, c(0, 0.25, 0.5, 0.75, 1))))) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1)) %>%
  mutate(range = paste(min, max, sep = "-")) %>%
  ungroup()
-output
> test2
# A tibble: 10,000 × 9
    var1  var2    var3  var4 var5  group          min   max range
   <dbl> <dbl>   <dbl> <dbl> <fct> <fct>        <dbl> <dbl> <chr>
 1  44.0 337.    16.4   80.6 E     (35.5,99.5]   35.5  99.4 35.5075826967309-99.4142501887206
 2  77.0  83.3   77.9  126.  E     (35.5,99.5]   35.5  99.4 35.5075826967309-99.4142501887206
 3 256.  193.  -110.    46.2 E     (168,472]    168.  472.  168.347362097985-471.572072587951
 4 107.   43.2  -66.8  -17.9 A     (96.7,166]    96.7 166.  96.7121945194114-165.961545193295
 5 113.  123.    -9.80 190.  C     (99.2,166]    99.2 166.  99.2497216290279-166.111860991813
 6 272.  213.   -66.6   98.4 E     (168,472]    168.  472.  168.347362097985-471.572072587951
 7 146.  238.    95.0  118.  D     (102,170]    102.  170.  102.486419143665-170.378758447782
 8 -26.5  76.7  256.   160.  D     (-229,33.4] -219.   33.3 -218.918610643503-33.3345619877818
 9  31.3 -60.1   59.5  126.  A     (-285,36.6] -247.   36.5 -246.749014053051-36.5196181691353
10  55.4  70.2  179.   130.  B     (30.5,97]     30.5  97.0 30.5063174765106-96.9569718037872
# … with 9,990 more rows
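One thing to note about the breaks above: because -Inf is placed in front of the 0% quantile, the smallest value in each var5 group ends up alone in a fifth bin, (-Inf, min]. That is why the min column for (-285,36.6] shows -247 rather than -285. If exactly four bins per group are wanted, as with ntile(var1, 4), a possible variant is to drop the -Inf and pass include.lowest = TRUE. The following is only a minimal sketch on a made-up data set (toy, g, x, bin and nt are illustrative names, not part of the question):

```r
library(dplyr)

set.seed(123)
# Hypothetical toy data: two groups of 100 values each (assumption, for illustration only)
toy <- data.frame(
  g = rep(c("A", "B"), each = 100),
  x = rnorm(200, 100, 100)
)

res <- toy %>%
  group_by(g) %>%
  mutate(
    # quantile() supplies the 5 break points per group; cut() then maps
    # every row to one of the 4 intervals, so the result has one value per row
    bin = cut(x,
              breaks = quantile(x, c(0, 0.25, 0.5, 0.75, 1)),
              include.lowest = TRUE,  # keep the group minimum in the first bin
              labels = FALSE),        # integer codes 1..4 instead of interval labels
    # ntile() for comparison
    nt = ntile(x, 4)
  ) %>%
  ungroup()

# Rows per group and bin
table(res$g, res$bin)
```

With labels = FALSE the bins come back as integers 1 to 4, which makes them easy to compare with the ntile column. The two approaches are not guaranteed to agree in general: ntile always produces buckets of (nearly) equal size, while cut places rows according to the quantile values, so ties or group sizes not divisible by 4 can lead to different assignments. cut also stops with an error if two quantiles coincide, since the breaks must be unique.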