Using Hmisc cut2 arguments - how does the minmax argument work?

Question

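My large dataset has groups of uneven length. For example, there are 700 observations for 2016 and 400 for 2017. I have many years of data, so trimming the dataset by hand is not feasible.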

I want to cut every year's observations into quantiles, but using only the first 400 observations in each group.

The Hmisc documentation has a tempting `minmax` argument. Can `minmax` be used so that Hmisc computes the quantiles from observations 1-400 only?
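For context, the Hmisc help page for `cut2` describes `minmax` as applying only when explicit `cuts` are supplied: if `min(x) < min(cuts)` or `max(x) > max(cuts)`, the cut points are extended to include the data's minimum and maximum. As described, it changes the break points and never selects rows. The sketch below reproduces that documented behaviour in base R. `augment_cuts` is a hypothetical helper written for illustration. It is not the Hmisc implementation.

```r
# Hypothetical helper: mimics the documented effect of cut2(minmax = TRUE)
# on user-supplied cut points. This is NOT the Hmisc source code.
augment_cuts <- function(x, cuts) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  if (lo < min(cuts)) cuts <- c(lo, cuts)  # extend below if data go lower
  if (hi > max(cuts)) cuts <- c(cuts, hi)  # extend above if data go higher
  cuts
}

x <- c(2, 5, 7, 11, 20)
augment_cuts(x, cuts = c(5, 10))  # data range added: 2 5 10 20
augment_cuts(x, cuts = c(0, 25))  # cuts already cover the data: 0 25
```

The input `x` keeps all of its values in both calls, so this kind of argument cannot restrict the quantiles to the first 400 rows. That subsetting has to be done before cutting.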

Answer 1

With dplyr, you can use `group_by` and `slice` to select the first 400 records for each value of `year`. Then create the quantiles, either within each year or across all years.

set.seed(911) # Simulate some uneven data
df <- data.frame(year=rep(2016:2018, times=c(400,500,600)),
                 val=rnorm(1500,50,5))

library(dplyr); library(tidyr)
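If you would rather not depend on dplyr, the "first 400 rows per year" step can also be written in base R. Here is a minimal sketch that assumes the simulated `df` above. The data are re-created so the block runs on its own, and `pos_in_year` and `first400` are illustrative names. `ave()` numbers the rows within each year in their current order, which matches the rows `slice(1:400)` keeps.

```r
set.seed(911) # same simulated, uneven data as above
df <- data.frame(year = rep(2016:2018, times = c(400, 500, 600)),
                 val  = rnorm(1500, 50, 5))

# Position of each row within its year: 1, 2, 3, ... restarting for each year
pos_in_year <- ave(seq_len(nrow(df)), df$year, FUN = seq_along)

# Keep only the first 400 rows of every year (base-R analogue of slice(1:400))
first400 <- df[pos_in_year <= 400, ]

table(first400$year) # 400 rows for each of 2016, 2017, 2018
```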

This creates quantiles within each year:

df %>% group_by(year) %>%
  slice(1:400) %>%
  mutate(q4 = cut(val, 
                  breaks=quantile(val, 
                                  probs = seq(0,1,1/4)), 
                  include.lowest=TRUE, labels=FALSE)) %>%
# You can stop here and save the output; below, I continue to check the counts
  count(q4) %>%
  pivot_wider(names_from=q4, values_from=n)
# A tibble: 3 x 5
# Groups:   year [3]
#   year   `1`   `2`   `3`   `4`
#  <int> <int> <int> <int> <int>
#1  2016   100   100   100   100
#2  2017   100   100   100   100
#3  2018   100   100   100   100
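The same within-year quartiles can be computed with base R alone. This is a minimal sketch that assumes the simulated data above, re-created here so the block is self-contained. `quartile_bin` and `first400` are illustrative names that do not come from the original answer. Hmisc's `cut2(x, g = 4)` is documented to form quantile groups as well, but the sketch keeps to `cut()` and `quantile()` so it matches the dplyr version.

```r
set.seed(911)
df <- data.frame(year = rep(2016:2018, times = c(400, 500, 600)),
                 val  = rnorm(1500, 50, 5))

# Keep the first 400 rows of each year
first400 <- df[ave(seq_len(nrow(df)), df$year, FUN = seq_along) <= 400, ]

# Quartile bin (1-4) of a numeric vector, using that vector's own quantiles
quartile_bin <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, 1/4)),
      include.lowest = TRUE, labels = FALSE)
}

# ave() applies quartile_bin separately within each year
first400$q4 <- ave(first400$val, first400$year, FUN = quartile_bin)

table(first400$year, first400$q4) # 100 per cell, as in the dplyr output
```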

Alternatively, you can ungroup before cutting to create overall quantiles (the counts per year will then differ).

df %>% group_by(year) %>%
  slice(1:400) %>%
  ungroup() %>%
  mutate(q4 = cut(val, 
                  breaks=quantile(val, 
                                  probs = seq(0,1,1/4)), 
                  include.lowest=TRUE, labels=FALSE)) %>% 
# Stop here to save, or continue to check the counts
  group_by(year) %>%
  count(q4) %>%
  pivot_wider(names_from=q4, values_from=n)

# A tibble: 3 x 5
# Groups:   year [3]
#   year   `1`   `2`   `3`   `4`
#  <int> <int> <int> <int> <int>
#1  2016   116    88   102    94
#2  2017    86   114    85   115
#3  2018    98    98   113    91
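The overall version is the same as the within-year one except that a single set of breaks is computed from all 1200 retained rows. Below is a base-R sketch of that idea, with the data re-simulated so it runs on its own and illustrative names such as `breaks` and `tab`. The rows are still balanced across the four overall quartiles and across years. Only the cross-tabulation of year by quartile is uneven.

```r
set.seed(911)
df <- data.frame(year = rep(2016:2018, times = c(400, 500, 600)),
                 val  = rnorm(1500, 50, 5))
first400 <- df[ave(seq_len(nrow(df)), df$year, FUN = seq_along) <= 400, ]

# One set of quartile breaks from all retained values (no grouping by year)
breaks <- quantile(first400$val, probs = seq(0, 1, 1/4))
first400$q4 <- cut(first400$val, breaks = breaks,
                   include.lowest = TRUE, labels = FALSE)

tab <- table(first400$year, first400$q4)
colSums(tab) # each overall quartile holds 300 of the 1200 rows
rowSums(tab) # each year still contributes 400 rows
```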

Answer 1: score 0, 2020-05-22 01:23:56