
Modify Breaks in cut2 function in Hmisc package

This is a follow-up to this question:

Dataframe Aggregation By Group - Separating a Column's Values by Ranges

The answer provided uses Hmisc::cut2, which works great! I want to modify the breaks so that instead of breaking by $1 it breaks by $0.50.

Below is the code provided for the answer:

library(Hmisc)
library(dplyr)

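# cuts = 4:13 gives $1-wide, left-closed bins such as [ 5.00, 6.00)
df$cut_Price <- cut2(df$Price, cuts = 4:13)

df %>% group_by(cut_Price, Size, Type) %>%
    summarise_at(c("Opps", "NumberofSales", "Revenue"),"sum") %>%
    arrange(Size, cut_Price) %>% ungroup() %>%
    # Relabel upper bounds as "x.99"; this only matches single-digit bounds
    # and keeps the digit, so [ 5.00, 6.00) is shown as [ 5.00, 6.99)
    mutate(cut_Price = gsub("(.*, \\d\\.)00", "\\199", cut_Price))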

 # A tibble: 16 × 6
       cut_Price   Size    Type    Opps NumberofSales  Revenue
           <chr> <fctr>  <fctr>   <dbl>         <dbl>    <dbl>
1  [ 5.00, 6.99)  LARGE desktop  477870        342455  2037.67
2  [ 6.00, 7.99)  LARGE desktop  842882        523309  3292.29
3  [ 7.00, 8.99)  LARGE desktop  283107        149878  1189.56
4  [10.00,11.00)  LARGE desktop 5506835       1179544 12674.17
5  [11.00,12.00)  LARGE desktop 3542187       1521347 17342.81
6  [ 3.63, 4.99) MEDIUM desktop 6038044       5129937 18617.94
7  [ 5.00, 6.99) MEDIUM desktop 2558997        478423  2548.95
8  [ 7.00, 8.99) MEDIUM desktop 1071631        352294  2483.10
9  [ 9.00,10.00) MEDIUM desktop 2510873        861183  8428.70
10 [10.00,11.00) MEDIUM desktop  441354        215643  2322.70
11 [11.00,12.00) MEDIUM desktop 5144351       1954720 22138.16
12 [ 3.63, 4.99)  SMALL desktop  801038        587541  2145.76
13 [ 4.00, 5.99)  SMALL desktop  939806        303515  1214.60
14 [ 5.00, 6.99)  SMALL desktop 8303927       2143565 11902.14
15 [10.00,11.00)  SMALL desktop  920975        321515  3284.54
16 [11.00,12.00)  SMALL desktop  181471        236643  2811.50
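The gsub() relabelling step in the code above edits only the label text: it appends "99" to a single-digit upper bound without lowering it, so [ 5.00, 6.00) is shown as [ 5.00, 6.99) in the output, and two-digit bounds such as [10.00,11.00) are left unchanged. A minimal base-R sketch, assuming prices are recorded in whole cents, builds closed cent ranges directly from the break vector instead. `cent_labels()` is a hypothetical helper, and base `cut()` with `right = FALSE` stands in for `cut2()` here because both produce left-closed intervals.

```r
# cent_labels() is a hypothetical helper, not part of Hmisc or base R.
# It turns breaks c(a, b, c) into labels "[a, b - 0.01]", "[b, c - 0.01]",
# which describe left-closed bins [a, b) exactly when prices are whole cents.
cent_labels <- function(breaks) {
  lo <- head(breaks, -1)
  hi <- tail(breaks, -1) - 0.01
  sprintf("[%5.2f,%5.2f]", lo, hi)
}

breaks <- seq(4, 7, 0.5)             # $0.50-wide breaks, as asked in the question
price  <- c(4.25, 5.00, 5.99, 6.50)  # made-up prices for illustration

# right = FALSE makes the bins left-closed: [4, 4.5), [4.5, 5), ...
binned <- cut(price, breaks = breaks, labels = cent_labels(breaks), right = FALSE)
print(as.character(binned))
```

Because the labels are derived from the same `breaks` vector that does the binning, switching from `4:13` to `seq(4, 13, .5)` changes both together, with no string rewriting afterwards.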

Any help would be great, thanks!

You need to pass cut2 the vector of breaks you want, which you can create with seq:

library(tidyverse)

df %>% group_by(Size, 
                cut_Price = Hmisc::cut2(Price, cuts = seq(4, 13, .5)), 
                Type) %>% 
    summarise_at(c("Opps", "NumberofSales", "Revenue"), sum)

## Source: local data frame [18 x 6]
## Groups: Size, cut_Price [?]
## 
##      Size     cut_Price    Type    Opps NumberofSales  Revenue
##    <fctr>        <fctr>  <fctr>   <dbl>         <dbl>    <dbl>
## 1   LARGE [ 5.50, 6.00) desktop  477870        342455  2037.67
## 2   LARGE [ 6.00, 6.50) desktop  842882        523309  3292.29
## 3   LARGE [ 7.50, 8.00) desktop  283107        149878  1189.56
## 4   LARGE [10.00,10.50) desktop  928563        209218  2138.41
## 5   LARGE [10.50,11.00) desktop 4578272        970326 10535.76
## 6   LARGE [11.00,11.50) desktop 3542187       1521347 17342.81
## 7  MEDIUM [ 3.63, 4.00) desktop 6038044       5129937 18617.94
## 8  MEDIUM [ 5.00, 5.50) desktop 2558997        478423  2548.95
## 9  MEDIUM [ 7.00, 7.50) desktop 1071631        352294  2483.10
## 10 MEDIUM [ 9.50,10.00) desktop 2510873        861183  8428.70
## 11 MEDIUM [10.50,11.00) desktop  441354        215643  2322.70
## 12 MEDIUM [11.00,11.50) desktop 5144351       1954720 22138.16
## 13  SMALL [ 3.63, 4.00) desktop  801038        587541  2145.76
## 14  SMALL [ 4.00, 4.50) desktop  939806        303515  1214.60
## 15  SMALL [ 5.00, 5.50) desktop  849537        340580  1837.93
## 16  SMALL [ 5.50, 6.00) desktop 7454390       1802985 10064.21
## 17  SMALL [10.00,10.50) desktop  920975        321515  3284.54
## 18  SMALL [11.50,12.00) desktop  181471        236643  2811.50
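If Hmisc is not available, the same $0.50 bins can be sketched with base R's `cut()` and the `seq(4, 13, .5)` break vector from the answer; this is a substitute, not what the answer uses. `right = FALSE` requests left-closed intervals [a, b), matching the labels cut2 prints. One visible difference: in the output above cut2 also created a bin starting at the data minimum ([ 3.63, 4.00)), whereas plain `cut()` does not extend the breaks, so prices outside them become NA.

```r
# $0.50-wide breaks from $4.00 to $13.00, the same vector passed to cut2 above
breaks <- seq(4, 13, 0.5)

# Made-up prices for illustration; 3.63 lies below the first break
price <- c(3.63, 4.10, 4.50, 4.99, 10.25, 12.75)

# right = FALSE gives left-closed bins [a, b), the convention cut2 uses
bins <- cut(price, breaks = breaks, right = FALSE)
print(data.frame(price, bins))
```

Here 4.50 lands in [4.5,5) rather than [4,4.5) because the left endpoint is included, and 3.63 comes back as NA.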

If you want rows for every combination of values, you can use tidyr::complete. Empty values will be NA unless you specify otherwise in complete's fill parameter.

df %>% group_by(Size, 
                # oneval = FALSE keeps the interval label even when a bin
                # holds only one distinct price
                cut_Price = Hmisc::cut2(Price, cuts = seq(4, 13, .5), oneval = FALSE), 
                Type) %>% 
    summarise_at(c("Opps", "NumberofSales", "Revenue"), sum) %>% 
    ungroup() %>% 
    complete(Size, cut_Price, Type)

## # A tibble: 57 × 6
##      Size     cut_Price    Type   Opps NumberofSales Revenue
##    <fctr>        <fctr>  <fctr>  <dbl>         <dbl>   <dbl>
## 1   LARGE [ 3.63, 4.00) desktop     NA            NA      NA
## 2   LARGE [ 4.00, 4.50) desktop     NA            NA      NA
## 3   LARGE [ 4.50, 5.00) desktop     NA            NA      NA
## 4   LARGE [ 5.00, 5.50) desktop     NA            NA      NA
## 5   LARGE [ 5.50, 6.00) desktop 477870        342455 2037.67
## 6   LARGE [ 6.00, 6.50) desktop 842882        523309 3292.29
## 7   LARGE [ 6.50, 7.00) desktop     NA            NA      NA
## 8   LARGE [ 7.00, 7.50) desktop     NA            NA      NA
## 9   LARGE [ 7.50, 8.00) desktop 283107        149878 1189.56
## 10  LARGE [ 8.00, 8.50) desktop     NA            NA      NA
## # ... with 47 more rows
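For comparison only (a swapped-in base-R technique, not what the answer uses), `xtabs()` on factor columns already reports every level combination, with 0 rather than NA for empty cells, which is comparable to asking complete() to fill those cells with 0. The data frame below is made up and only mirrors a few of the question's column names.

```r
# Made-up data; only the column names mirror the question's df
df <- data.frame(
  Size  = factor(c("LARGE", "LARGE", "SMALL"),
                 levels = c("LARGE", "MEDIUM", "SMALL")),
  Price = c(5.75, 6.20, 4.10),
  Opps  = c(100, 200, 50)
)

# $0.50-wide, left-closed price bins; the factor keeps empty bins as levels
df$cut_Price <- cut(df$Price, breaks = seq(4, 7, 0.5), right = FALSE)

# Sum Opps over every Size x cut_Price combination; combinations with
# no rows (e.g. every MEDIUM cell) appear with 0 instead of being dropped
opps_by_bin <- xtabs(Opps ~ Size + cut_Price, data = df)
print(opps_by_bin)

# Long format, one row per combination (the sums are in the Freq column)
opps_long <- as.data.frame(opps_by_bin)
print(head(opps_long))
```

The trade-off is that xtabs() sums one response at a time and cannot distinguish "no rows" from "rows summing to 0", which is exactly what complete()'s default NA preserves.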
