How to cumsum in R based on certain fields?

I apologize for the length, but the detail is necessary to avoid making this more confusing than it already is.

Below is the sample data and some of the manipulations that I have done so far.

library(dplyr)
library(tidyverse)

emp <- c(1,2,3,4,5,6,7,8,1,12,54,101,33,159,201,261,110,195,131,228)
small <- c(1,1,1,1,1,1,1,1,1,1,1,2,1,3,3,4,2,3,2,3)
# Note: R reads 003 as the number 3, so the leading zeros are not kept
area <- c(003,003,003,003,003,003,003,003,003,003,003,003,003,003,003,003,003,003,003,003)
twodigit <- c(11,22,11,22,23,22,11,31,44,45,21,44,45,62,72,22,45,72,45,21)

smbtest2 <- data.frame(emp, small, area, twodigit)

Before I get too far: the goal is to sum employment (emp) by small (schema below) and then break it down by twodigit (industry code). In this simple example, I want the top 3 industries for every small category. I am trying to use cumsum because the categories are nested: anything in the first category (0 to 99) is also in the second category (0 to 149).

smbsummary3 <- smbtest2 %>%
  group_by(area, small, twodigit) %>%
  summarise(emp = sum(emp), worksites = n(),
            .groups = 'drop_last') %>%
  slice_max(emp, n = 3)

smbsummary4 <- smbsummary3 %>%
  ungroup() %>%
  complete(area, small = unique(small)) %>%
  fill(emp, worksites)

 Schema for small
 1     0 to 99
 2     0 to 149
 3     0 to 249
 4     0 to 499
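
To make the nesting concrete, here is a minimal sketch for a single industry (twodigit 45), using the per-category totals worked out by hand from the sample data. The names per_category and nested_total are only illustrative.

```r
# Employment for industry 45 within each small category of the sample data:
# small 1 holds 12 + 33 = 45, small 2 holds 110 + 131 = 241.
per_category <- c(`1` = 45, `2` = 241)

# The size bands are nested, so band 2 (0 to 149) also contains band 1 (0 to 99).
# A running total over the bands gives the employment that falls inside each band.
nested_total <- cumsum(per_category)
print(nested_total)
```

The second element, 45 + 241 = 286, is the figure listed for small 2 and twodigit 45 in the desired result.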

Desired Result

   area     small   twodigit    emp    worksites
   003        1        21        54        1
   003        1        45        45        2       (12+33)
   003        1        22        12        3       (2+4+6)
   003        2        45       286        4       (12+33+110+131)
   003        2        44       102        2       (1+101)
   003        2        21        54        1

At present, it sums purely within each small category, which is what the code as written should do. However, my question is: how do I get it to compute a cumulative sum (cumsum) across the small categories?

Below is my latest attempt. It does not produce the correct answer, but I think it is close to the correct set of commands.

smbsummary3 <- smbtest2 %>%
  group_by(area, small, twodigit) %>%
  summarise(emp = sum(emp), worksites = n(),
            .groups = 'drop_last') %>%
  mutate(emp = cumsum(emp),
         worksites = cumsum(worksites)) %>%
  slice_max(emp, n = 3)

I was going to try to explain in a comment, but this seemed easier.

Maybe you want to group_by just area and twodigit before doing your cumulative sum, so that the running total is taken across the small categories within each industry.

Then, group_by again to select the top 3 emp values by area and small. The resulting output looks very similar to your desired result (I could not find small 2 and twodigit 21 in the dataset).

smbtest2 %>%
  group_by(area, small, twodigit) %>%
  summarise(emp = sum(emp), 
            worksites = n(), 
            .groups = 'drop_last') %>%
  group_by(area, twodigit) %>%
  mutate(emp = cumsum(emp),
         worksites = cumsum(worksites)) %>%
  group_by(area, small) %>%
  slice_max(emp, n = 3) %>%
  arrange(area, small, desc(emp))

Output

   area small twodigit   emp worksites
  <dbl> <dbl>    <dbl> <dbl>     <int>
1     3     1       21    54         1
2     3     1       45    45         2
3     3     1       22    12         3
4     3     2       45   286         4
5     3     2       44   102         2
6     3     3       72   396         2
7     3     3       21   282         2
8     3     3       62   159         1
9     3     4       22   273         4
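
The row for small 2 and twodigit 21 is absent above because no record in the sample data has that combination, so there is nothing for the cumulative sum to land on. One possible way to carry totals forward into such categories, assuming a missing combination should count as zero employment and zero worksites, is to add the missing rows with tidyr's complete() before the cumsum. This is only a sketch: the name smb_filled and the choice of with_ties = FALSE are my own assumptions, not part of the original question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (area is the number 3)
emp      <- c(1,2,3,4,5,6,7,8,1,12,54,101,33,159,201,261,110,195,131,228)
small    <- c(1,1,1,1,1,1,1,1,1,1,1,2,1,3,3,4,2,3,2,3)
area     <- rep(3, 20)
twodigit <- c(11,22,11,22,23,22,11,31,44,45,21,44,45,62,72,22,45,72,45,21)
smbtest2 <- data.frame(emp, small, area, twodigit)

smb_filled <- smbtest2 %>%
  group_by(area, small, twodigit) %>%
  summarise(emp = sum(emp), worksites = n(), .groups = "drop") %>%
  # Assumption: a (small, twodigit) pair with no records means zero employment
  complete(area, small, twodigit, fill = list(emp = 0, worksites = 0L)) %>%
  # Order by small within each industry so the running total follows the schema
  arrange(area, twodigit, small) %>%
  group_by(area, twodigit) %>%
  mutate(emp = cumsum(emp),
         worksites = cumsum(worksites)) %>%
  # Top 3 industries per small category; ties are broken arbitrarily
  group_by(area, small) %>%
  slice_max(emp, n = 3, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(area, small, desc(emp))

print(smb_filled)
```

With the sample data, small 2 then lists twodigit 45 (286), 44 (102) and 21 (54), which matches the desired result in the question. Every small category now returns three rows, because each industry carries its running total into the larger categories.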
