How best to calculate this share of a total

Below is the sample data. The goal is first to create a column that contains the total employment for each quarter. Second, to create a new column that shows each area's relative share of that total. Finally, the last item (and the one that is vexing me) is to calculate whether the employment with suppress = 0 represents over 50% of the total. I can do this easily in Excel, but I am trying to do it in R so that I can replicate it year after year.

The desired result is shown below, after the sample data.

area <- c("001","005","007","009","011","013","015","017","019","021","023","027","033","001","005","007","009","011","013","015","017","019","021","023","027","033")
year <- c("2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021")
qtr <- c("01","01","01","01","01","01","01","01","01","01","01","01","01","02","02","02","02","02","02","02","02","02","02","02","02","02")
employment <- c(2,4,6,8,11,10,12,14,16,18,20,22,30,3,5,8,9,12,9,24,44,33,298,21,26,45)
suppress <- c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0)

testitem <- data.frame(year, qtr, area, employment, suppress)

For the first quarter of 2021, the total is 173. The rows with suppress = 1 account for only 24 of 173, so the rows with suppress = 0 make up more than half of the total, hence the TRUE in the 50percent column. If those two suppress = 1 values summed to 173/2 or more, then it would say FALSE. For the second quarter, suppress = 1 accounts for 310 of the total of 537, which is over 50% of the total, so the column says FALSE.
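The arithmetic above can be checked directly in base R. This is only a sketch: it rebuilds the two columns it needs from the sample data, and the names `totals`, `suppressed`, `unsuppressed_share` and `over_half` are illustrative rather than taken from the original post.

```r
# Rebuild only the columns needed for the check (values copied from the sample data).
qtr <- rep(c("01", "02"), each = 13)
employment <- c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45)
suppress <- c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
              0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)

# Total employment per quarter, and the part that sits in suppress = 1 rows.
totals <- tapply(employment, qtr, sum)
suppressed <- tapply(employment * suppress, qtr, sum)

# Share held by the suppress = 0 rows; the flag is TRUE when it exceeds one half.
unsuppressed_share <- (totals - suppressed) / totals
over_half <- unsuppressed_share > 0.5

print(totals)      # 173 for quarter 01, 537 for quarter 02
print(suppressed)  # 24 for quarter 01, 310 for quarter 02
print(over_half)   # TRUE for quarter 01, FALSE for quarter 02
```

Multiplying `employment` by the 0/1 `suppress` column keeps only the suppressed rows in the sum, which avoids a separate subsetting step.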

For the total column, I am showing the computation (the ingredients). Ideally, it would show a value such as 0.0116 in place of 2/173.

 year    qtr   area     employment   suppress     total       50percent
2021     01     001        2           0          =2/173       TRUE
2021     01     005        4           0          =4/173       TRUE
.....
2021     02     001        3           0          =3/537       FALSE
2021     02     005        5           0          =5/537       FALSE

For example:

library(dplyr)

testitem %>% 
  group_by(year, qtr) %>% 
  mutate(
    total = employment / sum(employment),
    over_half = sum(employment[suppress == 0]) > (0.5 * sum(employment))
  )

Gives:

# A tibble: 26 × 7
# Groups:   year, qtr [2]
   year  qtr   area  employment suppress  total over_half
   <chr> <chr> <chr>      <dbl>    <dbl>  <dbl> <lgl>
 1 2021  01    001            2        0 0.0116 TRUE
 2 2021  01    005            4        0 0.0231 TRUE
 3 2021  01    007            6        0 0.0347 TRUE
 4 2021  01    009            8        1 0.0462 TRUE
 5 2021  01    011           11        0 0.0636 TRUE
 6 2021  01    013           10        0 0.0578 TRUE
 7 2021  01    015           12        0 0.0694 TRUE
 8 2021  01    017           14        0 0.0809 TRUE
 9 2021  01    019           16        1 0.0925 TRUE
10 2021  01    021           18        0 0.104  TRUE
# … with 16 more rows
# ℹ Use `print(n = ...)` to see more rows

I think you'll want to use group_by() and mutate() here.

library(dplyr)

testitem |> 
  ## grouping by year and quarter
  ## sums will be calculated over areas
  group_by(year, qtr) |> 
  ## this could be more terse, but gets the job done.
  mutate(total_sum = sum(employment),
         ## This uses the total_sum column that was just created
         total_prop = employment/total_sum,
         ## leveraging the 0,1 coding of suppress
         suppress_sum = sum(suppress * employment),
         ## share of the quarter's total held by suppress = 1 rows
         suppress_prop = suppress_sum/total_sum,
         ## TRUE when the suppress = 0 rows hold more than half
         fifty = (1-suppress_prop) > 0.5)
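If dplyr is not available, the same grouped columns can be sketched in base R with ave(), which returns one value per row with each group's result repeated, much like a grouped mutate(). This is an alternative, not part of either answer above. The helper vectors `total_sum` and `suppress_sum` and the rounding to four decimals are assumptions made for illustration (the rounding follows the question's wish for a value like 0.0116 instead of 2/173).

```r
# Sample data rebuilt compactly; same values as testitem in the question.
testitem <- data.frame(
  year = rep("2021", 26),
  qtr = rep(c("01", "02"), each = 13),
  area = rep(c("001", "005", "007", "009", "011", "013", "015",
               "017", "019", "021", "023", "027", "033"), times = 2),
  employment = c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                 3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45),
  suppress = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
)

# Per-row copies of each year/quarter total and of its suppress = 1 subtotal.
total_sum <- ave(testitem$employment, testitem$year, testitem$qtr, FUN = sum)
suppress_sum <- ave(testitem$employment * testitem$suppress,
                    testitem$year, testitem$qtr, FUN = sum)

# Relative share of each area, rounded for display.
testitem$total <- round(testitem$employment / total_sum, 4)

# TRUE when the suppress = 0 rows hold more than half of the quarter's total.
testitem$over_half <- (total_sum - suppress_sum) / total_sum > 0.5

print(head(testitem, 3))
```

Because the columns are assigned back onto `testitem`, the script can be rerun unchanged on next year's data as long as the column names stay the same.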
