
How best to calculate this share of a total

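Below is the sample data. The goal is first to create a column that holds each area's employment relative to the total employment for that quarter, that is, the area's share of the quarterly total. The last item (and the one that is vexing me) is to determine whether the employment in rows with suppress = 0 exceeds 50% of the quarterly total. I can do this easily in Excel, but I am trying to do it in R so that I can replicate it year after year.

The desired result is shown below, after the sample data.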

# 13 areas, each observed in two quarters of 2021
area <- rep(c("001","005","007","009","011","013","015","017","019","021","023","027","033"), times = 2)
year <- rep("2021", 26)
qtr <- rep(c("01","02"), each = 13)
employment <- c(2,4,6,8,11,10,12,14,16,18,20,22,30,3,5,8,9,12,9,24,44,33,298,21,26,45)
# 1 marks a suppressed row, 0 an unsuppressed row
suppress <- c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0)

testitem <- data.frame(year, qtr, area, employment, suppress)

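For the first quarter of 2021, the total is 173. The rows with suppress = 1 account for only 24 of that 173, so the unsuppressed rows hold more than half of the total and the 50percent column is TRUE. If those two suppressed values had summed to 173/2 or more, the column would be FALSE. In the second quarter, the rows with suppress = 1 account for 310 of the total of 537, which is more than 50% of the total, so the column is FALSE.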
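The figures quoted above can be checked directly from the sample vectors. The following is a minimal base R sketch, not part of the original post: the names `total_by_qtr`, `suppressed_by_qtr`, and `over_half` are illustrative, and the vectors are re-declared so the block runs on its own.

```r
# Sample data re-declared so this block is self-contained
qtr <- rep(c("01", "02"), each = 13)
employment <- c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45)
suppress <- c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
              0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)

# Total employment per quarter
total_by_qtr <- tapply(employment, qtr, sum)

# Employment in suppressed rows (suppress == 1) per quarter;
# multiplying by the 0/1 flag zeroes out the unsuppressed rows
suppressed_by_qtr <- tapply(employment * suppress, qtr, sum)

# TRUE when unsuppressed employment is more than half of the total
over_half <- (total_by_qtr - suppressed_by_qtr) > 0.5 * total_by_qtr

print(total_by_qtr)       # 01: 173, 02: 537
print(suppressed_by_qtr)  # 01: 24,  02: 310
print(over_half)          # 01: TRUE, 02: FALSE
```

This works at the quarter level only; the per-row columns asked for in the question need the grouped sums repeated on every row.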

In the total column I have shown the calculation and its components. Ideally it would show a decimal value such as 0.0116 rather than 2/173.

year   qtr   area   employment   suppress   total     50percent
2021   01    001    2            0          =2/173    TRUE
2021   01    005    4            0          =4/173    TRUE
.....
2021   02    001    3            0          =3/537    FALSE
2021   02    005    5            0          =5/537    FALSE

For example:

library(dplyr)

testitem %>% 
  group_by(year, qtr) %>% 
  mutate(
    total = employment / sum(employment),
    over_half = sum(employment[suppress == 0]) > (0.5 * sum(employment))
  )

This gives:

# A tibble: 26 × 7
# Groups:   year, qtr [2]
   year  qtr   area  employment suppress  total over_half
   <chr> <chr> <chr>      <dbl>    <dbl>  <dbl> <lgl>
 1 2021  01    001            2        0 0.0116 TRUE
 2 2021  01    005            4        0 0.0231 TRUE
 3 2021  01    007            6        0 0.0347 TRUE
 4 2021  01    009            8        1 0.0462 TRUE
 5 2021  01    011           11        0 0.0636 TRUE
 6 2021  01    013           10        0 0.0578 TRUE
 7 2021  01    015           12        0 0.0694 TRUE
 8 2021  01    017           14        0 0.0809 TRUE
 9 2021  01    019           16        1 0.0925 TRUE
10 2021  01    021           18        0 0.104  TRUE
# … with 16 more rows
# ℹ Use `print(n = ...)` to see more rows

I think you will want to use group_by() and mutate() here:

library(dplyr)

testitem |> 
  ## grouping by year and quarter
  ## sums will be calculated over areas
  group_by(year, qtr) |> 
  ## this could be more terse, but gets the job done.
  mutate(total_sum = sum(employment),
         ## This uses the total_sum column that was just created
         total_prop = employment/total_sum,
         ## leveraging the 0,1 coding of suppress
         suppress_sum = sum(suppress * employment),
         ## share of the quarterly total that is suppressed
         suppress_prop = suppress_sum/total_sum,
         ## TRUE when the unsuppressed share is more than half
         fifty = (1-suppress_prop) > 0.5)
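For readers who would rather not depend on dplyr, the same grouped columns can be sketched in base R with `ave()`, which returns each group's result repeated on every row of that group. This is an alternative offered as an assumption about what might be convenient, not something either answer proposed; the column names `total_prop` and `fifty` are reused from the answer above, while `total_sum` and `suppress_sum` are kept as plain vectors here.

```r
# Sample data re-declared so this block runs on its own
testitem <- data.frame(
  year = rep("2021", 26),
  qtr = rep(c("01", "02"), each = 13),
  area = rep(c("001", "005", "007", "009", "011", "013", "015",
               "017", "019", "021", "023", "027", "033"), times = 2),
  employment = c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                 3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45),
  suppress = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
)

# Quarterly total, repeated on every row of the quarter
total_sum <- ave(testitem$employment,
                 testitem$year, testitem$qtr, FUN = sum)

# Quarterly suppressed employment, using the 0/1 coding of suppress
suppress_sum <- ave(testitem$employment * testitem$suppress,
                    testitem$year, testitem$qtr, FUN = sum)

# Each area's share of its quarter's total
testitem$total_prop <- testitem$employment / total_sum

# TRUE when unsuppressed employment is more than half of the total
testitem$fifty <- (total_sum - suppress_sum) > 0.5 * total_sum

head(testitem, 3)
```

Because `ave()` preserves row order and length, the new columns line up with the original rows without any explicit join or ungrouping step.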
