How best to calculate this share of a total
Below is the sample data. The first goal is to create a column that contains the total employment for each quarter. The second is to create a new column that shows each area's relative share of that total. The last item (and the one that is vexing me) is to calculate whether the employment with suppress = 0 represents over 50% of the quarter's total. I can do this easily in Excel, but I am trying to do it in R so that I can replicate it year after year.
The desired result is shown below, after the sample data.
area <- c("001","005","007","009","011","013","015","017","019","021","023","027","033","001","005","007","009","011","013","015","017","019","021","023","027","033")
year <- c("2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021")
qtr <- c("01","01","01","01","01","01","01","01","01","01","01","01","01","02","02","02","02","02","02","02","02","02","02","02","02","02")
employment <- c(2,4,6,8,11,10,12,14,16,18,20,22,30,3,5,8,9,12,9,24,44,33,298,21,26,45)
suppress <- c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0)
testitem <- data.frame(year,qtr, area, employment, suppress)
For the first quarter of 2021, the total is 173. The rows with suppress = 1 add up to only 24 of 173, so the suppress = 0 rows make up more than half of the total, hence the TRUE in the 50 percent column. If the suppress = 1 values summed to 173/2 or greater, it would say FALSE. For the second quarter, suppress = 1 accounts for 310 of the total of 537, which is over 50% of the total, so the column is FALSE.
For the total column, I am showing the computation (the ingredients) rather than the result. Ideally, it would show a value such as .0115 in place of 2/173.
year  qtr  area  employment  suppress  total   50percent
2021  01   001   2           0         =2/173  TRUE
2021  01   005   4           0         =4/173  TRUE
.....
2021  02   001   3           0         =3/537  FALSE
2021  02   005   5           0         =5/537  FALSE
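As a quick sanity check on the numbers quoted above (173 and 24 for the first quarter, 537 and 310 for the second), here is a minimal base R sketch. The sample data is rebuilt in a more compact form, and the variable names `quarter_total`, `suppressed`, and `by_qtr` are only illustrative.

```r
# Sample data from the question, rebuilt compactly with rep()
areas <- c("001", "005", "007", "009", "011", "013", "015",
           "017", "019", "021", "023", "027", "033")
testitem <- data.frame(
  year = rep("2021", 26),
  qtr = rep(c("01", "02"), each = 13),
  area = rep(areas, times = 2),
  employment = c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                 3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45),
  suppress = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
)

# Total employment per quarter
quarter_total <- tapply(testitem$employment, testitem$qtr, sum)

# Employment in suppressed rows (suppress == 1) per quarter
suppressed <- tapply(testitem$employment * testitem$suppress, testitem$qtr, sum)

# One row per quarter; over_half is TRUE when the suppress == 0 rows
# hold more than 50% of the quarter's employment
by_qtr <- data.frame(
  qtr = names(quarter_total),
  total = as.vector(quarter_total),
  suppressed = as.vector(suppressed)
)
by_qtr$over_half <- (by_qtr$total - by_qtr$suppressed) > 0.5 * by_qtr$total
print(by_qtr)
```

This only confirms the per-quarter arithmetic; it does not add the columns back onto the 26 original rows.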
For example:
library(dplyr)
testitem %>%
  group_by(year, qtr) %>%
  mutate(
    total = employment / sum(employment),
    over_half = sum(employment[suppress == 0]) > (0.5 * sum(employment))
  )
Gives:
# A tibble: 26 × 7
# Groups:   year, qtr [2]
   year  qtr   area  employment suppress  total over_half
   <chr> <chr> <chr>      <dbl>    <dbl>  <dbl> <lgl>
 1 2021  01    001            2        0 0.0116 TRUE
 2 2021  01    005            4        0 0.0231 TRUE
 3 2021  01    007            6        0 0.0347 TRUE
 4 2021  01    009            8        1 0.0462 TRUE
 5 2021  01    011           11        0 0.0636 TRUE
 6 2021  01    013           10        0 0.0578 TRUE
 7 2021  01    015           12        0 0.0694 TRUE
 8 2021  01    017           14        0 0.0809 TRUE
 9 2021  01    019           16        1 0.0925 TRUE
10 2021  01    021           18        0 0.104  TRUE
# … with 16 more rows
# ℹ Use `print(n = ...)` to see more rows
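If you only need the over-50% answer once per quarter rather than repeated on every row, the same grouping can be collapsed with summarise(). This is a sketch under the assumption that a one-row-per-quarter table is useful to you; the column names `unsuppressed` and `unsuppressed_share` are my own.

```r
library(dplyr)

# Sample data from the question, rebuilt compactly with rep()
areas <- c("001", "005", "007", "009", "011", "013", "015",
           "017", "019", "021", "023", "027", "033")
testitem <- data.frame(
  year = rep("2021", 26),
  qtr = rep(c("01", "02"), each = 13),
  area = rep(areas, times = 2),
  employment = c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                 3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45),
  suppress = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
)

quarter_summary <- testitem %>%
  group_by(year, qtr) %>%
  summarise(
    # employment in rows that are not suppressed
    unsuppressed = sum(employment[suppress == 0]),
    # total employment for the quarter
    total = sum(employment),
    unsuppressed_share = unsuppressed / total,
    over_half = unsuppressed_share > 0.5,
    .groups = "drop"
  )

print(quarter_summary)
```

With the sample data this should give one row for each quarter of 2021, with over_half TRUE for quarter 01 and FALSE for quarter 02.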
I think you'll want to use group_by() and mutate() here.
library(dplyr)
testitem |>
  ## grouping by year and quarter
  ## sums will be calculated over areas
  group_by(year, qtr) |>
  ## this could be more terse, but gets the job done.
  mutate(total_sum = sum(employment),
         ## This uses the total_sum column that was just created
         total_prop = employment / total_sum,
         ## leveraging the 0,1 coding of suppress
         suppress_sum = sum(suppress * employment),
         ## divide by total_sum (there is no column named total)
         suppress_prop = suppress_sum / total_sum,
         ## TRUE when the unsuppressed share exceeds 50%
         fifty = (1 - suppress_prop) > 0.5)
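Since you mention wanting to repeat this year after year, one option is to wrap the grouped mutate in a small function. The sketch below is only a suggestion: the function name `add_quarter_shares` and its `threshold` argument are hypothetical, and it assumes each year's data frame has the same `year`, `qtr`, `employment`, and `suppress` columns as the sample.

```r
library(dplyr)

# Hypothetical helper: adds the share and the over-threshold flag to any
# data frame with year, qtr, employment and suppress columns
add_quarter_shares <- function(df, threshold = 0.5) {
  df %>%
    group_by(year, qtr) %>%
    mutate(
      # each area's share of its quarter's total employment
      total_prop = employment / sum(employment),
      # TRUE when unsuppressed rows exceed the threshold share of the total
      fifty = sum(employment[suppress == 0]) > threshold * sum(employment)
    ) %>%
    ungroup()
}

# Sample data from the question, rebuilt compactly with rep()
areas <- c("001", "005", "007", "009", "011", "013", "015",
           "017", "019", "021", "023", "027", "033")
testitem <- data.frame(
  year = rep("2021", 26),
  qtr = rep(c("01", "02"), each = 13),
  area = rep(areas, times = 2),
  employment = c(2, 4, 6, 8, 11, 10, 12, 14, 16, 18, 20, 22, 30,
                 3, 5, 8, 9, 12, 9, 24, 44, 33, 298, 21, 26, 45),
  suppress = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
)

result <- add_quarter_shares(testitem)
print(result)
```

Next year you would call the same function on the new data frame. The ungroup() at the end is a design choice so that later steps do not silently inherit the grouping.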