Counting 0s, 1s and 2s in multiple columns in R
My data looks like this:
structure(list(did = c(209L, 209L, 206L, 206L, 206L, 206L, 206L,
206L, 206L, 206L, 206L, 209L, 206L, 206L, 207L, 207L, 207L, 207L,
209L, 209L), hhid = c(5668, 5595, 4724, 4756, 4856, 4730, 4757,
6320, 4758, 6319, 6311, 5477, 6322, 6317, 134, 178, 238, 179,
5865, 5875), bc = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L), rc = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
oap = c(2L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L,
2L, 2L, 2L, 2L, 2L, 0L, 0L)), row.names = c(NA, 20L), class = "data.frame")
hhid is unique for each row. The remaining columns contain only 0s and 1s in some cases and 0s, 1s and 2s in others. The required output columns are:
did hh_count bc_0 bc_1 bc_2 rc_0 rc_1 rc_2 oap_0 oap_1 oap_2
where did will be unique. hh_count will be the count of hhid values associated with each did. bc_0, bc_1 and bc_2 break down column bc and give the counts of 0s, 1s and 2s in bc. Similarly for rc_0, rc_1 and rc_2 and for oap_0, oap_1 and oap_2. So I need to count the 0s, 1s and 2s in each of these columns.
With counts of 3 specific values, writing the functions manually seems reasonable. If you need specific counts of more distinct values we could come up with a better way to generalize, probably converting your data to long format, summarizing, and then going back to wide.
library(dplyr) # across() requires dplyr version 1.0 or higher
dd %>% # (calling your data dd)
  group_by(did) %>%
  summarize(
    hh_count = n_distinct(hhid),
    across(c(bc, rc, oap),
      .fns = list("0" = ~sum(. == 0), "1" = ~sum(. == 1), "2" = ~sum(. == 2)),
      .names = "{.col}_{.fn}" # this is the default, but I show it explicitly
    )
  )
# # A tibble: 3 x 11
#     did hh_count  bc_0  bc_1  bc_2  rc_0  rc_1  rc_2 oap_0 oap_1 oap_2
#   <int>    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1   206       11     2     9     0     2     9     0     6     0     5
# 2   207        4     0     4     0     0     4     0     0     0     4
# 3   209        5     2     3     0     0     5     0     3     0     2
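
The long-format generalization mentioned above (reshape to long, summarize, reshape back to wide) could be sketched roughly as follows. This is only one possible way to do it and assumes the tidyr package is available alongside dplyr. The intermediate names `col`, `value`, `counts` and `result` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr) # assumption: tidyr is installed; used for the reshaping steps

# Sample data from the question, called dd as in the answer above
dd <- data.frame(
  did = c(209L, 209L, 206L, 206L, 206L, 206L, 206L, 206L, 206L, 206L,
          206L, 209L, 206L, 206L, 207L, 207L, 207L, 207L, 209L, 209L),
  hhid = c(5668, 5595, 4724, 4756, 4856, 4730, 4757, 6320, 4758, 6319,
           6311, 5477, 6322, 6317, 134, 178, 238, 179, 5865, 5875),
  bc = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
         1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L),
  rc = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
         1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  oap = c(2L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L,
          2L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L)
)

counts <- dd %>%
  # long format: one row per (did, hhid, column name, value)
  pivot_longer(c(bc, rc, oap), names_to = "col", values_to = "value") %>%
  # a factor keeps the original column order instead of alphabetical order
  mutate(col = factor(col, levels = c("bc", "rc", "oap"))) %>%
  # count each value that actually occurs, per did and column
  count(did, col, value) %>%
  # add the combinations that never occur (e.g. bc == 2) with a count of 0
  complete(did, col, value = 0:2, fill = list(n = 0L)) %>%
  arrange(did, col, value) %>%
  # back to wide: one column per column/value pair, e.g. bc_0, bc_1, bc_2
  pivot_wider(names_from = c(col, value), values_from = n, names_sep = "_")

# hh_count is computed separately and joined on did
result <- dd %>%
  group_by(did) %>%
  summarize(hh_count = n_distinct(hhid)) %>%
  left_join(counts, by = "did")

result
```

To count more distinct values, only the `value = 0:2` argument would need to change. On the sample data this should give the same counts as the across() version above.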