
Counting 0s, 1s and 2s in multiple columns in R

My data looks like this:

structure(list(did = c(209L, 209L, 206L, 206L, 206L, 206L, 206L, 
206L, 206L, 206L, 206L, 209L, 206L, 206L, 207L, 207L, 207L, 207L, 
209L, 209L), hhid = c(5668, 5595, 4724, 4756, 4856, 4730, 4757, 
6320, 4758, 6319, 6311, 5477, 6322, 6317, 134, 178, 238, 179, 
5865, 5875), bc = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L), rc = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    oap = c(2L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 
    2L, 2L, 2L, 2L, 2L, 0L, 0L)), row.names = c(NA, 20L), class = "data.frame")

hhid is unique for each row. The remaining columns contain only 0s and 1s in some cases and 0s, 1s and 2s in others. The required output columns are:

did   hh_count   bc_0   bc_1  bc_2   rc_0  rc_1  rc_2  oap_0  oap_1  oap_2

where did will be unique and hh_count will be the number of hhid values associated with each did. bc_0, bc_1 and bc_2 break down column bc: they hold the counts of 0s, 1s and 2s in bc. Similarly for rc_0, rc_1 and rc_2, and for oap_0, oap_1 and oap_2. So I need to count the 0s, 1s and 2s in each of these columns, per did.

With counts of only 3 specific values, writing the functions manually seems reasonable. If you need counts of more distinct values, we could come up with a better way to generalize, probably by converting your data to long format, summarizing, and then going back to wide.

library(dplyr)  # across() requires dplyr version 1.0 or higher
dd %>%          # (calling your data dd)
  group_by(did) %>%
  summarize(
    hh_count = n_distinct(hhid),
    across(c(bc, rc, oap),
           .fns = list("0" = ~sum(. == 0), "1" = ~sum(. == 1), "2" = ~sum(. == 2)),
           .names = "{.col}_{.fn}"  # this is the default, but I show it explicitly
           )
  )
# # A tibble: 3 x 11
#     did hh_count  bc_0  bc_1  bc_2  rc_0  rc_1  rc_2 oap_0 oap_1 oap_2
#   <int>    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1   206       11     2     9     0     2     9     0     6     0     5
# 2   207        4     0     4     0     0     4     0     0     0     4
# 3   209        5     2     3     0     0     5     0     3     0     2
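The long-format generalization mentioned above could be sketched roughly as follows. This is only one possible approach, not something tested beyond the sample data: it assumes tidyr is available alongside dplyr, and the names cols, value_levels, counts, hh and result are made up for illustration. The idea is to pivot the value columns to long format, count every (did, column, value) combination, fill in the combinations that never occur with 0, and pivot back to wide.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
dd <- structure(list(did = c(209L, 209L, 206L, 206L, 206L, 206L, 206L,
206L, 206L, 206L, 206L, 209L, 206L, 206L, 207L, 207L, 207L, 207L,
209L, 209L), hhid = c(5668, 5595, 4724, 4756, 4856, 4730, 4757,
6320, 4758, 6319, 6311, 5477, 6322, 6317, 134, 178, 238, 179,
5865, 5875), bc = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L), rc = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    oap = c(2L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L,
    2L, 2L, 2L, 2L, 2L, 0L, 0L)), row.names = c(NA, 20L), class = "data.frame")

cols <- c("bc", "rc", "oap")  # columns to break down (assumed)
value_levels <- 0:2           # values to count (assumed; extend as needed)

counts <- dd %>%
  # one row per (did, hhid, column, value)
  pivot_longer(all_of(cols), names_to = "col", values_to = "value") %>%
  # a factor keeps the output columns in bc, rc, oap order
  mutate(col = factor(col, levels = cols)) %>%
  count(did, col, value) %>%
  # add the combinations that never occur, with a count of 0
  complete(did, col, value = value_levels, fill = list(n = 0L)) %>%
  pivot_wider(names_from = c(col, value), values_from = n)

# hh_count is computed separately, then joined back by did
hh <- dd %>%
  group_by(did) %>%
  summarize(hh_count = n_distinct(hhid))

result <- left_join(hh, counts, by = "did")
result
```

On the sample data this should give the same columns and counts as the across() version. To count other values or other columns, only value_levels and cols would need to change.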
