
Count level within group_by hierarchy in dplyr

I have a large data set in R that is organised with multiple records from individual cases, nested within groups. A toy example is here:

d = data.frame(group = rep(c('control','patient'), each = 5), case = c('a', 'a', 'b', 'c', 'c', 'd','d','d','e','e'))

If group_by(group, case) is applied in a dplyr chain, how can a column be created that numbers each row by the order of its case within the group? For example, in the third column below, case 'a' is the first case in the control group and case 'c' is the third, but the numbering resets to 1 for case 'd', the first case in the patient group.

group    case  number
control  a     1
control  a     1
control  b     2
control  c     3
control  c     3
patient  d     1
patient  d     1
patient  d     1
patient  e     2
patient  e     2

I can see how this could be done by counting cases in a 'for' loop, but I am wondering whether there is a way to achieve it within a standard dplyr-style chain of operations.

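library(dplyr)
# factor() is needed in R >= 4.0, where data.frame() keeps strings as character
group_by(d, group) %>%
   mutate(number = as.numeric(droplevels(factor(case))))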
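
As a possible alternative to the factor conversion, here is a minimal sketch that numbers cases within each group without relying on factor levels. It assumes `case` is a character column, as in the toy data. The column names `number_first_seen` and `number_sorted` are made up for illustration. `match(case, unique(case))` numbers cases in order of first appearance, while `dense_rank(case)` numbers them in sorted order; for the toy data the two agree.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

numbered <- d %>%
  group_by(group) %>%
  mutate(
    # position of each case among the distinct cases, in order of appearance
    number_first_seen = match(case, unique(case)),
    # rank of each case in sorted order, with no gaps between ranks
    number_sorted = dense_rank(case)
  ) %>%
  ungroup()

numbered
```

If the cases are not already sorted within each group, the two columns will differ, so pick whichever ordering is meant by "order of its case".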

We can use data.table:

library(data.table)
setDT(d)[, numbers := as.numeric(factor(case, levels = unique(case))), group]

One solution would be:

library(dplyr)
library(tibble)

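# Numbers cases across the whole data set: the count does not reset within
# each group, and `number` comes back as a character column.
want<-left_join(d,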
                d %>%
                  distinct(case) %>%
                  rownames_to_column(var="number") ,
                by="case")

# Added later: number the cases separately within each group
want2<-left_join(d,
                 bind_rows(
                   d %>%
                     filter(group=="control") %>%
                     distinct(case) %>%
                     rownames_to_column(var="number"),
                   d %>%
                     filter(group=="patient") %>%
                     distinct(case) %>%
                     rownames_to_column(var="number")),
                   by="case")

# I think this is less readable:
want3<-left_join(d,
                 bind_rows(by(d,d$group,function(x) x %>%
                                distinct(case) %>%
                                rownames_to_column(var="number"))),
                 by="case")
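
The lookup-table idea above can probably be written more compactly by grouping the distinct (group, case) pairs rather than filtering each group by hand. This is only a sketch under the assumptions of the toy data; the name `lookup` is made up for illustration, and `number` comes out as an integer here instead of the character column that rownames_to_column produces.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

# One row per (group, case) pair, numbered in order of appearance within group
lookup <- d %>%
  distinct(group, case) %>%
  group_by(group) %>%
  mutate(number = row_number()) %>%
  ungroup()

# Attach the numbers to every record; joining on both keys keeps the
# result correct even if a case label were reused in another group
want <- left_join(d, lookup, by = c("group", "case"))

want
```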
