Generate a new column in a data frame that counts duplicates by group

Question

I want to generate a new variable in a dataset. This variable should count the occurrences of values in different groups, defined by another variable.

Here is an example data frame:

 x <- c(1, 1, 2, 3, 3, 3, 4, 4)
 y <- c(5, 4, 4, 5, 5, 5, 1, 1)

 dat <- data.frame(x, y)
 dat

   x y
 1 1 5
 2 1 4
 3 2 4
 4 3 5
 5 3 5
 6 3 5
 7 4 1
 8 4 1

Now I want to generate a new variable; let's call it z. z should count the occurrences of duplicates in y by group (groups defined by x: 1, 2, 3, 4). Therefore, the result should look like this:

Is there a way to do that with dplyr?

Answer 1

One option is to group by both x and y and create a sequence column:

library(dplyr)
dat %>% 
     group_by(x, y) %>%
     mutate(z = row_number())
# A tibble: 8 x 3
# Groups:   x, y [5]
#      x     y     z
#  <dbl> <dbl> <int>
#1     1     5     1
#2     1     4     1
#3     2     4     1
#4     3     5     1
#5     3     5     2
#6     3     5     3
#7     4     1     1
#8     4     1     2

Also with base R:

dat$z <- with(dat, ave(seq_along(x), x, y, FUN = seq_along))
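The question's "count the occurrence" can be read either as a running index (1, 2, 3, ... within each group, which is what the answers compute) or as the total number of times each (x, y) combination occurs. As a minimal base R sketch of both readings, the same ave() call can use seq_along for the running index or length for the group size. The column names z_running and z_total are only illustrative.

```r
x <- c(1, 1, 2, 3, 3, 3, 4, 4)
y <- c(5, 4, 4, 5, 5, 5, 1, 1)
dat <- data.frame(x, y)

# Running index: 1, 2, 3, ... within each (x, y) combination, in row order
dat$z_running <- with(dat, ave(seq_along(x), x, y, FUN = seq_along))

# Total size of each (x, y) combination, repeated on every row of that group
dat$z_total <- with(dat, ave(seq_along(x), x, y, FUN = length))

dat
```

For the three rows with x = 3 and y = 5, z_running is 1, 2, 3 while z_total is 3 on every row.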

Or with data.table:

library(data.table)
setDT(dat)[, z := seq_len(.N), .(x, y)]

Or, more compactly:

setDT(dat)[, z := rowid(x, y)]
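The dplyr row_number(), base R seq_along() and data.table rowid() versions above all compute the same thing: a running count of each (x, y) combination in row order. To make that explicit, here is a minimal single-pass sketch in base R. running_count() is a hypothetical helper, not a function from any package, and it assumes the grouping values can be safely pasted into one string key per row.

```r
# Hypothetical helper: running count of each combination of the
# grouping vectors, in the order the rows appear
running_count <- function(...) {
  # Combine the grouping vectors into one string key per row;
  # "\r" is used as a separator unlikely to occur in the values
  keys <- paste(..., sep = "\r")
  counts <- integer(0)              # counter per key seen so far
  out <- integer(length(keys))
  for (i in seq_along(keys)) {
    k <- keys[i]
    prev <- if (k %in% names(counts)) counts[[k]] else 0L
    counts[k] <- prev + 1L          # update (or create) this key's counter
    out[i] <- counts[[k]]
  }
  out
}

x <- c(1, 1, 2, 3, 3, 3, 4, 4)
y <- c(5, 4, 4, 5, 5, 5, 1, 1)
running_count(x, y)
```

For the example data this returns 1 1 1 1 2 3 1 2, matching the z column in the dplyr output above. The packaged versions are preferable in practice; the loop only shows the idea.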

Answer 2

One possibility:

library(dplyr)
dat %>%
 group_by(x) %>%
 mutate(z = cumsum(duplicated(y)) + 1)

      x     y     z
  <dbl> <dbl> <dbl>
1     1     5     1
2     1     4     1
3     2     4     1
4     3     5     1
5     3     5     2
6     3     5     3
7     4     1     1
8     4     1     2

The same with base R:

dat$z <- with(dat, ave(y, x, FUN = function(v) cumsum(duplicated(v)) + 1))
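Answer 2 groups only by x, so cumsum(duplicated(y)) counts every repeated y value within an x group, not the repeats of each particular y value. The two answers agree on the example data, but they can diverge when one x group contains more than one repeated y value. A small base R sketch with made-up vectors (not from the question) shows the difference:

```r
# Made-up data: one x group containing two different repeated y values
x <- c(1, 1, 1, 1)
y <- c(5, 5, 4, 4)

# Answer 2's approach: cumulative number of duplicates within x, plus 1
by_x <- ave(y, x, FUN = function(v) cumsum(duplicated(v)) + 1)

# Answer 1's approach: running index within each (x, y) combination
by_x_y <- ave(seq_along(x), x, y, FUN = seq_along)

by_x    # 1 2 2 3
by_x_y  # 1 2 1 2
```

Which result is wanted depends on whether z should restart for each distinct y value (Answer 1) or keep counting all duplicates within the x group (Answer 2).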

2 answers

Answer 1: score 3, accepted, 2019-07-18 14:41:37

Answer 2: score 2, 2019-07-18 14:41:31
