Create a column with dplyr in R based on a value and the frequency of another column

Question

I will edit the post name once I think of a better title, but for the time being, the short example below highlights what I am struggling with:

dput(mydf)
structure(list(gameID = c("34", "34", "34", "34", "34", "25", 
"25", "25")), class = "data.frame", row.names = c(NA, -8L))

mydf
    gameID
1     34
2     34
3     34
4     34
5     34
6     25
7     25
8     25

(garboCol is included only so that the data frame has more than one column; otherwise please ignore it.) This feels like it should be a fairly straightforward data manipulation problem. I would like to create a new column that is simply the gameID column pasted with a running count of that gameID's occurrences. I am thus seeking the following output:

mydf
  gameID    newCol
1     34     34-1
2     34     34-2
3     34     34-3
4     34     34-4
5     34     34-5
6     25     25-1
7     25     25-2
8     25     25-3

The gameID column is already of type character, and newCol should preferably be of type character as well. I am working within a long-ish dplyr chain, and am trying to get the following to work:

mydf <- mydf %>% 
  dplyr::mutate(newCol = paste0(gameID, '-', {what goes here}))

I can do this fairly easily with a for-loop, but a dplyr solution would be much better.

Answer 1

If we need to paste with a sequence, group by 'gameID', get the within-group sequence with row_number(), and paste it to 'gameID' to create 'newCol':

mydf %>%
    group_by(gameID) %>%
    mutate(newCol = paste(gameID, row_number(), sep = '-'))
# A tibble: 8 x 3
# Groups:   gameID [2]
#  gameID garboCol newCol
#  <fct>     <dbl> <chr> 
#1 34            1 34-1  
#2 34            2 34-2  
#3 34            3 34-3  
#4 34            4 34-4  
#5 34            5 34-5  
#6 25            6 25-1  
#7 25            7 25-2  
#8 25            8 25-3
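
The result above is still grouped by gameID, as the "# Groups" line shows. Inside a longer dplyr chain, later steps would then also run per group. A minimal sketch of dropping the grouping again with ungroup(), assuming the gameID-only data frame from the question's dput (the name `result` is only for illustration):

```r
library(dplyr)

# Data as given by the dput in the question (gameID only)
mydf <- data.frame(
  gameID = c("34", "34", "34", "34", "34", "25", "25", "25"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  group_by(gameID) %>%                                      # count within each gameID
  mutate(newCol = paste(gameID, row_number(), sep = "-")) %>%
  ungroup()                                                 # drop grouping for later steps
```

After ungroup(), the rest of the chain operates on the whole table again.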

If we want to make this shorter, an option is rowid from data.table. An advantage is that it won't add grouping attributes to the output:

library(data.table)
mydf %>% 
  mutate(newCol = paste(gameID, rowid(gameID), sep='-'))
#   gameID garboCol newCol
#1     34        1   34-1
#2     34        2   34-2
#3     34        3   34-3
#4     34        4   34-4
#5     34        5   34-5
#6     25        6   25-1
#7     25        7   25-2
#8     25        8   25-3

Or use it with glue (from the glue package):

library(glue)
mydf %>%
     mutate(newCol = glue("{gameID}-{rowid(gameID)}"))
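
One thing to be aware of: glue() returns an object of class "glue" rather than a bare character vector. Since the question prefers newCol to be plain character, a possible variant wraps the call in as.character(). This is a sketch assuming the gameID-only data frame from the question's dput; rowid is called as data.table::rowid here only to avoid attaching data.table alongside dplyr.

```r
library(dplyr)
library(glue)

# Data as given by the dput in the question (gameID only)
mydf <- data.frame(
  gameID = c("34", "34", "34", "34", "34", "25", "25", "25"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  # as.character() strips the "glue" class so newCol is an ordinary character column
  mutate(newCol = as.character(glue("{gameID}-{data.table::rowid(gameID)}")))
```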

Answer 2

This might be what you had in mind.

mydf %>% 
 group_by(gameID) %>% 
 dplyr::mutate(newCol = paste0(gameID, '-', seq_along(gameID)))
# A tibble: 8 x 3
# Groups:   gameID [2]
#  gameID garboCol newCol
#  <fct>     <dbl> <chr> 
#1 34            1 34-1  
#2 34            2 34-2  
#3 34            3 34-3  
#4 34            4 34-4  
#5 34            5 34-5  
#6 25            6 25-1  
#7 25            7 25-2  
#8 25            8 25-3
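
For comparison, the same per-group counter can be sketched in base R with ave(). This is an alternative outside dplyr, not part of the answer above, and it assumes the gameID-only data frame from the question's dput; the name `counter` is only for illustration.

```r
# Data as given by the dput in the question (gameID only)
mydf <- data.frame(
  gameID = c("34", "34", "34", "34", "34", "25", "25", "25"),
  stringsAsFactors = FALSE
)

# ave() applies seq_along() within each gameID group and returns the
# results in the original row order
counter <- ave(seq_along(mydf$gameID), mydf$gameID, FUN = seq_along)

mydf$newCol <- paste0(mydf$gameID, "-", counter)
```

Because ave() is given integer positions rather than the character column itself, the counter stays integer until paste0() converts it.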

2 answers

Answer 1: 2, accepted, 2018-12-07 22:24:47

Answer 2: 2, 2018-12-07 22:27:34
