Create column with dplyr based on value and also frequency of another column, in R
I will edit the post title shortly once I think of a better one, but for the time being, the short example below highlights what I am struggling with:
dput(mydf)
structure(list(gameID = c("34", "34", "34", "34", "34", "25",
"25", "25")), class = "data.frame", row.names = c(NA, -8L))
mydf
gameID
1 34
2 34
3 34
4 34
5 34
6 25
7 25
8 25
(garboCol is included only so that the data frame has more than one column; otherwise please ignore it.) This feels like it should be a fairly straightforward data manipulation problem. I would like to create a new column that is simply the gameID column pasted with a running count of that gameID's occurrences. I am thus seeking the following output:
mydf
gameID newCol
1 34 34-1
2 34 34-2
3 34 34-3
4 34 34-4
5 34 34-5
6 25 25-1
7 25 25-2
8 25 25-3
The gameID column is already of type character, and newCol should preferably be of type character as well. I am working within a longish dplyr chain, and am trying to get the following to work:
mydf <- mydf %>%
dplyr::mutate(newCol = paste0(gameID, '-', {what goes here}))
I can do this fairly easily with a for-loop, but a dplyr solution would be much better.
If we need to paste with a sequence, get the sequence with row_number() grouped by 'gameID', then paste to create 'newCol':
library(dplyr)
mydf %>%
  group_by(gameID) %>%
  mutate(newCol = paste(gameID, row_number(), sep = '-'))
# A tibble: 8 x 3
# Groups: gameID [2]
# gameID garboCol newCol
# <fct> <dbl> <chr>
#1 34 1 34-1
#2 34 2 34-2
#3 34 3 34-3
#4 34 4 34-4
#5 34 5 34-5
#6 25 6 25-1
#7 25 7 25-2
#8 25 8 25-3
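For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the data exactly as given by the dput() in the question (so gameID is character and there is no garboCol), and the trailing ungroup() is my own addition on the assumption that later steps in a longer chain should not inherit the grouping.

```r
library(dplyr)

# Data reconstructed from the dput() in the question (gameID only).
mydf <- data.frame(
  gameID = c("34", "34", "34", "34", "34", "25", "25", "25"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  group_by(gameID) %>%
  # row_number() restarts at 1 within each gameID group
  mutate(newCol = paste(gameID, row_number(), sep = "-")) %>%
  # assumption: drop the grouping so later steps are not affected
  ungroup()

print(result)
class(result$newCol)  # paste() returns a character vector
```

The row order of the input is preserved, so the counter follows the order in which each gameID appears.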
If we want to make this shorter, an option is rowid from data.table. The advantage is that it won't create the group attributes in the output:
library(data.table)
mydf %>%
mutate(newCol = paste(gameID, rowid(gameID), sep='-'))
# gameID garboCol newCol
#1 34 1 34-1
#2 34 2 34-2
#3 34 3 34-3
#4 34 4 34-4
#5 34 5 34-5
#6 25 6 25-1
#7 25 7 25-2
#8 25 8 25-3
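To see why no group_by() is needed here, it may help to look at what rowid() returns on its own. This is a small sketch using a bare character vector with the same values as gameID in the question; the variable names are only for illustration.

```r
library(data.table)

# Same values as the gameID column in the question.
gameID <- c("34", "34", "34", "34", "34", "25", "25", "25")

# rowid() numbers the occurrences of each distinct value in order of appearance.
counter <- rowid(gameID)
print(counter)

newCol <- paste(gameID, counter, sep = "-")
print(newCol)
```

Because rowid() already produces a per-value counter, it can be called directly inside mutate() on an ungrouped data frame.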
Or use it with glue (from the glue package):
library(glue)
mydf %>%
mutate(newCol = glue("{gameID}-{rowid(gameID)}"))
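One thing to be aware of: glue() returns an object of class "glue" (which also inherits from character), not a plain character vector. If a plain character column is wanted, as the question suggests, wrapping the call in as.character() is one way to get it. The sketch below assumes the data from the question's dput().

```r
library(dplyr)
library(data.table)
library(glue)

# Data reconstructed from the dput() in the question (gameID only).
mydf <- data.frame(
  gameID = c("34", "34", "34", "34", "34", "25", "25", "25"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  # as.character() strips the "glue" class so newCol is a plain character column
  mutate(newCol = as.character(glue("{gameID}-{rowid(gameID)}")))

print(result)
class(result$newCol)
```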
This might be what you had in mind.
mydf %>%
group_by(gameID) %>%
dplyr::mutate(newCol = paste0(gameID, '-', seq_along(gameID)))
# A tibble: 8 x 3
# Groups: gameID [2]
# gameID garboCol newCol
# <fct> <dbl> <chr>
#1 34 1 34-1
#2 34 2 34-2
#3 34 3 34-3
#4 34 4 34-4
#5 34 5 34-5
#6 25 6 25-1
#7 25 7 25-2
#8 25 8 25-3