Create a column with concatenated possible values of one column based on an ID column
How to count the length of concatenated strings from one column based on an ID in another?
I have the following function:
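set.seed(1984)
test <- function(paths){
  # One row per path; the Count column stays NA for now
  x <- matrix(rep(NA, paths*3), ncol = 3,
              dimnames = list(c(), c("Cookie", "Site", "Count")))
  for(i in 1:paths){
    # Random non-negative Cookie ID (sqrt(z^2) is the same as abs(z))
    x[i, 1] <- round(sqrt(rnorm(1,50,100)^2))
    n <- function(){sample(1:10, size = 1)}
    draws <- function(){sample(LETTERS[1:5], n(), replace = TRUE)}
    # Join 1 to 10 randomly drawn sites into a path such as "B-D-E-A"
    x[i, 2] <- paste(draws(), collapse = '-')
  }
  return(x)
}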
which produces output like:
Cookie Site Count
[1,] "91" "B-D-E-A" NA
[2,] "37" "E-A-D" NA
[3,] "108" "B" NA
[4,] "93" "D-A-D" NA
[5,] "157" "E-C" NA
[6,] "52" "B-C-D-A-C-C-B-A-B-E" NA
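For every unique Cookie ID in the Cookie column, I would like to count the length of the Site strings concatenated together (Cookie contains duplicate values) and place the result in the Count value for each Cookie ID (so the same count may appear on several rows). Any ideas?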
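This groups your matrix by Cookie and returns, for each group, the total number of characters in the Site column (which equals the length of the concatenated string):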
test.df <- test(91)
library(dplyr)
test.df %>%
  as.data.frame(., stringsAsFactors = FALSE) %>%
  group_by(Cookie) %>%
  mutate(Count = sum(nchar(Site)))
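As a quick check of why summing nchar(Site) per group matches the length of the concatenated string, here is a minimal base R sketch. The toy data frame and its values are made up for illustration, and ave() stands in for the dplyr pipeline, so treat it as one possible way to verify the idea rather than the answer's own method.

```r
# Toy data (hypothetical values): Cookie "91" appears twice
toy <- data.frame(
  Cookie = c("91", "37", "91"),
  Site   = c("B-D", "E", "A-C-A"),
  stringsAsFactors = FALSE
)

# Per-Cookie sum of character counts, repeated on every row of the group
toy$Count <- ave(nchar(toy$Site), toy$Cookie, FUN = sum)

# Length of the Site strings pasted together with no separator, per Cookie
joined_len <- vapply(
  split(toy$Site, toy$Cookie),
  function(s) nchar(paste(s, collapse = "")),
  integer(1)
)
toy$JoinedLength <- unname(joined_len[toy$Cookie])

print(toy)
```

Both columns agree row by row, because concatenating without a separator adds no extra characters.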
If you want Count to exclude the - characters, replace Site with gsub("-", "", Site, fixed = TRUE).
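To see what dropping the dashes changes, here is a small base R sketch using three Site strings taken from the sample output in the question. The variable names are chosen for illustration only.

```r
# Three Site strings from the sample output in the question
site <- c("B-D-E-A", "E-A-D", "B")

# Character counts including the "-" separators
with_dashes <- nchar(site)                                  # 7 5 1

# Character counts after removing every literal "-"
without_dashes <- nchar(gsub("-", "", site, fixed = TRUE))  # 4 3 1

# Because each site is a single letter, this also equals the number of sites
n_sites <- lengths(strsplit(site, "-", fixed = TRUE))       # 4 3 1
```

The last line only matches the dash-free character count because every site name here is one character long; with longer site names the two would differ.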
With data.table, we can do:
library(data.table)
dt <- as.data.table(test(91))[, Count := as.character(sum(nchar(gsub("-", "", Site)))) ,
by = Cookie][]
dt[, Full_path := gsub("-", ", ", toString(Site)), by = Cookie]
head(dt)
# Cookie Site Count Full_path
#1: 258 A 1 A
#2: 26 D-D-E-E-C 10 D, D, E, E, C, E, E, A, C, A
#3: 43 D-D-A 3 D, D, A
#4: 171 C-C-E-A-B-D-E 7 C, C, E, A, B, D, E
#5: 57 A-D-D-C 4 A, D, D, C
#6: 156 A-D 2 A, D
If you want the full path joined with dashes instead:
dt[, Full_path := paste(Site, collapse="-"), by = Cookie]
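The two ways of building Full_path differ only in the separator. The base R sketch below applies both to the strings of a single Cookie group, outside of data.table. The second Site string is an assumption, inferred from the Full_path shown for Cookie 26 in the output above.

```r
# Site strings for one Cookie group; the second value is assumed
site <- c("D-D-E-E-C", "E-E-A-C-A")

# toString() joins with ", "; gsub() then turns each "-" into ", "
comma_path <- gsub("-", ", ", toString(site))
# "D, D, E, E, C, E, E, A, C, A"

# paste(collapse = "-") keeps dashes throughout
dash_path <- paste(site, collapse = "-")
# "D-D-E-E-C-E-E-A-C-A"
```

Inside data.table, by = Cookie applies the same expression once per group and repeats the result on every row of that group.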