
How to count the length of concatenated strings from one column based on an ID in another?

I have the following function:

set.seed(1984)
test <- function(paths){
  x <- matrix(rep(NA, paths*3), ncol = 3, 
              dimnames = list(c(), c("Cookie", "Site", "Count")))
  for(i in 1:paths){
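    # Cookie ID: absolute value of a normal draw, rounded (IDs can repeat across rows)
    x[i, 1] <- round(sqrt(rnorm(1,50,100)^2))
    # Site path: 1 to 10 letters drawn from A-E, joined with dashes
    n <- function(){sample(1:10, size = 1)}
    draws <- function(){sample(LETTERS[1:5], n(), replace = TRUE)}
    x[i, 2] <- paste(draws(), collapse = '-')
  }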
  return(x)
}

which produces output like:

     Cookie Site                  Count
[1,] "91"   "B-D-E-A"             NA   
[2,] "37"   "E-A-D"               NA   
[3,] "108"  "B"                   NA   
[4,] "93"   "D-A-D"               NA   
[5,] "157"  "E-C"                 NA   
[6,] "52"   "B-C-D-A-C-C-B-A-B-E" NA

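For each unique Cookie ID in the Cookie column, I want to:

  1. Concatenate all of its Site strings together (Cookie contains duplicate values)
  2. Get the length of the concatenation
  3. Put that TOTAL length in as the Count value for that Cookie ID (so it will likely be repeated)

Any ideas?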

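This groups your matrix by Cookie and returns, for each Cookie, the total number of characters in its Site strings (this total equals the length of the concatenated string, dashes included).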

test.df <- test(91)
library(dplyr)
test.df %>% 
  as.data.frame(., stringsAsFactors = FALSE) %>% 
  group_by(Cookie) %>% 
  mutate(Count = sum(nchar(Site)))

If you want Count to exclude the - characters, replace Site with gsub("-", "", Site, fixed = TRUE).
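To see why summing nchar() per row gives the same number as measuring the concatenated string, here is a minimal base R sketch. The two Site strings are hypothetical values standing in for the rows of a single Cookie ID; they are not taken from the output of test().

```r
# Hypothetical Site strings that share one Cookie ID (illustrative values only)
site <- c("D-D-E-E-C", "E-E-A-C-A")

# Summing the per-row lengths ...
total_with_dashes <- sum(nchar(site))

# ... matches the length of the strings pasted together with no separator
concat_length <- nchar(paste(site, collapse = ""))

# Removing the dashes first counts only the site letters
total_letters <- sum(nchar(gsub("-", "", site, fixed = TRUE)))

print(c(with_dashes = total_with_dashes,
        concatenated = concat_length,
        letters_only = total_letters))
```

Note that if the strings were joined with a separator such as "-", the concatenated length would grow by one character per join, so the two counts would no longer agree.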

With data.table, we can do:

library(data.table)
dt <- as.data.table(test(91))[, Count := as.character(sum(nchar(gsub("-", "", Site)))) , 
                    by = Cookie][]

dt[, Full_path := gsub("-", ", ", toString(Site)), by = Cookie]
head(dt)
#   Cookie          Site Count                    Full_path
#1:    258             A     1                            A
#2:     26     D-D-E-E-C    10 D, D, E, E, C, E, E, A, C, A
#3:     43         D-D-A     3                      D, D, A
#4:    171 C-C-E-A-B-D-E     7          C, C, E, A, B, D, E
#5:     57       A-D-D-C     4                   A, D, D, C
#6:    156           A-D     2                         A, D

If you want dashes throughout the full path:

dt[, Full_path := paste(Site, collapse="-"), by = Cookie]
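If neither package is available, the same grouping can probably be done in base R with ave(). The sketch below is an alternative to the answers above rather than a restatement of them, and it assumes a small hand-made matrix (hypothetical values) with the same column names as the output of test().

```r
# Small hand-made matrix shaped like the output of test() (values are illustrative)
x <- matrix(c("26", "43", "26",
              "D-D-E-E-C", "D-D-A", "E-E-A-C-A",
              NA, NA, NA),
            ncol = 3,
            dimnames = list(NULL, c("Cookie", "Site", "Count")))

df <- as.data.frame(x, stringsAsFactors = FALSE)

# Number of site letters in each row, with the dashes removed
letters_per_row <- nchar(gsub("-", "", df$Site, fixed = TRUE))

# ave() applies FUN within each Cookie group and returns one value per row,
# so rows that share a Cookie ID get the same total
df$Count <- ave(letters_per_row, df$Cookie, FUN = sum)

# Full dashed path per Cookie, repeated on every row of that group
df$Full_path <- ave(df$Site, df$Cookie,
                    FUN = function(s) paste(s, collapse = "-"))

print(df)
```

Here Count ends up numeric rather than character, unlike the data.table version, which wraps the sum in as.character() to match the character Count column produced by the matrix conversion.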
