
How to count the length of concatenated strings from one column based on an ID in another?

I have the following function:

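set.seed(1984)
test <- function(paths) {
  # Character matrix: one row per path; Count is left NA to be filled in later
  x <- matrix(NA_character_, nrow = paths, ncol = 3,
              dimnames = list(NULL, c("Cookie", "Site", "Count")))
  # Helpers: a random path length (1-10) and a random path over sites A-E
  n <- function() sample(1:10, size = 1)
  draws <- function() sample(LETTERS[1:5], n(), replace = TRUE)
  for (i in seq_len(paths)) {
    # Random non-negative integer ID; duplicates across rows are possible
    x[i, 1] <- round(abs(rnorm(1, 50, 100)))
    x[i, 2] <- paste(draws(), collapse = "-")
  }
  x
}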

which produces output like:

     Cookie Site                  Count
[1,] "91"   "B-D-E-A"             NA   
[2,] "37"   "E-A-D"               NA   
[3,] "108"  "B"                   NA   
[4,] "93"   "D-A-D"               NA   
[5,] "157"  "E-C"                 NA   
[6,] "52"   "B-C-D-A-C-C-B-A-B-E" NA

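For each unique Cookie ID in the Cookie column, I would like to:

  1. Concatenate all of its Site strings together (Cookie contains duplicate values)
  2. Get the length of the concatenation
  3. Put that total length in as the Count value for that Cookie ID (so the value may be repeated across rows)

Any ideas?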

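This groups your matrix by Cookie and returns the total number of characters in the Site column for each group (that number equals the length of the concatenated string when the pieces are joined without a separator).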

test.df <- test(91)
library(dplyr)
test.df %>% 
  as.data.frame(., stringsAsFactors = FALSE) %>% 
  group_by(Cookie) %>% 
  mutate(Count = sum(nchar(Site)))

If you want Count to exclude the - characters, replace Site with gsub("-", "", Site, fixed = TRUE)
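To make the difference between the two counts concrete, here is a minimal sketch on a tiny hand-made data frame. The `toy` data and the `Count_nodash` column are hypothetical, made up for illustration, and the grouping uses base R's `ave()` instead of dplyr so that it runs without extra packages.

```r
# Hypothetical toy data: cookie "91" appears twice.
toy <- data.frame(
  Cookie = c("91", "37", "91"),
  Site   = c("B-D-E-A", "E-A-D", "B"),
  stringsAsFactors = FALSE
)

# Characters per row, with and without the "-" separators.
with_dashes    <- nchar(toy$Site)
without_dashes <- nchar(gsub("-", "", toy$Site, fixed = TRUE))

# ave() applies sum() within each Cookie group and returns one value per row,
# so rows that share a Cookie get the same total.
toy$Count        <- ave(with_dashes, toy$Cookie, FUN = sum)
toy$Count_nodash <- ave(without_dashes, toy$Cookie, FUN = sum)

print(toy)
```

Both rows for cookie 91 get Count 8 (7 + 1) and Count_nodash 5 (4 + 1), while cookie 37 gets 5 and 3.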

With data.table, we can do:

library(data.table)
dt <- as.data.table(test(91))[, Count := as.character(sum(nchar(gsub("-", "", Site)))) , 
                    by = Cookie][]

dt[, Full_path := gsub("-", ", ", toString(Site)), by = Cookie]
head(dt)
#   Cookie          Site Count                    Full_path
#1:    258             A     1                            A
#2:     26     D-D-E-E-C    10 D, D, E, E, C, E, E, A, C, A
#3:     43         D-D-A     3                      D, D, A
#4:    171 C-C-E-A-B-D-E     7          C, C, E, A, B, D, E
#5:     57       A-D-D-C     4                   A, D, D, C
#6:    156           A-D     2                         A, D

If you want the full path joined with dashes throughout:

dt[, Full_path := paste(Site, collapse="-"), by = Cookie]
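Note that joining a cookie's rows with a dash adds one extra character per join, so the length of the dash-joined path is not the same as the sum of the per-row lengths. A small sketch of this, assuming a hypothetical `toy` data frame and a made-up `Path_len` column, and using base R's `ave()` in place of data.table:

```r
# Hypothetical toy data: cookie "91" appears twice.
toy <- data.frame(
  Cookie = c("91", "37", "91"),
  Site   = c("B-D-E-A", "E-A-D", "B"),
  stringsAsFactors = FALSE
)

# Join all Site strings of a cookie with "-"; the single joined string
# is recycled onto every row of that cookie.
toy$Full_path <- ave(toy$Site, toy$Cookie,
                     FUN = function(s) paste(s, collapse = "-"))

# Length of the joined path, including every dash.
toy$Path_len <- nchar(toy$Full_path)

print(toy)
```

For cookie 91 the joined path is "B-D-E-A-B" with length 9, one more than 7 + 1, because of the dash inserted between the two rows.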
