Remove repeated letters/alphanumerics

Question

Input: kdff455556tfkkkw

Output 1: kdf[2]45[4]6tfk[3]w

Hint: Replace each run of a repeated alphanumeric character with the character followed by [No_Of_Repetition].

Answer 1

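s <- c('kdff455556tfkkkw', 'abc', 'abbccc')
# Split each string into characters, run-length encode them, and append [n] to runs longer than 1
sapply(strsplit(s, ''), function(x)
  paste(with(rle(x), ifelse(lengths == 1, values, paste0(values, '[', lengths, ']'))), collapse = ''))
## [1] "kdf[2]45[4]6tfk[3]w" "abc"                 "ab[2]c[3]"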
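To see what `rle` contributes here, it can help to walk through a single string. The following is only a minimal base R sketch of the same idea; the names `chars`, `runs`, and `encoded` are illustrative and not part of the answer above.

```r
# Split one string into single characters
chars <- strsplit("abbccc", "")[[1]]

# Run-length encode: consecutive equal characters collapse into one run
runs <- rle(chars)
runs$values   # "a" "b" "c"
runs$lengths  # 1 2 3

# Keep single characters as they are; append [n] to longer runs
encoded <- paste(ifelse(runs$lengths == 1,
                        runs$values,
                        paste0(runs$values, "[", runs$lengths, "]")),
                 collapse = "")
encoded       # "ab[2]c[3]"
```

Because `ifelse` is vectorised over the runs, no explicit loop over characters is needed.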

Answer 2

Here's a roundabout regex approach:

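library(stringr)
# A run: one alphanumeric character followed by one or more copies of itself
p <- "(([a-z0-9])\\2+)"
# Pieces of each string that lie between the runs
x <- str_split(s, p)
# Replacement for each run: its first character plus the run length in brackets
# (lapply rather than sapply, so r stays a list even if every string has the same number of runs)
r <- lapply(str_extract_all(s, p),
       function(x) if (length(x)) paste0(substr(x, 1, 1), "[", nchar(x), "]") else "")

# Interleave pieces and replacements, then glue each string back together
mapply(function(x, r) paste0(
  c(x, r)[order(c(seq_along(x), seq_along(r)))],
  collapse = ""), x, r)
# output for @bgoldst's input:
# [1] "kdf[2]45[4]6tfk[3]w" "abc"                 "ab[2]c[3]"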

The last step is an interleaving trick borrowed from @Arun.

I wish my initial guess had worked:

sapply(s,function(x)gsub(p,paste0("\\2[",nchar("\\1"),"]"),x))

But nchar("\\1") is evaluated before gsub ever sees the replacement, so it measures the literal string "\\1" (a backslash and a 1) and always returns 2:

#      kdff455556tfkkkw                   abc                abbccc 
# "kdf[2]45[2]6tfk[2]w"                 "abc"           "ab[2]c[2]"
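The constant 2 can be reproduced by building the replacement string on its own, since R evaluates function arguments before `gsub` does any matching. A small sketch, where the name `replacement` is only illustrative:

```r
p <- "(([a-z0-9])\\2+)"

# nchar() sees the literal string: a backslash followed by "1"
nchar("\\1")                                      # 2

# So the replacement is already fixed before gsub() looks at any input
replacement <- paste0("\\2[", nchar("\\1"), "]")
cat(replacement, "\n")                            # \2[2]

result <- gsub(p, replacement, "abbccc")
result                                            # "ab[2]c[2]"
```

Every run is therefore replaced by its character followed by `[2]`, whatever its actual length.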


Answer 1
3 2015-05-29 19:33:22

Answer 2
1 2015-05-29 20:10:59
