How to identify the most consecutively repeated character in a string in R

I have a problem that I have no idea how to approach. Imagine that you have the following string: "aabccccdeddaaa". The program needs to return the most repeated consecutive character and how many times it is repeated. One might think the answer is "a", because it appears 5 times in the string, but that's not what I'm looking for. The correct answer for my problem is "c": even though it appears only 4 times, it appears those 4 times consecutively, while "a" repeats at most 3 times consecutively.
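To see why the total count is the wrong measure here, a minimal sketch (base R only) tallies every character with table(), ignoring whether the occurrences are adjacent. It reports "a" as the most frequent character, which is exactly the answer the question rules out:

```r
x <- "aabccccdeddaaa"

# Split the string into a vector of single characters
chars <- strsplit(x, "")[[1]]

# Count every occurrence of each character, whether or not they are adjacent
counts <- table(chars)
print(counts)

# The most frequent character overall is "a" (5 occurrences), even though its
# longest consecutive run is only 3 characters long
names(counts)[which.max(counts)]
```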

I'm not looking for a full solution, though, only for some guidance on how to start.

I went ahead and did it anyway. You need to combine a few functions. The main one is rle(), which computes a run-length encoding: it counts runs of consecutive identical values. The rest is just a matter of combining some basic functions (strsplit() to break the string into single characters, then which.max() and max()) to extract the parts of the rle() result you need.
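To make "it counts consecutive values" concrete, here is a small sketch of what rle() returns for the example string: one entry per run, with the run's character in values and the run's length in lengths.

```r
x <- "aabccccdeddaaa"

# rle() works on a vector, so split the string into single characters first
runs <- rle(strsplit(x, "")[[1]])

runs$values   # [1] "a" "b" "c" "d" "e" "d" "a"
runs$lengths  # [1] 2 1 4 1 1 2 3
```

The longest run is the position of the largest value in lengths, and the character at the same position in values is the answer.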

x <- "aabccccdeddaaa"
# Compute the run-length encoding once and reuse it
runs <- rle(unlist(strsplit(x, "")))

# Which letter is repeated most
runs$values[which.max(runs$lengths)]
[1] "c"

# How many times it's repeated
max(runs$lengths)
[1] 4
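One caveat: which.max() returns only the first position of the maximum, so if two characters tie for the longest run, only the earlier one is reported. A minimal sketch that wraps the same rle() idea and returns every tied character; the function name most_repeated_run is hypothetical, not part of base R:

```r
# Hypothetical helper (not in base R): finds the longest run of identical
# consecutive characters and returns all characters that achieve it.
most_repeated_run <- function(s) {
  stopifnot(is.character(s), length(s) == 1L, nchar(s) > 0L)
  runs <- rle(strsplit(s, "")[[1]])
  best <- max(runs$lengths)
  list(
    # unique() so a character with two equally long runs is listed once
    chars  = unique(runs$values[runs$lengths == best]),
    length = best
  )
}

most_repeated_run("aabccccdeddaaa")  # chars "c", length 4
most_repeated_run("aabb")            # chars "a" "b", length 2
```

Returning a list keeps the characters and the run length together; take chars[1] if a single answer is enough.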

One approach is to split the input string at every point where the previous and the following character differ, which breaks it into runs of identical characters. Then sort the resulting vector of parts by length in descending order (ties broken alphabetically) to find the longest run:

x <- "aabccccdeddaaa"
parts <- strsplit(x, "(?<=(.))(?!\\1)", perl=TRUE)[[1]]
parts[order(-nchar(parts), parts)][1]

[1] "cccc"

For reference, here is the vector of parts:

parts
[1] "aa"   "b"    "cccc" "d"    "e"    "dd"   "aaa"
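This approach returns the run itself ("cccc") rather than the letter and the count. A short sketch, reusing the same split and sort, that recovers both from the longest part:

```r
x <- "aabccccdeddaaa"

# Split at every position where the next character differs from the previous
# one: the lookbehind captures the previous character, and the negative
# lookahead requires the next character to be different from it
parts <- strsplit(x, "(?<=(.))(?!\\1)", perl = TRUE)[[1]]

# Sort by length (descending), breaking ties alphabetically, and keep the first
longest <- parts[order(-nchar(parts), parts)][1]

# The first character of the run is the repeated letter; its length is the count
substr(longest, 1, 1)
nchar(longest)
```

For this string it gives the same letter and count as the rle() approach. On ties the two can differ: order() breaks ties alphabetically, while which.max() picks the run that appears first in the string (for "bbaa", this approach gives "aa" and the rle() version gives "b").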

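Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original source. For any questions, contact: yoyou2525@163.com.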

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM