
Find a pattern with duplicates in R

In R, I have a list of lists with data such as:

1 "a" "b" "c"
2 "a" "a" "b" "d"
3 "a" "a"

I need to identify the commonly occurring patterns in the lists. For example, here, "a" "a" and "a" "b" are common. I tried using eclat(), but it doesn't allow repeated values within a list. I then tried removing the duplicate values, but that loses information (for instance, that "a" "a" is a frequent pattern).

I also tried renaming the duplicate occurrences, but then "a" "b" and "a" "a" "b" won't return "a" "b" as a pattern, since the second list would be renamed to something like "a" "a2" "b".

Is there any better way to do this?

Update:

The strings in each list can be single characters or longer strings. For example:

1 "a+12" "bfd" "c"
2 "a+12" "a+12" "bfd" "d"
3 "a+12" "a+12" "a"

Here, "a+12" "bfd" and "a+12" "a+12" should be recognized as patterns.

At least with the sample data, something like the following looks helpful:

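# Data in a list
myls = list(x1 = c("a+12", "bfd", "c"), 
            x2 = c("a+12", "a+12", "bfd", "d"),
            x3 = c("a+12", "a+12", "a"))

# For every pair of positions i[1] < i[2] in each vector, paste the contiguous
# run x[i[1]:i[2]] into one string, then count each run across all vectors
pats = table(unlist(lapply(myls, 
           function(x) combn(seq_along(x), 2, 
                             function(i) paste(x[i[1]:i[2]], collapse = ";")))))
# Keep the most frequent runs and split them back into their elements
strsplit(names(pats[pats == max(pats)]), ";")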
#[[1]]
#[1] "a+12" "a+12"
#
#[[2]]
#[1] "a+12" "bfd"
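
If only runs of a fixed length are of interest, the same idea can be written as a small helper. This is a rough sketch rather than a tested solution: count_runs() and its arguments n and sep are names made up for illustration, and it assumes that a run should be counted at most once per vector (so the count is the number of vectors containing it, similar to a support count) and that sep never occurs inside the strings. Unlike the combn() call above, it also tolerates vectors with fewer than two elements, which combn() rejects.

```r
# Hypothetical helper: count contiguous runs of length n across a list of
# character vectors. Each vector contributes a given run at most once.
count_runs <- function(lst, n = 2, sep = ";") {
  runs <- lapply(lst, function(x) {
    if (length(x) < n) return(character(0))  # too short to contain a run of length n
    starts <- seq_len(length(x) - n + 1)     # every possible start position
    unique(vapply(starts,
                  function(s) paste(x[s:(s + n - 1)], collapse = sep),
                  character(1)))
  })
  table(unlist(runs, use.names = FALSE))
}

myls <- list(x1 = c("a+12", "bfd", "c"),
             x2 = c("a+12", "a+12", "bfd", "d"),
             x3 = c("a+12", "a+12", "a"))

counts <- count_runs(myls, n = 2)
# Assumed threshold: keep runs that appear in at least two of the vectors
frequent <- names(counts[counts >= 2])
strsplit(frequent, ";", fixed = TRUE)
```

On the sample data this keeps the same two pairs as above, "a+12" "a+12" and "a+12" "bfd". Using a threshold instead of max(pats) is a design choice: it returns every run that clears the bar rather than only the ones tied for first place.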
