Python: find similar combinations of elements in a list

Question

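So I have a list that looks a bit like this:

my_list = [0,1,1,1,0,0,1,0,1,0,1,1,0,0,0,1,0,1,1,0,1 ... 0,1,0]

It basically contains thousands of 0s and 1s. I'm looking for a way to find similar (repeated) combinations of elements in it (specifically, runs of 10 consecutive elements). So, for example, if the combination

... 0,1,1,1,0,0,1,1,0,1 ...

occurs more than once, I want to know where it appears in my list (its indices) and how many times it repeats.

I need to check every possible combination here, i.e. 1024 possibilities...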

Answer 1

Here is a solution using regular expressions:

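import random
import re
from itertools import product

testlist = [str(random.randint(0, 1)) for _ in range(1000)]

testlist_str = "".join(testlist)

for pattern in ["".join(seq) for seq in product("01", repeat=10)]:
    # the zero-width lookahead (?=...) also counts overlapping occurrences,
    # which a plain re.findall(pattern, testlist_str) would miss
    count = len(re.findall(f"(?={pattern})", testlist_str))
    print(f'pattern {pattern} has {count} matches')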

Output:

pattern 0000000000 has 0 matches
pattern 0000000001 has 0 matches
pattern 0000000010 has 1 matches
pattern 0000000011 has 2 matches
pattern 0000000100 has 2 matches
pattern 0000000101 has 2 matches
....
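A plain re.findall(pattern, text) returns only non-overlapping matches and no positions, so a pattern such as 0000000000 inside a longer run of zeros is undercounted, and the indices the question asks for are lost. A minimal sketch, assuming the bits are already joined into a string as above: wrapping each pattern in a zero-width lookahead and iterating with re.finditer yields every start index, overlapping ones included. The helper name pattern_positions and its width parameter are illustrative, not part of the original answer.

```python
import re
from itertools import product


def pattern_positions(bits_str, width=10):
    """Map every width-length 0/1 pattern to the indices where it starts.

    The zero-width lookahead (?=...) lets overlapping occurrences match,
    which a plain re.findall(pattern, bits_str) would skip.
    """
    result = {}
    for seq in product("01", repeat=width):
        pattern = "".join(seq)
        result[pattern] = [m.start() for m in re.finditer(f"(?={pattern})", bits_str)]
    return result


positions = pattern_positions("0" * 13, width=10)
print(positions["0000000000"])  # [0, 1, 2, 3]
```

This still scans the string once per pattern (1024 passes), which is acceptable for a few thousand elements; a single sliding-window pass over the list avoids the repeated scans.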

Answer 2

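It looks like a homework problem, so I don't want to give the solution right away, only hints.

Don't take the list literally. It contains only 0s and 1s, so you can look at it as binary numbers.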

Some hints:

The 1024 "patterns" become the numbers from 0 to 1023.
Checking a pattern means building one number from those 10 digits.
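For instance, reading ten 0/1 elements as a single base-2 number can be sketched as follows. bits_to_number is a hypothetical helper name, and the first element is assumed to be the most significant bit.

```python
def bits_to_number(bits):
    """Read a sequence of 0/1 values as a base-2 number, first element most significant."""
    value = 0
    for b in bits:
        value = value * 2 + b  # shift everything left one place, then add the next bit
    return value


print(bits_to_number([0, 1, 1, 1, 0, 0, 1, 1, 0, 1]))  # 461
```

Every 10-element pattern then maps to exactly one integer in the range 0 to 1023, which can serve directly as a list index or dict key.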

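Think about what you would do then.

More hints, more technical:

If you have the number for one pattern, e.g. elements 0 to 9, you can get the pattern for elements 1 to 10 by keeping the value of the last 9 bits (indices 1 to 9), i.e. %512, shifting them left (*2), and adding the digit at index 10.
Make a dictionary or a list of lists where the key/index is the pattern number (0 to 1023) and each list holds the indices at which that pattern starts.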
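The rolling update from the first technical hint can be checked on a short list: the value of elements 1 to 10 is derived from the value of elements 0 to 9 without rereading the window. next_window is an illustrative name, and base=2 with pattern_size=10 are assumed defaults matching the question.

```python
def next_window(value, new_bit, pattern_size=10, base=2):
    """Slide the window one step: drop its first digit, shift left, add new_bit."""
    # value % base**(pattern_size - 1) keeps the last pattern_size - 1 digits (%512 for bits)
    return base * (value % base ** (pattern_size - 1)) + new_bit


bits = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
first = int("".join(map(str, bits[0:10])), 2)   # elements 0..9 -> 458
second = next_window(first, bits[10])           # elements 1..10 -> 917
direct = int("".join(map(str, bits[1:11])), 2)  # the same window computed from scratch
print(first, second, direct)  # 458 917 917
```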

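I'll edit this answer later with an example solution, but I need to take a break first.

Edit:

Customizable base and pattern length, with defaults for your case: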

def find_patterns(my_list, base=2, pattern_size=10):
    modulo_value = base ** (pattern_size - 1)
    # results[p] holds the start indices of every window whose value is p
    results = [[] for _ in range(base ** pattern_size)]
    current_value = 0
    for index, elem in enumerate(my_list):
        if index < pattern_size:
            # still building the first window: shift left and add the digit
            current_value = base * current_value + elem
        else:
            # drop the oldest digit (% modulo_value), shift left, add the new one
            current_value = base * (current_value % modulo_value) + elem
        if index >= pattern_size - 1:
            # index of the first element in the window that ends at `index`
            results[current_value].append(index + 1 - pattern_size)
    return results

Answer 3

If I understand correctly, you can do this:

my_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]

w = 10
occurrences = {}
for i in range(len(my_list) - w + 1):
    key = tuple(my_list[i:i+w])
    occurrences.setdefault(key, []).append(i)

for pattern, indices in occurrences.items():
    print(pattern, indices)

Output

(0, 1, 1, 1, 0, 0, 1, 0, 1, 0) [0]
(1, 1, 1, 0, 0, 1, 0, 1, 0, 1) [1]
(1, 1, 0, 0, 1, 0, 1, 0, 1, 1) [2]
(1, 0, 0, 1, 0, 1, 0, 1, 1, 0) [3]
(0, 0, 1, 0, 1, 0, 1, 1, 0, 0) [4]
(0, 1, 0, 1, 0, 1, 1, 0, 0, 0) [5]
(1, 0, 1, 0, 1, 1, 0, 0, 0, 1) [6]
(0, 1, 0, 1, 1, 0, 0, 0, 1, 0) [7]
(1, 0, 1, 1, 0, 0, 0, 1, 0, 1) [8]
(0, 1, 1, 0, 0, 0, 1, 0, 1, 1) [9]
(1, 1, 0, 0, 0, 1, 0, 1, 1, 0) [10]
(1, 0, 0, 0, 1, 0, 1, 1, 0, 1) [11]
(0, 0, 0, 1, 0, 1, 1, 0, 1, 0) [12]
(0, 0, 1, 0, 1, 1, 0, 1, 0, 1) [13]
(0, 1, 0, 1, 1, 0, 1, 0, 1, 0) [14]
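The loop above records every window, including those seen only once (as in the output, where no 10-element window repeats). Because the question is about combinations that occur more than once, the dict can be filtered afterwards. A minimal sketch that wraps the same setdefault loop in a hypothetical helper, repeated_combinations, run on a short made-up list where some windows do repeat:

```python
def repeated_combinations(seq, w=10):
    """Return {window: [start indices]} for windows of length w seen more than once."""
    occurrences = {}
    for i in range(len(seq) - w + 1):
        # tuples are hashable, so a window can be used directly as a dict key
        occurrences.setdefault(tuple(seq[i:i + w]), []).append(i)
    # keep only the combinations that repeat
    return {p: idx for p, idx in occurrences.items() if len(idx) > 1}


sample = [1, 0, 1, 0, 1, 0]  # made-up input so that some windows repeat
for pattern, indices in repeated_combinations(sample, w=3).items():
    print(pattern, "appears", len(indices), "times at", indices)
```

The count the question asks for is simply len(indices), so no separate counter is needed.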

Answer 4

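Treat the elements as bits that can be converted to integers. The solution below converts each window of the input list to an integer, then finds how many times each integer occurs and at which indices it can be found.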

import collections

x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
as_int = []

# given the input above there is no pattern longer than 6 that occurs more than once...
pattern_length = 6

# convert each window of pattern_length elements to an integer
# can this be done in a nicer way, like skipping the string-conversion?
for s in range(len(x) - pattern_length + 1):
    bitstring = ''.join(str(b) for b in x[s:s + pattern_length])
    as_int.append(int(bitstring, 2))

# create a dict with integer as key and occurrence count as value
count_dict = collections.Counter(as_int)

# empty dict to store the indexes for each integer
index_dict = {}

# find the indexes of each integer that occurs more than once
for key, count in count_dict.items():
    if count > 1:
        index_dict[key] = [i for i, v in enumerate(as_int) if v == key]

# print as binary (zero-padded to pattern_length) together with its indexes
for key, value in index_dict.items():
    print(f'{key:0{pattern_length}b}', 'appears', count_dict[key], 'times, on index:', value)

Output:

101011 appears 2 times, on index: [6, 18]
010110 appears 2 times, on index: [7, 14]
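One way to skip the string conversion asked about in the code comment, sketched here as an alternative rather than the answerer's code, is to build each window's integer with bit shifts and a mask, and to collect start indexes in a defaultdict during the same pass instead of rescanning as_int with enumerate for every repeated key. repeated_bit_windows is a hypothetical name.

```python
from collections import defaultdict


def repeated_bit_windows(bits, pattern_length):
    """Map each window value seen more than once to the indexes where it starts."""
    mask = (1 << pattern_length) - 1  # keeps only the lowest pattern_length bits
    value = 0
    positions = defaultdict(list)
    for i, b in enumerate(bits):
        # shift the new bit in; the mask drops the bit that just left the window
        value = ((value << 1) | b) & mask
        if i >= pattern_length - 1:  # a full window ends at index i
            positions[value].append(i - pattern_length + 1)
    return {v: idx for v, idx in positions.items() if len(idx) > 1}


x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
pattern_length = 6
for value, indexes in repeated_bit_windows(x, pattern_length).items():
    print(f'{value:0{pattern_length}b}', 'appears', len(indexes), 'times, on index:', indexes)
```

On this input it prints the same two lines as the output above, while touching each element only once.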


4 solutions

Solution 1
2 votes, accepted, 2019-11-18 14:50:47

Solution 2
2 votes, 2019-11-18 14:51:24

Solution 3
1 vote, 2019-11-18 14:51:37

Solution 4
1 vote, 2019-11-18 15:31:49
