Finding duplicates in a list of tuples

Question

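You are given information about the users of a website. The information includes a username, a phone number, and/or an email address. Write a program that takes a list of tuples, where each tuple represents the information for one user, and returns a list of lists, where each sublist contains the indices of the tuples that hold information about the same person. For example: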

Input:
[("MLGuy42", "andrew@example.com", "123-4567"),
("CS229DungeonMaster", "123-4567", "ml@example.net"),
("Doomguy", "john@example.org", "carmack@example.com"),
("andrew26", "andrew@example.com", "mlguy@example.com")]

Output:
[[0, 1, 3], [2]]

because "MLGuy42", "CS229DungeonMaster", and "andrew26" are all the same person.

Each sublist in the output should be sorted, and the outer list should be sorted by the first element of each sublist.

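Below is the code snippet I wrote for this problem. It seems to work, but I'd like to know whether there is a better or more optimized solution. Any help would be appreciated. Thanks!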

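def find_duplicates(user_info):
    results = list()  # groups of tuple indices, one sublist per person
    seen = dict()     # identifier -> index of its group in results
    for i, user in enumerate(user_info):
        first_seen = True
        key_info = None
        # Look for any identifier that already belongs to a group.
        for info in user:
            if info in seen:
                first_seen = False
                key_info = info
                break
        if first_seen:
            # No identifier seen before: start a new group.
            results.append([i])
            pos = len(results) - 1
        else:
            # Join the group of the first matching identifier.
            index = seen[key_info]
            results[index].append(i)
            pos = index
        # Record every identifier of this user under the chosen group.
        for info in user:
            seen[info] = pos
    return results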

Answer 1

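I think I have arrived at an optimized, working solution using a graph. Basically, I built a graph in which each node holds a user's information and its index. I then traverse the graph with DFS to find the duplicates.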
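The answer does not include its code, so the following is only a minimal sketch of what a graph-plus-DFS approach could look like, not the answerer's actual implementation. It assumes two tuples are connected whenever they share at least one identifier. The names `find_duplicates_dfs` and `owners` are hypothetical.

```python
from collections import defaultdict


def find_duplicates_dfs(user_info):
    """Group tuple indices that share an identifier (hypothetical sketch)."""
    # Map each identifier (username, phone, email) to the tuples containing it.
    owners = defaultdict(list)
    for i, user in enumerate(user_info):
        for info in user:
            owners[info].append(i)

    visited = set()
    groups = []
    for start in range(len(user_info)):
        if start in visited:
            continue
        # Iterative DFS over tuples connected through shared identifiers.
        visited.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for info in user_info[node]:
                for neighbour in owners[info]:
                    if neighbour not in visited:
                        visited.add(neighbour)
                        stack.append(neighbour)
        groups.append(sorted(component))
    # Start nodes are tried in increasing order, so each group is created at
    # its smallest index and the outer list is already ordered.
    return groups


users = [
    ("MLGuy42", "andrew@example.com", "123-4567"),
    ("CS229DungeonMaster", "123-4567", "ml@example.net"),
    ("Doomguy", "john@example.org", "carmack@example.com"),
    ("andrew26", "andrew@example.com", "mlguy@example.com"),
]
print(find_duplicates_dfs(users))  # [[0, 1, 3], [2]]
```

Because the traversal follows every shared identifier, a user who links two previously separate groups pulls them into one component, for example `[("a", "x"), ("b", "y"), ("a", "y")]` yields `[[0, 1, 2]]`.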

Answer 2

I think we can simplify this using sets:

from random import shuffle

def find_duplicates(user_info):

    reduced = unreduced = {frozenset(info): [i] for i, info in enumerate(user_info)}

    while reduced is unreduced or len(unreduced) > len(reduced):

        unreduced = dict(reduced)  # make a copy

        for identifiers_1, positions_1 in unreduced.items():

            for identifiers_2, positions_2 in unreduced.items():

                if identifiers_1 is identifiers_2:
                    continue

                if identifiers_1 & identifiers_2:
                    del reduced[identifiers_1], reduced[identifiers_2]
                    reduced[identifiers_1 | identifiers_2] = positions_1 + positions_2
                    break
            else:  # no break
                continue

            break

    return sorted(sorted(value) for value in reduced.values())

my_input = [
    ("CS229DungeonMaster", "123-4567", "ml@example.net"),
    ("Doomguy", "john@example.org", "carmack@example.com"),
    ("andrew26", "andrew@example.com", "mlguy@example.com"),
    ("MLGuy42", "andrew@example.com", "123-4567"),
]

shuffle(my_input)  # shuffle to prove order independence

print(my_input)
print(find_duplicates(my_input))

OUTPUT

> python3 test.py
[('CS229DungeonMaster', '123-4567', 'ml@example.net'), ('MLGuy42', 'andrew@example.com', '123-4567'), ('andrew26', 'andrew@example.com', 'mlguy@example.com'), ('Doomguy', 'john@example.org', 'carmack@example.com')]
[[0, 1, 2], [3]]
>

Solution 1
1 Accepted 2017-11-20 12:09:37

Solution 2
0 2017-11-20 06:03:09
