Extracting unique elements from a 2D Python list and putting them into a new 2D list

Question

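I have a 2D list with three columns and a very large number of rows, where each column holds one kind of value. The first column is the UserID, the second is a timestamp, and the third is a URL. The list looks like this: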

[[304070, '2015:01:01', 'http:something1'],
[304070, '2015:01:02', 'http:something2'],
[304070, '2015:01:03', 'http:something2'],
[304070, '2015:01:03', 'http:something2'],
[304071, '2015:01:04', 'http:something2'],
[304071, '2015:01:05', 'http:something3'],
[304071, '2015:01:06', 'http:something3']]

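As you can see, some URLs are repeated, regardless of the UserID and timestamp.

I need to extract the rows that contain unique URLs and put them into a new 2D list.

For example, the second, third, fourth, and fifth rows all have the same URL, regardless of UserID and timestamp. I only need the second row (the first occurrence) in the new 2D list. The first row has a unique URL, so it goes into the new list as well. The last two rows (the sixth and seventh) share a URL, and I only need the sixth.

So my new list should look like this: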

[[304070, '2015:01:01', 'http:something1'],
[304070, '2015:01:02', 'http:something2'],
[304071, '2015:01:05', 'http:something3']]

I considered using something like this:

for i in range(len(oldList)):
    if oldList[i][2] not in newList:
        newList.append(oldList[i])

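But this obviously does not work: oldList[i][2] is a single element, while `not in newList` compares it against the elements of the 2D list, which are whole rows. A URL string never equals a row, so this code just creates an exact copy of oldList.

Alternatively, I could remove the rows whose URLs are duplicates, because running a for loop with append over a 2D list of a million rows does take a while.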

Answer 1

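A good way to solve this is to use a set. Walk through the list of lists once. If a row's URL is not yet in the set, add the URL to the set and append the full row to the new list. If the URL is already in the set, discard the current row and move on to the next one.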

old_list = [[304070, "2015:01:01", 'http:something1'],
            [304070, "2015:01:02", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304071, "2015:01:04", 'http:something2'],
            [304071, "2015:01:05", 'http:something3'],
            [304071, "2015:01:06", 'http:something3']]
new_list = []
url_set = set()  # URLs that have already been seen

for item in old_list:
    # Keep only the first row seen for each URL
    if item[2] not in url_set:
        url_set.add(item[2])
        new_list.append(item)

>>> print(new_list)
[[304070, '2015:01:01', 'http:something1'], [304070, '2015:01:02', 'http:something2'], [304071, '2015:01:05', 'http:something3']]
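
If the same filtering is needed in more than one place, the loop above could be wrapped in a small helper. The following is only a sketch: the function name `unique_rows_by_column` and its `column` parameter are made up for illustration and do not come from the original answer.

```python
def unique_rows_by_column(rows, column=2):
    """Return the first row seen for each distinct value in `column`.

    Hypothetical helper: the name and the `column` parameter are assumptions.
    """
    seen = set()      # values of `column` already encountered
    result = []
    for row in rows:
        key = row[column]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


old_list = [[304070, "2015:01:01", 'http:something1'],
            [304070, "2015:01:02", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304071, "2015:01:04", 'http:something2'],
            [304071, "2015:01:05", 'http:something3'],
            [304071, "2015:01:06", 'http:something3']]

new_list = unique_rows_by_column(old_list)
print(new_list)
```

Membership tests on a set take roughly constant time on average, so the whole pass stays close to linear in the number of rows, which matters for the million-row case mentioned in the question.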

Answer 2

>>> old_list = [[304070, "2015:01:01", 'http:something1'],
...            [304070, "2015:01:02", 'http:something2'],
...            [304070, "2015:01:03", 'http:something2'],
...            [304070, "2015:01:03", 'http:something2'],
...            [304071, "2015:01:04", 'http:something2'],
...            [304071, "2015:01:05", 'http:something3'],
...            [304071, "2015:01:06", 'http:something3']]
>>> temp_dict = {}
>>> for element in old_list:
...     if element[2] not in temp_dict:
...         temp_dict[element[2]] = [element[0], element[1], element[2]]
... 
>>> list(temp_dict.values())
[[304070, '2015:01:01', 'http:something1'], [304070, '2015:01:02', 'http:something2'], [304071, '2015:01:05', 'http:something3']]

Note: I am assuming that the order of the distinct URLs in the result does not matter. If it does, use an OrderedDict instead of the default dict.
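
A minimal sketch of the OrderedDict variant mentioned in the note, assuming the same sample data. The variable names `first_by_url` and `ordered_rows` are illustrative only. Since Python 3.7 a plain dict also preserves insertion order, so OrderedDict mainly matters on older interpreters.

```python
from collections import OrderedDict

old_list = [[304070, "2015:01:01", 'http:something1'],
            [304070, "2015:01:02", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304071, "2015:01:04", 'http:something2'],
            [304071, "2015:01:05", 'http:something3'],
            [304071, "2015:01:06", 'http:something3']]

# Maps each URL to the first row in which it appears, in order of first appearance
first_by_url = OrderedDict()
for row in old_list:
    # setdefault stores the row only if this URL is not already a key
    first_by_url.setdefault(row[2], row)

ordered_rows = list(first_by_url.values())
print(ordered_rows)
```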

Answer 3

You need to write a function that searches a list for a row containing the given URL.

def hasUrl(rows, url):
    # The URL is the third column of each row (index 2)
    for item in rows:
        if item[2] == url:
            return True
    return False

The algorithm for building the new list should then look like this.

for i in range(len(oldList)):
    if not hasUrl(newList, oldList[i][2]): # check if url is in list
        newList.append(oldList[i])

Also, there is no need to create a range. A Python for loop iterates over values, so you can simply write

for item in oldList:
    if not hasUrl(newList, item[2]): # check if url is not in list
        newList.append(item)
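
For reference, the same search can be expressed with the built-in any(). This is a sketch rather than the answerer's code: `has_url` is a renamed, hypothetical variant of the helper above, and the sample data is copied from the question.

```python
def has_url(rows, url):
    """Return True if any row in `rows` has `url` in its third column."""
    return any(row[2] == url for row in rows)


old_list = [[304070, "2015:01:01", 'http:something1'],
            [304070, "2015:01:02", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304071, "2015:01:04", 'http:something2'],
            [304071, "2015:01:05", 'http:something3'],
            [304071, "2015:01:06", 'http:something3']]

new_list = []
for item in old_list:
    if not has_url(new_list, item[2]):  # scans the rows kept so far
        new_list.append(item)

print(new_list)
```

Each call scans the rows kept so far, so the total work grows with the number of input rows times the number of distinct URLs. For very large inputs, a set of seen URLs avoids that repeated scan.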

Answer 4

my_list = [[304070, '2015:01:01', 'http:something1'],
           [304070, '2015:01:02', 'http:something2'],
           [304070, '2015:01:03', 'http:something2'],
           [304070, '2015:01:03', 'http:something2'],
           [304071, '2015:01:04', 'http:something2'],
           [304071, '2015:01:05', 'http:something3'],
           [304071, '2015:01:06', 'http:something3']]

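Pull all the URLs out of the original list. Create a set from that list to get the unique URL values. Use a list comprehension to iterate over this set, and call index on the list of URLs (urls) to locate the first occurrence of each URL.

Finally, use another list comprehension together with enumerate to select the rows whose index is one of those values.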

urls = [row[2] for row in my_list]
urls_unique = set(urls)
idx = [urls.index(url) for url in urls_unique]
my_shorter_list = [row for n, row in enumerate(my_list) if n in idx]

>>> my_shorter_list
[[304070, '2015:01:01', 'http:something1'],
 [304070, '2015:01:02', 'http:something2'],
 [304071, '2015:01:05', 'http:something3']]
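
The index-based idea can also be sketched in a single pass, which avoids calling index once per unique URL. The names `first_index` and `keep` below are illustrative assumptions, not part of the original answer.

```python
my_list = [[304070, '2015:01:01', 'http:something1'],
           [304070, '2015:01:02', 'http:something2'],
           [304070, '2015:01:03', 'http:something2'],
           [304070, '2015:01:03', 'http:something2'],
           [304071, '2015:01:04', 'http:something2'],
           [304071, '2015:01:05', 'http:something3'],
           [304071, '2015:01:06', 'http:something3']]

# Record the index of the first row in which each URL appears
first_index = {}
for n, row in enumerate(my_list):
    first_index.setdefault(row[2], n)  # later duplicates leave the entry unchanged

# A set makes the membership test in the comprehension cheap
keep = set(first_index.values())
my_shorter_list = [row for n, row in enumerate(my_list) if n in keep]

print(my_shorter_list)
```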
