Reducing duplicates in a Python list of lists

Question

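I'm writing a program that reads several files and then builds an index of the terms in them. I can read the files into a 2D array (a list of lists) in Python, but then I need to remove the duplicates in the first column and store the extra DocIDs in new columns of the row where the duplicated word first appears.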

For example:

['when', 1]
['yes', 1]
['', 1]
['greg', 1]
['17', 1]
['when',2]

The first column is the term and the second column is the DocID it came from. I want to be able to turn this into:

['when', 1, 2]
['yes', 1]
['', 1]
['greg', 1]
['17', 1]

with the duplicates removed.

Here's what I have so far:

for j in range(0,len(index)):
        for r in range(1,len(index)):
                if index[j][0] == index[r][0]:
                        index[j].append(index[r][1])
                        index.remove(index[r])

I keep getting an index out of range error on this line:

if index[j][0] == index[r][0]:

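I think this is because I'm removing items from `index`, so it keeps getting smaller. Any ideas would be greatly appreciated. (Yes, I know I shouldn't modify the original, but this is just a small-scale test.)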
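That diagnosis is correct: each `range(...)` is evaluated once when its loop starts, so after `remove()` shortens the list, the loops still visit indices that no longer exist. The inner `range(1, ...)` also compares row `j` with itself once `j >= 1`. A minimal sketch that keeps the index-based approach (the function name `merge_duplicates_in_place` is hypothetical) uses `while` loops that re-check `len(index)` on every pass, starts the inner scan just after `j`, and only advances `r` when nothing was deleted:

```python
def merge_duplicates_in_place(index):
    """Fold rows with the same term into the row where the term first appears.

    Each row is [term, doc_id]; a merged row becomes [term, id1, id2, ...].
    The list is modified in place and also returned for convenience.
    """
    j = 0
    while j < len(index):              # len() is re-checked after every deletion
        r = j + 1                      # only compare against rows after row j
        while r < len(index):
            if index[j][0] == index[r][0]:
                index[j].append(index[r][1])
                del index[r]           # the next row slides into position r,
                                       # so r is deliberately not advanced
            else:
                r += 1
        j += 1
    return index


index = [['when', 1], ['yes', 1], ['', 1], ['greg', 1], ['17', 1], ['when', 2]]
print(merge_duplicates_in_place(index))
# [['when', 1, 2], ['yes', 1], ['', 1], ['greg', 1], ['17', 1]]
```

This still compares every pair of rows (O(n²)), which is why a dictionary-based grouping scales better for large indexes.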

Answer 1

Wouldn't building a dict / defaultdict be a better fit?

Something like:

from collections import defaultdict

ar = [['when', 1],
      ['yes', 1],
      ['', 1],
      ['greg', 1],
      ['17', 1],
      ['when',2]] 

result = defaultdict(list)
for lst in ar:
    result[lst[0]].append(lst[1])

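Output (the key order depends on the Python version: before 3.7 dict order was not guaranteed, while from 3.7 on dicts keep insertion order, so `'when'` would print first):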

>>> for k,v in result.items():
...     print(repr(k),v)
'' [1]
'yes' [1]
'greg' [1]
'when' [1, 2]
'17' [1]
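If you would rather not import `defaultdict`, a plain `dict` with `setdefault()` does the same grouping. A minimal sketch using the same sample rows (the function name `group_doc_ids` is hypothetical):

```python
def group_doc_ids(rows):
    """Map each term to the list of doc IDs it appears with, using a plain dict."""
    groups = {}
    for term, doc_id in rows:
        # setdefault() inserts an empty list the first time a term is seen
        # and returns the existing list on later occurrences
        groups.setdefault(term, []).append(doc_id)
    return groups


ar = [['when', 1], ['yes', 1], ['', 1], ['greg', 1], ['17', 1], ['when', 2]]
print(group_doc_ids(ar))
# On Python 3.7+ (insertion-ordered dicts):
# {'when': [1, 2], 'yes': [1], '': [1], 'greg': [1], '17': [1]}
```

One small trade-off: the `[]` argument to `setdefault()` is built on every call, even when the key already exists, which `defaultdict(list)` avoids.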

Answer 2

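Yes, your error comes from modifying the list in place. Your solution is also inefficient for long lists, since the nested loops compare every pair of rows. It's better to use a dictionary instead and convert it back to a list at the end: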

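from collections import defaultdict

# sample data from the question
index = [['when', 1], ['yes', 1], ['', 1], ['greg', 1], ['17', 1], ['when', 2]]

od = defaultdict(list)
for term, doc_id in index:
    od[term].append(doc_id)

# [term] + doc_ids rebuilds each row, e.g. ['when', 1, 2]
result = [[term] + doc_ids for term, doc_ids in od.items()]

print(result)
# On Python 3.7+ rows come out in first-appearance order:
# [['when', 1, 2], ['yes', 1], ['', 1], ['greg', 1], ['17', 1]]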

Answer 3

You can actually do this with `range()` and `len()`. But one of Python's strengths is that you can iterate over a list's elements directly, without indices.
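To illustrate the difference, here is a sketch of the same loop over `[term, doc_id]` rows written both ways, plus `enumerate()` for the case where you need the position as well; all function names are hypothetical:

```python
def terms_by_index(rows):
    # Index-based: works, but every access goes through rows[i]
    terms = []
    for i in range(len(rows)):
        terms.append(rows[i][0])
    return terms


def terms_direct(rows):
    # Direct iteration with unpacking: no indices needed
    terms = []
    for term, _doc_id in rows:
        terms.append(term)
    return terms


def first_positions(rows):
    # enumerate() yields (position, element) pairs when you need both;
    # setdefault() keeps only the first position seen for each term
    positions = {}
    for i, (term, _doc_id) in enumerate(rows):
        positions.setdefault(term, i)
    return positions


rows = [['when', 1], ['yes', 1], ['when', 2]]
print(terms_by_index(rows) == terms_direct(rows))  # True
print(first_positions(rows))                       # {'when': 0, 'yes': 1}
```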

Take a look at this code and try to understand it:

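#!/usr/bin/env python3

def main():

    tot_array = [
        ['when', 1],
        ['yes', 1],
        ['', 1],
        ['greg', 1],
        ['17', 1],
        ['when', 2],
    ]

    for aList1 in tot_array:
        # Iterate over a copy so that removing rows from tot_array
        # cannot make this inner loop skip elements
        for aList2 in tot_array[:]:
            # "is not" skips the row being merged into (identity, not equality),
            # so two identical rows such as ['when', 1] are still merged
            if aList1[0] == aList2[0] and aList1 is not aList2:
                aList1.append(aList2[1])
                tot_array.remove(aList2)
    print(tot_array)

if __name__ == '__main__':
    main()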

The output looks like this:

[['when', 1, 2], ['yes', 1], ['', 1], ['greg', 1], ['17', 1]]
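Modifying a list while a `for` loop is iterating over that same list, as the question's code does, is fragile in general: after each removal, the following element shifts into the slot the loop has just visited, so the loop never sees it. A minimal, self-contained demonstration (the helper names are hypothetical):

```python
def remove_evens_while_iterating(nums):
    # Buggy on purpose: removes from nums while the for loop walks it.
    # After each remove(), the next element slides into the slot the
    # loop just visited, so it is skipped.
    for n in nums:
        if n % 2 == 0:
            nums.remove(n)
    return nums


def remove_evens_from_copy(nums):
    # Iterating over a shallow copy (nums[:]) keeps the loop's view stable
    for n in nums[:]:
        if n % 2 == 0:
            nums.remove(n)
    return nums


print(remove_evens_while_iterating([2, 4, 6, 1]))  # [4, 1]  (4 was skipped)
print(remove_evens_from_copy([2, 4, 6, 1]))        # [1]
```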

3 answers:

Answer 1: score 3, posted 2012-02-28 16:20:05
Answer 2: score 1, posted 2012-02-28 16:26:10
Answer 3: score 0, posted 2012-02-28 16:56:50
