Python: dictionary of lists of lists

Question

def makecounter():
     return collections.defaultdict(int)

class RankedIndex(object):
  def __init__(self):
    self._inverted_index = collections.defaultdict(list)
    self._documents = []
    self._inverted_index = collections.defaultdict(makecounter)


def index_dir(self, base_path):
    num_files_indexed = 0
    allfiles = os.listdir(base_path)
    self._documents = os.listdir(base_path)
    num_files_indexed = len(allfiles)
    docnumber = 0
    self._inverted_index = collections.defaultdict(list)

    docnumlist = []
    for file in allfiles: 
            self.documents = [base_path+file] #list of all text files
            f = open(base_path+file, 'r')
            lines = f.read()

            tokens = self.tokenize(lines)
            docnumber = docnumber + 1
            for term in tokens:  
                if term not in sorted(self._inverted_index.keys()):
                    self._inverted_index[term] = [docnumber]
                    self._inverted_index[term][docnumber] +=1                                           
                else:
                    if docnumber not in self._inverted_index.get(term):
                        docnumlist = self._inverted_index.get(term)
                        docnumlist = docnumlist.append(docnumber)
            f.close()
    print '\n \n'
    print 'Dictionary contents: \n'
    for term in sorted(self._inverted_index):
        print term, '->', self._inverted_index.get(term)
    return num_files_indexed
    return 0

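Running this code raises an IndexError: list index out of range.

The code above builds a dictionary index that stores each term as a key and, as its value, the list of document numbers in which the term appears. For example, if the term 'cat' appears in the files 1.txt, 5.txt and 7.txt, the dictionary will contain: cat <- [1,5,7]

Now I have to modify it to add term frequencies. If the word cat appears twice in document 1, three times in document 5 and once in document 7, the expected result is: term <- [[docnumber, term freq], [docnumber, term freq]] <- a list of lists inside a dict!!! cat <- [[1,2],[5,3],[7,1]]

I have played around with the code, but nothing works. I have no idea how to modify this data structure to achieve the above.

Thanks in advance.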

Answer 1

First, use a factory. Start with:

def makecounter():
    return collections.defaultdict(int)

and later use

self._inverted_index = collections.defaultdict(makecounter)

and, as the body of the for term in tokens: loop,

for term in tokens:
    self._inverted_index[term][docnumber] += 1

This leaves in each self._inverted_index[term] a dict such as

{1:2,5:3,7:1}

in your example. Since you want each self._inverted_index[term] to be a list of lists instead, add this right after the loop ends:

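# Turn each {docnumber: count} dict into [[docnumber, count], ...], sorted by docnumber.
self._inverted_index = dict((t, [[d, v[d]] for d in sorted(v)])
                            for t, v in self._inverted_index.items())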

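Once built (this way or any other; I'm just showing a simple way to construct it!), this data structure will of course be as awkward to use as you unnecessarily made it hard to construct (the dict of dicts is much more useful and easier to use and to build), but hey, one man's meat &c ;-).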
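
A minimal, self-contained sketch of the steps above, assuming the documents are already tokenized. The docs mapping and its sample tokens are invented for illustration and stand in for the files that index_dir reads; they are not from the question.

```python
import collections

def makecounter():
    # Factory: each new term gets its own {docnumber: count} counter.
    return collections.defaultdict(int)

# Hypothetical pre-tokenized documents (docnumber -> tokens), illustration only.
docs = {
    1: ["cat", "dog", "cat"],
    5: ["cat", "cat", "cat"],
    7: ["cat", "bird"],
}

inverted_index = collections.defaultdict(makecounter)
for docnumber, tokens in docs.items():
    for term in tokens:
        inverted_index[term][docnumber] += 1

# Dict-of-dicts form: a frequency is a direct lookup.
cat_in_doc5 = inverted_index["cat"][5]

# Convert to the list-of-lists form requested in the question.
as_lists = dict((t, [[d, v[d]] for d in sorted(v)])
                for t, v in inverted_index.items())

print(as_lists["cat"])  # [[1, 2], [5, 3], [7, 1]]
```

With the dict-of-dicts form, the count for a given term and document is a single lookup, whereas the list-of-lists form has to be scanned to find a document number.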

Answer 2

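Here is a general algorithm you can use, though you will have to adapt some of your code to it. It produces a dict that maps each file to a dictionary of word counts.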

filedicts = {}
for file in allfiles:
  filedict = filedicts[file] = {}  # per-file {term: count}

  for term in terms:  # terms: the tokens of this file
    filedict.setdefault(term, 0)
    filedict[term] += 1
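
One possible runnable version of this algorithm, as a sketch. The tokens_by_file mapping is a hypothetical stand-in for reading and tokenizing each file, and the last step, which inverts the per-file counts into term -> [[file, count], ...] lists, is an addition for illustration rather than part of the answer.

```python
# Hypothetical tokens per file; in the question these would come from
# self.tokenize() applied to each file's contents.
tokens_by_file = {
    "1.txt": ["cat", "dog", "cat"],
    "5.txt": ["cat", "cat", "cat"],
}

filedicts = {}
for file, terms in tokens_by_file.items():
    filedict = filedicts[file] = {}  # per-file {term: count}
    for term in terms:
        filedict.setdefault(term, 0)
        filedict[term] += 1

# Invert file -> {term: count} into term -> [[file, count], ...].
by_term = {}
for file in sorted(filedicts):
    for term, count in filedicts[file].items():
        by_term.setdefault(term, []).append([file, count])

print(by_term["cat"])  # [['1.txt', 2], ['5.txt', 3]]
```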

Answer 3

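Maybe you could just create a simple class for (docname, frequency).

Then your dict could hold lists of this new data type. You could use a list of lists too, but a separate data type would be cleaner.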
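
As a rough sketch of this idea, collections.namedtuple is one lightweight way to get such a data type (a hand-written class would work as well). The name Posting, its fields, and the sample counts are all hypothetical and chosen only for illustration.

```python
import collections

# Hypothetical record type; the name Posting and its fields are illustrative.
Posting = collections.namedtuple("Posting", ["docname", "frequency"])

# Hypothetical counts, e.g. produced by a counting pass over the files.
counts = {"cat": {"1.txt": 2, "5.txt": 3, "7.txt": 1}}

index = {}
for term, per_doc in counts.items():
    # One Posting per document, ordered by document name.
    index[term] = [Posting(doc, per_doc[doc]) for doc in sorted(per_doc)]

first = index["cat"][0]
print(first.docname, first.frequency)
```

Fields are read by name (first.frequency) instead of by position, and each Posting still compares equal to a plain (docname, frequency) tuple.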

3 answers

Answer 1
Score 6, accepted, 2010-10-05 03:14:44

Answer 2
Score 1, 2010-10-05 03:09:32

Answer 3
Score 0, 2010-10-05 03:06:17
