Iterating over multiple values in a dictionary?

Question

I have a list of words:

word_list = ["it's","they're","there's","he's"]

and a dictionary that records how often the words in word_list occur in several documents:

dict = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}),
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}),
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})]

I want to build a data structure (perhaps a DataFrame?) that looks like this:

file       word       count
document1  it's        0
document1  they're     2
document1  there's     5
document1  he's        1
document2  it's        4
document2  they're     2
document2  there's     3
document2  he's        0
document3  it's        7
document3  they're     0
document3  there's     4
document3  he's        1

I am trying to find the most frequently used words in these documents. I have more than 900 documents.

I was thinking of something like the following:

res = {}
for i in words_list:
    count = 0
    for j in dict.items():
         if i == j:
              count = count + 1
              res[i,j] = count

Where do I go from here?

Answer 1

First, your dictionary is not a dictionary (it is a list of tuples). It should be built like this:

d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1},
    'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0},
    'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}}

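Now that we have a real dictionary, we can use pandas to build a DataFrame. To get it into the shape you want, we first have to build a list of lists from the dictionary. Then we create the DataFrame, label the columns, and sort: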

import pandas as pd

d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1},
    'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0},
    'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}}

# Flatten the nested dict into [file, word, count] rows and label the columns
d = pd.DataFrame([[k,k1,v1] for k,v in d.items() for k1,v1 in v.items()], columns = ['File','Words','Count'])
# Sort by file name, then by count, both ascending
print(d.sort_values(['File','Count'], ascending=[True,True]))

         File    Words  Count
0   document1     it's      0
3   document1     he's      1
1   document1  they're      2
2   document1  there's      5
7   document2     he's      0
5   document2  they're      2
6   document2  there's      3
4   document2     it's      4
9   document3  they're      0
11  document3     he's      1
10  document3  there's      4
8   document3     it's      7

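If you want the top n entries for each file, call groupby() after sorting and then head() or tail(). With an ascending sort, head(n) keeps the n lowest counts in each file and tail(n) keeps the n highest.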

d = d.sort_values(['File','Count'], ascending=[True,True]).groupby('File').head(2)

         File    Words  Count
0   document1     it's      0
3   document1     he's      1
7   document2     he's      0
5   document2  they're      2
9   document3  they're      0
11  document3     he's      1
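
Since the question asks for the most frequent words, here is a minimal sketch of that step. It assumes a pandas version that provides sort_values; the names counts, df, top_per_file and totals are illustrative and not part of the original answer.

```python
import pandas as pd

# Same nested dictionary as in the answer, under an illustrative name.
counts = {
    'document1': {"it's": 0, "they're": 2, "there's": 5, "he's": 1},
    'document2': {"it's": 4, "they're": 2, "there's": 3, "he's": 0},
    'document3': {"it's": 7, "they're": 0, "there's": 4, "he's": 1},
}

# One [file, word, count] row per (document, word) pair.
rows = [[name, word, n] for name, words in counts.items() for word, n in words.items()]
df = pd.DataFrame(rows, columns=['File', 'Words', 'Count'])

# Most frequent word per file: sort counts descending within each file,
# then keep the first row of every group.
top_per_file = (
    df.sort_values(['File', 'Count'], ascending=[True, False])
      .groupby('File')
      .head(1)
)

# Total count of each word across all files, largest first.
totals = df.groupby('Words')['Count'].sum().sort_values(ascending=False)

print(top_per_file)
print(totals)
```

Changing head(1) to head(n) keeps the n most frequent words per file. The same code should work unchanged for 900+ documents, since it only depends on the shape of the nested dictionary.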

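The list comprehension returns a list of lists that looks like this (the row order follows the dictionary's iteration order, so it may differ on your Python version):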

d = [['document1', "he's", 1], ['document1', "it's", 0], ['document1', "there's", 5], ['document1', "they're", 2], ['document2', "he's", 0], ['document2', "it's", 4], ['document2', "there's", 3], ['document2', "they're", 2], ['document3', "he's", 1], ['document3', "it's", 7], ['document3', "there's", 4], ['document3', "they're", 0]]

To build the dictionary properly, you would just use something like this:

d['document1']['it\'s'] = 1
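
The assignment above presumes that d['document1'] already exists. One possible way to build the nested dictionary incrementally is sketched below with collections.defaultdict. The observations list is made-up illustrative data; in practice the (file, word) pairs would come from reading each document.

```python
from collections import defaultdict

word_list = ["it's", "they're", "there's", "he's"]

# Hypothetical (file, word) observations, standing in for real tokenized documents.
observations = [
    ('document1', "they're"),
    ('document1', "he's"),
    ('document1', "they're"),
    ('document2', "it's"),
    ('document2', "hello"),  # not in word_list, so it is skipped
]

# Outer key: file name. Inner key: word. Missing counts start at 0.
d = defaultdict(lambda: defaultdict(int))
for name, word in observations:
    if word in word_list:
        d[name][word] += 1

# Convert to plain dicts for printing or for passing on to pandas.
plain = {name: dict(words) for name, words in d.items()}
print(plain)
```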

If for some reason you want to keep the list of (str, dict) tuples, you can use this list comprehension instead:

[[i[0],k1,v1] for i in d for k1,v1 in i[1].items()]
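
As a sketch of how that comprehension behaves on the data from the question, the version below unpacks each tuple into named variables. The names frequencies, rows and as_dict are illustrative. It also shows that dict() can turn a list of (key, value) tuples into the nested dictionary used earlier.

```python
# The list of (file name, word counts) tuples from the question.
frequencies = [
    ('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
    ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
    ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1}),
]

# Same logic as [[i[0], k1, v1] for i in d for k1, v1 in i[1].items()],
# with each tuple unpacked into a name and a dict of counts.
rows = [[name, word, count] for name, counts in frequencies for word, count in counts.items()]

# dict() accepts an iterable of (key, value) pairs, so the list of tuples
# converts directly into a dictionary keyed by file name.
as_dict = dict(frequencies)

print(len(rows))
print(as_dict['document2']["it's"])
```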

Answer 2

How about something like this?

word_list = ["it's","they're","there's","he's"]

frequencies = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}),
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}),
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})]

result = []
for document in frequencies:
    for word in word_list:
        result.append({"file":document[0], "word":word,"count":document[1][word]})

print(result)
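
A list of dictionaries like result can be passed straight to pandas to get the table from the question. The sketch below assumes pandas is installed. Using counts.get(word, 0) is an added assumption, so that a word missing from a document counts as 0 instead of raising KeyError.

```python
import pandas as pd

word_list = ["it's", "they're", "there's", "he's"]

frequencies = [
    ('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
    ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
    ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1}),
]

# One record per (document, word) pair; a missing word is treated as a count of 0.
result = [
    {"file": name, "word": word, "count": counts.get(word, 0)}
    for name, counts in frequencies
    for word in word_list
]

# Passing columns explicitly fixes the column order regardless of pandas version.
df = pd.DataFrame(result, columns=["file", "word", "count"])
print(df)
```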

Answer 1: score 2, accepted, 2015-11-04 21:19:45

Answer 2: score 1, 2015-11-04 20:53:12
