Iterating through multiple values in a dictionary?
I have a list of words and a dictionary:
word_list = ["it's","they're","there's","he's"]
and a dictionary that records how often the words in word_list appear in several documents:
dict = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}),
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}),
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})]
I want to build a data structure (maybe a DataFrame?) that looks like this:
file word count
document1 it's 0
document1 they're 2
document1 there's 5
document1 he's 1
document2 it's 4
document2 they're 2
document2 there's 3
document2 he's 0
document3 it's 7
document3 they're 0
document3 there's 4
document3 he's 1
I am trying to find the most frequently used words across these documents. I have more than 900 documents.
I was thinking of something like this:
res = {}
for i in word_list:
    count = 0
    for j in dict.items():
        if i == j:
            count = count + 1
            res[i,j] = count
Where can I go from here?
First of all, your dictionary is not actually a dictionary (it is a list of tuples). It should be built like this:
d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1},
'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0},
'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}}
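If the data already exists as the question's list of (name, dict) tuples, the built-in `dict()` constructor can convert it directly, because it accepts any iterable of key/value pairs. A minimal sketch, assuming document names are unique (a repeated name would overwrite the earlier entry); `pairs` is a hypothetical name for the list the question calls `dict`, renamed so it does not shadow the built-in.

```python
# The question's data: a list of (document name, word-count dict) tuples.
# Called `pairs` here (hypothetical name) so the built-in dict stays usable.
pairs = [('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
         ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
         ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1})]

# dict() builds a mapping from an iterable of (key, value) pairs;
# a duplicated document name would keep only its last value.
d = dict(pairs)

print(d['document2']["there's"])  # 3
```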
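Now that we have a real dictionary, we can use pandas to build a DataFrame. To get it into the shape you want, we first turn the dictionary into a list of lists with a list comprehension. Then we create the DataFrame, label the columns, and sort: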
import pandas as pd
d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1},
     'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0},
     'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}}
# One [file, word, count] row for every word in every document
d = pd.DataFrame([[k, k1, v1] for k, v in d.items() for k1, v1 in v.items()], columns=['File', 'Words', 'Count'])
# DataFrame.sort() no longer exists in current pandas; sort_values() replaces it
print(d.sort_values(['File', 'Count'], ascending=[True, True]))
File Words Count
1 document1 it's 0
0 document1 he's 1
3 document1 they're 2
2 document1 there's 5
4 document2 he's 0
7 document2 they're 2
6 document2 there's 3
5 document2 it's 4
11 document3 they're 0
8 document3 he's 1
10 document3 there's 4
9 document3 it's 7
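That sort ranks words within each file. Since the goal is the most common words across all the documents, the per-file counts also need to be summed per word. A minimal pandas sketch, assuming the same long-format frame (named `df` here, a hypothetical name, to keep it apart from the source dict `d`):

```python
import pandas as pd

d = {'document1': {"it's": 0, "they're": 2, "there's": 5, "he's": 1},
     'document2': {"it's": 4, "they're": 2, "there's": 3, "he's": 0},
     'document3': {"it's": 7, "they're": 0, "there's": 4, "he's": 1}}

# Long format: one row per (file, word) pair.
df = pd.DataFrame([[k, k1, v1] for k, v in d.items() for k1, v1 in v.items()],
                  columns=['File', 'Words', 'Count'])

# Add up each word's count over every file, largest total first.
totals = df.groupby('Words')['Count'].sum().sort_values(ascending=False)
print(totals)
```

With 900+ files the frame simply gets more rows; the groupby/sum step is unchanged.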
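If you only want the first n rows per file, call groupby() on the sorted frame and then use head() or tail(). With an ascending sort on Count, head(n) keeps the n lowest counts in each file and tail(n) keeps the n highest: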
d = d.sort_values(['File', 'Count'], ascending=[True, True]).groupby('File').head(2)
File Words Count
1 document1 it's 0
0 document1 he's 1
4 document2 he's 0
7 document2 they're 2
11 document3 they're 0
8 document3 he's 1
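To get the most frequent words per file with head() instead, one option is to sort Count in descending order within each file first. A minimal sketch, assuming pandas 0.17 or later (for `sort_values`) and Python 3.7+ dict ordering; `df`, `n` and `top` are illustrative names.

```python
import pandas as pd

d = {'document1': {"it's": 0, "they're": 2, "there's": 5, "he's": 1},
     'document2': {"it's": 4, "they're": 2, "there's": 3, "he's": 0},
     'document3': {"it's": 7, "they're": 0, "there's": 4, "he's": 1}}

df = pd.DataFrame([[k, k1, v1] for k, v in d.items() for k1, v1 in v.items()],
                  columns=['File', 'Words', 'Count'])

n = 2  # number of words to keep per file
# File ascending, Count descending: each group's first rows hold its largest counts,
# and groupby().head() keeps the frame's existing row order.
top = (df.sort_values(['File', 'Count'], ascending=[True, False])
         .groupby('File')
         .head(n))
print(top)
```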
The list comprehension returns a list of lists that looks like this:
d = [['document1', "he's", 1], ['document1', "it's", 0], ['document1', "there's", 5], ['document1', "they're", 2], ['document2', "he's", 0], ['document2', "it's", 4], ['document2', "there's", 3], ['document2', "they're", 2], ['document3', "he's", 1], ['document3', "it's", 7], ['document3', "there's", 4], ['document3', "they're", 0]]
To build up the dictionary correctly, you can simply assign values like this:
d['document1']['it\'s'] = 1
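That assignment only works once `d['document1']` already exists; otherwise it raises a KeyError. When filling the structure incrementally, one option is `collections.defaultdict(dict)`, which creates the inner dictionary on first access. A minimal sketch; the `observations` list is hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical (document, word, count) observations arriving one at a time.
observations = [('document1', "it's", 0), ('document1', "they're", 2),
                ('document2', "it's", 4), ('document2', "he's", 0)]

# A missing outer key gets a fresh empty dict automatically.
d = defaultdict(dict)
for doc, word, count in observations:
    d[doc][word] = count

print(dict(d))
# {'document1': {"it's": 0, "they're": 2}, 'document2': {"it's": 4, "he's": 0}}
```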
If for some reason you have to stick with your list of (str, dict) tuples, you can use this list comprehension instead:
[[i[0],k1,v1] for i in d for k1,v1 in i[1].items()]
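The same comprehension reads a little more clearly with tuple unpacking instead of `i[0]` and `i[1]`. A minimal sketch; `pairs` and `rows` are hypothetical names, and the row order shown assumes Python 3.7+ insertion-ordered dicts.

```python
pairs = [('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
         ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
         ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1})]

# Unpack each (name, counts) tuple, then each (word, count) item of the inner dict.
rows = [[name, word, count] for name, counts in pairs for word, count in counts.items()]

print(rows[:2])  # [['document1', "it's", 0], ['document1', "they're", 2]]
```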
How about something like this?
word_list = ["it's","they're","there's","he's"]
frequencies = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}),
               ('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}),
               ('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})]
result = []
for document in frequencies:
    for word in word_list:
        # document[0] is the file name, document[1] its word-count dict
        result.append({"file": document[0], "word": word, "count": document[1][word]})
print(result)
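To go from per-file rows to the most common words overall, the counts still have to be summed per word. Without pandas, one option is `collections.Counter`: its `update()` method adds the incoming counts to the running totals (rather than replacing them), and `most_common()` returns the words ranked by total. A minimal sketch, assuming the `frequencies` list shown above.

```python
from collections import Counter

frequencies = [('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
               ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
               ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1})]

totals = Counter()
for _name, counts in frequencies:
    # update() with a mapping adds each word's count to its running total.
    totals.update(counts)

print(totals.most_common(2))  # [("there's", 12), ("it's", 11)]
```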