Computing certain values from a nested dictionary

Question

Suppose I have a nested dictionary:

      myDict = { 'a': { 1: 2,
                        2: 163,
                        3: 12,
                        4: 67,
                        5: 84
                        },
             'about': { 1: 27,
                        2: 45,
                        3: 21,
                        4: 10,
                        5: 15
                        },
                'an': { 1: 3,
                        2: 15,
                        3: 1,
                        4: 312,
                        5: 100
                        },
        'anticipate': { 1: 1,
                        2: 5,
                        3: 0,
                        4: 8,
                        5: 7
                        },
             'apple': { 1: 0,
                        2: 5,
                        3: 0,
                        4: 10,
                        5: 0
                        }
           }

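The outer key is a word, the inner key is a file that contains the word, and the value is the number of times the word occurs in that file.

I want to work out two things:

The first is the total number of times each word occurs, so for 'a' it would be 328.

The second is the number of files each word appears in, so for 'a' it would be 5, and for 'apple' it would be 2.

I'm guessing these "values" would be two dictionaries, but flat ones rather than nested, i.e. {word: total count} and {word: number of files it appears in}.

Edit: Another thing I want to work out is the magnitude of each file's word vector.

So for file 1, it would be sqrt(2^2 + 27^2 + 3^2 + 1^2 + 0^2)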

Answer 1

IIUC, you can do this directly with dictionary comprehensions.

The sum of all the values in each word's dictionary:

>>> {k: sum(d.values()) for k,d in myDict.items()}
{'a': 328, 'about': 118, 'apple': 15, 'anticipate': 21, 'an': 431}

The number of values greater than zero in each sub-dictionary:

>>> {k: sum(v > 0 for v in d.values()) for k,d in myDict.items()}
{'a': 5, 'about': 5, 'apple': 2, 'anticipate': 4, 'an': 5}

The last one relies on the fact that int(True) == 1 and int(False) == 0, so we don't need to write 1 if v > 0 else 0 or something similar; we can simply sum the booleans.
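The two comprehensions can be put together as a self-contained script (Python 3 syntax). The third comprehension, for the per-file vector magnitude asked about in the question's edit, is not part of the answer above; it is only a sketch, and it assumes every word has the same set of inner keys. The names totals, file_counts, files and magnitudes are made up for illustration.

```python
import math

myDict = {
    'a':          {1: 2,  2: 163, 3: 12, 4: 67,  5: 84},
    'about':      {1: 27, 2: 45,  3: 21, 4: 10,  5: 15},
    'an':         {1: 3,  2: 15,  3: 1,  4: 312, 5: 100},
    'anticipate': {1: 1,  2: 5,   3: 0,  4: 8,   5: 7},
    'apple':      {1: 0,  2: 5,   3: 0,  4: 10,  5: 0},
}

# {word: total count across all files}
totals = {k: sum(d.values()) for k, d in myDict.items()}

# {word: number of files with a nonzero count}; each True adds 1 to the sum
file_counts = {k: sum(v > 0 for v in d.values()) for k, d in myDict.items()}

# Assumption: all words share the same inner keys, so any word yields the file ids.
files = sorted(next(iter(myDict.values())))

# {file: Euclidean length of that file's word-count vector}
magnitudes = {
    f: math.sqrt(sum(d[f] ** 2 for d in myDict.values()))
    for f in files
}

print(totals)
print(file_counts)
print(magnitudes[1])  # sqrt(2^2 + 27^2 + 3^2 + 1^2 + 0^2) = sqrt(743)
```

If some words were missing a file key, d.get(f, 0) in place of d[f] would be one way to treat the missing count as zero.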

Answer 2

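OK, your problem isn't very difficult.

Question 1: "The first is the total number of times each word occurs, so for 'a' it is 328."

word = "a"
total = sum(myDict[word].values())
print(total)

Or, if you want to compute it for every key in myDict:

for word in myDict:
    total = sum(myDict[word].values())
    print(word, total)

Question 2: "The second is the number of files containing each word, so for 'a' it is 5 and for 'apple' it is 2."

for word in myDict:
    number_of_files = sum(bool(v) for v in myDict[word].values())
    print(word, number_of_files)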

Answer 3

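For a term-document matrix, a nested dictionary is a very, very bad data structure. Instead, use numpy arrays, as suggested in "Calculating the distance between word/document vectors from a nested dictionary".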
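As a rough illustration of the numpy suggestion, here is a minimal sketch that stores the question's data as a 2-D array, with one row per word and one column per file. The layout and the names words, counts, totals, file_counts and magnitudes are assumptions made for this example, not something taken from the linked question.

```python
import numpy as np

words = ['a', 'about', 'an', 'anticipate', 'apple']

# Rows are words (in the order of `words`), columns are files 1..5.
counts = np.array([
    [2, 163, 12, 67, 84],
    [27, 45, 21, 10, 15],
    [3, 15, 1, 312, 100],
    [1, 5, 0, 8, 7],
    [0, 5, 0, 10, 0],
])

# Total occurrences per word: sum along each row.
totals = dict(zip(words, counts.sum(axis=1).tolist()))

# Files containing each word: count the nonzero entries in each row.
file_counts = dict(zip(words, (counts > 0).sum(axis=1).tolist()))

# Word-vector magnitude per file: Euclidean norm of each column.
magnitudes = np.linalg.norm(counts, axis=0)

print(totals)
print(file_counts)
print(magnitudes)
```

With this layout all three quantities from the question come down to one reduction each along an axis.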

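Even if you'd prefer not to use numpy arrays, you don't need a dictionary of dictionaries, because your inner keys are sequential. You can simplify how the data is stored by using the following structure: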

myDict = {'a':[2, 163, 12, 67, 84], 
          'about':[27, 45, 21, 10, 15], 
          'apple':[0, 5, 0, 10, 0], 
          'anticipate': [1, 5, 0, 8, 7], 
          'an':[3, 15, 1, 312, 100]}

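You access it the same way, except that a list index starting at 0 takes the place of the inner key:

print(myDict['a'][0]) # a in 1st document
print(myDict['a'][1]) # a in 2nd document
print(myDict['apple'][2]) # apple in 3rd document

To simply compute the total for a word, and the number of documents it occurs in:

sum(myDict['a'])
sum([1 for doc in myDict['apple'] if doc > 0])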

Here is the full code:

# Original nested structure
myDict = {'a': {1:2, 2:163, 3:12, 4:67, 5:84}, 
          'about': {1:27, 2:45, 3:21, 4:10, 5:15}, 
          'apple': {1:0, 2: 5, 3:0, 4:10, 5:0}, 
          'anticipate': {1:1, 2:5, 3:0, 4:8, 5:7}, 
          'an': {1:3, 2:15, 3:1, 4:312, 5:100}}

# Same data with the inner keys stripped
myDict = {'a':[2, 163, 12, 67, 84], 
          'about':[27, 45, 21, 10, 15], 
          'apple':[0, 5, 0, 10, 0], 
          'anticipate': [1, 5, 0, 8, 7], 
          'an':[3, 15, 1, 312, 100]}

print(myDict['a'][0]) # a in 1st document
print(myDict['a'][1]) # a in 2nd document
print(myDict['apple'][2]) # apple in 3rd document

print(sum(myDict['a']))
# How many documents does apple occur in?
print(sum([1 for doc in myDict['apple'] if doc > 0]))

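To stress this again: there is no point in using a dictionary of dictionaries whose inner keys are consecutive integers; you can simply strip the inner keys.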
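If the data already exists in the nested form, stripping the inner keys could look something like the sketch below. It assumes the inner keys of every word are the same consecutive integers, so that sorting them gives the document order; nested and flat are names invented for this example.

```python
nested = {
    'a':          {1: 2,  2: 163, 3: 12, 4: 67,  5: 84},
    'about':      {1: 27, 2: 45,  3: 21, 4: 10,  5: 15},
    'apple':      {1: 0,  2: 5,   3: 0,  4: 10,  5: 0},
    'anticipate': {1: 1,  2: 5,   3: 0,  4: 8,   5: 7},
    'an':         {1: 3,  2: 15,  3: 1,  4: 312, 5: 100},
}

# Assumption: every inner dict has the keys 1..n, so sorted(d) is the document order.
flat = {word: [d[k] for k in sorted(d)] for word, d in nested.items()}

print(flat['a'])         # counts of 'a' in documents 1..5
print(flat['apple'][2])  # apple in 3rd document (index 2)
```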

3 answers

Answer 1
3 Accepted 2014-11-19 20:01:16

Answer 2
0 2014-11-19 20:20:14

Answer 3
0 2014-11-20 01:52:02
