Python: iterating over a dictionary

Question

I have a problem with writing a loop over a dict. I have a dictionary: the keys are unique numbers, and the values are words. I need to create a matrix: rows are the numbers of the sentences, and columns are the unique numbers of the words (from the dict). Each element of the matrix should show how many times each word occurs in each sentence. This is my code for creating the dict. (At the beginning I had a raw text file with sentences.)

import re

with open('sentences.txt', 'r') as file_obj:
    lines = []
    for line in file_obj:
        # split on every non-letter character (this can produce empty strings)
        line_split = re.split('[^a-z]', line.lower().strip())
        # keep only the non-empty pieces
        new_line = [word for word in line_split if word]
        lines.append(new_line)

# map unique numbers (starting at 1) to words, in order of first appearance
vocab = {}
k = 1
for words in lines:
    for word in words:
        if word not in vocab.values():
            vocab[k] = word
            k += 1

import numpy as np  # now I am trying to create a matrix

matr = np.zeros((len(lines), len(vocab)))
l = 0
while l < len(lines):
    for f in range(len(lines[l])):
        if vocab[1] == lines[l][f]:   # this works only for the first word in the dict
            matr[l][0] += 1
    l += 1
print(matr[3][0])

matr = np.zeros((len(lines), len(vocab)))   # this also works, but again only for vocab[1]
for values in range(len(vocab)):
    for line in lines:
        a = line.count(vocab[1])
        print(a)

But when I try to write a loop that goes through the whole dict, nothing works! Could you please tell me how I can fill the whole matrix? Thank you very much in advance!

Answer 1

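A few careless errors in the code as originally posted: the re.split(...) line was missing a closing parenthesis, and // does not start a comment in Python (it is the floor-division operator); comments start with #.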

Looking at your code, I can't work out your general algorithm; it seems very involved for building just a basic word-count dictionary. So I propose this much shorter code:

import re
import sys

def get_vocabulary(filename):
    vocab_dict = {}

    with open(filename, 'r') as file_obj:
        for line in file_obj:
            for word in re.findall(r'[a-z]+', line.lower()):
                if word in vocab_dict:   # see below for an interesting alternative
                    vocab_dict[word] += 1
                else:
                    vocab_dict[word] = 1
    return vocab_dict

if len(sys.argv) > 1:
    vocab = get_vocabulary(sys.argv[1])
    for word, count in vocab.items():
        print(word, '->', count)

Note that I replaced your own

line_split=re.split('[^a-z]',line.lower().strip())

with the inverse

re.findall(r'[a-z]+',line.lower())

because yours can return empty elements and mine will not. Originally I had to add an if word: test before inserting each word into the dictionary, to avoid adding lots of empty strings. With a better pattern for what counts as a 'word', that test is no longer necessary.
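To see the difference between the two patterns, here is a quick check on a hypothetical sample line (the sentence is made up for illustration). re.split() leaves an empty string wherever two separators are adjacent or the line ends with punctuation; re.findall() with [a-z]+ only returns non-empty runs of letters.

```python
import re

# Hypothetical sample line with punctuation
line = "Hello, world."

# Splitting on every non-letter leaves empty strings between ", " and after "."
split_words = re.split('[^a-z]', line.lower().strip())

# Finding runs of letters only ever returns non-empty words
found_words = re.findall(r'[a-z]+', line.lower())

print(split_words)   # ['hello', '', 'world', '']
print(found_words)   # ['hello', 'world']
```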

(Fun with Python: the alternative to the if..else is this single line:

vocab_dict[word] = 1 if word not in vocab_dict else vocab_dict[word] + 1

It is slightly less efficient because vocab_dict[word] has to be retrieved again on the right-hand side; a conditional expression has no shorthand for just "+ 1". Still, it's a nice line to read.)
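Two other common idioms for the same counting step, neither of which is in the original answer: dict.get() with a default of 0 removes the if..else entirely, and collections.Counter from the standard library counts a whole sequence in one call. A minimal sketch, using a hypothetical word list in place of the words read from the file:

```python
from collections import Counter

# Hypothetical word list standing in for the words of one file
words = ["the", "cat", "saw", "the", "dog"]

# dict.get() returns 0 for words not seen yet, so no if/else is needed
vocab_dict = {}
for word in words:
    vocab_dict[word] = vocab_dict.get(word, 0) + 1

# Counter builds the same word -> count mapping in a single call
vocab_counter = Counter(words)

print(vocab_dict)            # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
print(vocab_counter["the"])  # 2
```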

With a bit of help, the dictionary can be converted to a 'matrix' (actually a simple list of lists suffices) using:

matrix = [[vocab[word], word] for word in sorted(vocab)]
for row in matrix:
    print(row)
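The code above produces one word -> count row per word for the whole file, whereas the question asks for a sentence-by-word matrix. A minimal sketch of that, assuming the question's numbering scheme (vocab keys start at 1, so word k goes in column k - 1) and using an inline sample text as a hypothetical stand-in for sentences.txt: the key change is to loop over every (number, word) pair in vocab.items() instead of hard-coding vocab[1].

```python
import re
import numpy as np

# Hypothetical stand-in for the contents of sentences.txt
text = """The cat sat on the mat.
The dog sat too."""

# One list of words per sentence, like the question's `lines`
lines = [re.findall(r'[a-z]+', line.lower()) for line in text.splitlines()]

# Number -> word dictionary with keys 1, 2, 3, ... in order of first appearance
vocab = {}
for words in lines:
    for word in words:
        if word not in vocab.values():
            vocab[len(vocab) + 1] = word

# Rows are sentences, columns are vocabulary numbers (column k - 1 holds word k)
matr = np.zeros((len(lines), len(vocab)))
for row, words in enumerate(lines):
    for k, word in vocab.items():
        matr[row, k - 1] = words.count(word)

print(vocab)
print(matr)
```

Here vocab comes out as {1: 'the', 2: 'cat', 3: 'sat', 4: 'on', 5: 'mat', 6: 'dog', 7: 'too'}, and the first row of matr is [2, 1, 1, 1, 1, 0, 0] because "the" appears twice in the first sentence. Note that "word not in vocab.values()" scans all values each time; for a large text, keeping a separate word -> number dict alongside vocab would avoid that.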

Answer 1: score 0, posted 2016-04-16 23:05:26