
Python cycle through dict

I have a problem with creating a loop that cycles through a dict. I have a dictionary: the keys are unique numbers, and the values are words. I need to create a matrix: rows are the numbers of the sentences, and columns are the unique numbers for words (from the dict). Each element of the matrix should show how many times each word occurs in each sentence. This is my code for creating the dict. (At the beginning I had a raw text file with sentences.)

with open ('sentences.txt', 'r') as file_obj:
    lines=[]
    for line in file_obj:
        line_split=re.split('[^a-z]',line.lower().strip()
        j=0
        new_line=[]
        while j<=len(line_split)-1:
            if (line_split[j]):
                new_line.append(line_split[j])
            j+=1            
        lines.append(new_line)    
    vocab = {}
    k = 1
    for i in range(len(lines)):
        for j in range(len(lines[i])):
            if lines[i][j] not in vocab.values():
                vocab[k]=lines[i][j]
                k+=1

import numpy as np  //now I am trying to create a matrix
matr = np.array(np.zeros((len(lines),len(vocab))))  
m=0
l=0
while l<22:
    for f in range (len(lines[l])):
        if vocab[1]==lines[l][f]:   //this works only for the 1 word in dict
            matr[l][0]+=1
    l+=1
print(matr[3][0])

matr = np.array(np.zeros((len(lines),len(vocab))))   // this also works
for values in range (len(vocab)):
    for line in lines:
        a=line.count(vocab[1])
        print(a)

But when I try to make a loop that goes through the whole dict, nothing works! Could you please tell me how I can fill the whole matrix? Thank you very much in advance!

A few careless errors: the re.split(...) line is missing a closing parenthesis, and // is not Python syntax (Python comments start with #).

Looking at your code, I can't make out your general algorithm for what should be just a basic word-count dictionary. So I propose this much shorter code:

import re
import sys


def get_vocabulary(filename):
    vocab_dict = {}

    with open(filename, 'r') as file_obj:
        for line in file_obj:
            for word in re.findall(r'[a-z]+', line.lower()):
                if word in vocab_dict:   # see below for an interesting alternative
                    vocab_dict[word] += 1
                else:
                    vocab_dict[word] = 1
    return vocab_dict


if len(sys.argv) > 1:
    vocab = get_vocabulary(sys.argv[1])
    for word in vocab:
        print(word, '->', vocab[word])

Note that I replaced your own

line_split=re.split('[^a-z]',line.lower().strip())

with the reverse

re.findall(r'[a-z]+',line.lower())

because yours can return empty elements and mine will not. Originally I had to add an if word: test before inserting each word into the dictionary, to prevent adding lots of empty strings. With a better pattern for matching words, that test is no longer necessary.
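A quick way to see the difference between the two calls, using a hypothetical sample sentence (this snippet is illustrative and not part of the original answer):

```python
import re

# Hypothetical sample sentence; note the comma, the space and the trailing period.
sentence = "Hello, world."

# Splitting on every non-letter leaves an empty string wherever two
# separators are adjacent (", ") and after the final separator (".").
split_parts = re.split('[^a-z]', sentence.lower())

# Matching runs of letters returns only the words themselves.
found_words = re.findall(r'[a-z]+', sentence.lower())

print(split_parts)   # ['hello', '', 'world', '']
print(found_words)   # ['hello', 'world']
```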

(Fun with Python: the if..else can also be written as this single line:

vocab_dict[word] = 1 if word not in vocab_dict else vocab_dict[word]+1

It is slightly less efficient because vocab_dict[word] has to be retrieved separately; you can't write just .. + 1 on its own. Still, it's a nice line to read.)
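For comparison, the standard library offers two other common idioms for the same counting step: dict.get with a default of 0, and collections.Counter, which counts an iterable directly. This is a sketch with a hypothetical word list, not code from the original answer:

```python
from collections import Counter

# Hypothetical list of words, as re.findall would return them.
words = ['the', 'cat', 'saw', 'the', 'dog']

# Variant 1: dict.get returns 0 for a word not seen yet, so no if..else is needed.
counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1

# Variant 2: Counter is a dict subclass that does the counting itself.
counts_counter = Counter(words)

print(counts_get)              # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
print(counts_counter['the'])   # 2
print(counts_counter['cow'])   # 0 (missing keys count as zero)
```

Either variant could replace the if..else inside get_vocabulary; Counter also accepts the re.findall result for a whole line at once via its update method.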

With a bit of help, the dictionary can be converted to a 'matrix' (actually, a simple array suffices) using

matrix = [[vocab[word], word] for word in sorted(vocab)]
for row in matrix:
    print(row)
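The question itself asks for a per-sentence matrix rather than a single total per word. A minimal sketch, assuming each sentence is one string (the sample sentences and the build_count_matrix name are hypothetical): give each distinct word a column number the first time it appears, then count each sentence's words into its own row.

```python
import re


def build_count_matrix(sentences):
    """Return (matrix, columns): matrix[row][col] is how many times the
    word columns[col] occurs in sentence number row."""
    # Tokenize each sentence the same way as the answer's get_vocabulary.
    tokenized = [re.findall(r'[a-z]+', s.lower()) for s in sentences]

    # Assign column numbers in order of first appearance (0, 1, 2, ...).
    column_of = {}
    for words in tokenized:
        for word in words:
            if word not in column_of:
                column_of[word] = len(column_of)

    # One row per sentence, one column per distinct word, all zeros to start.
    matrix = [[0] * len(column_of) for _ in tokenized]
    for row, words in enumerate(tokenized):
        for word in words:
            matrix[row][column_of[word]] += 1

    # columns[i] is the word counted in column i.
    columns = list(column_of)
    return matrix, columns


sentences = ["The cat sat.", "The cat saw the dog."]
matrix, columns = build_count_matrix(sentences)
print(columns)   # ['the', 'cat', 'sat', 'saw', 'dog']
for row in matrix:
    print(row)
# [1, 1, 1, 0, 0]
# [2, 1, 0, 1, 1]
```

Column numbers here start at 0, while the question's vocab dict numbered words from 1; if that numbering is needed, dict(enumerate(columns, start=1)) rebuilds a {number: word} dict from columns. Passing matrix to np.array gives the NumPy array the question was filling.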

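Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.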

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM