
Python cycle through dict

I have a problem writing a loop over a dict. I have a dictionary whose keys are unique numbers and whose values are words. I need to build a matrix in which the rows correspond to the sentences and the columns to the unique word numbers from the dict. Each element of the matrix should hold the number of times that word occurs in that sentence. This is my code for creating the dict. (I started from a raw text file of sentences.)

with open ('sentences.txt', 'r') as file_obj:
    lines=[]
    for line in file_obj:
        line_split=re.split('[^a-z]',line.lower().strip()
        j=0
        new_line=[]
        while j<=len(line_split)-1:
            if (line_split[j]):
                new_line.append(line_split[j])
            j+=1            
        lines.append(new_line)    
    vocab = {}
    k = 1
    for i in range(len(lines)):
        for j in range(len(lines[i])):
            if lines[i][j] not in vocab.values():
                vocab[k]=lines[i][j]
                k+=1

import numpy as np  //now I am trying to create a matrix
matr = np.array(np.zeros((len(lines),len(vocab))))  
m=0
l=0
while l<22:
    for f in range (len(lines[l])):
        if vocab[1]==lines[l][f]:   //this works only for the 1 word in dict
            matr[l][0]+=1
    l+=1
print(matr[3][0])

matr = np.array(np.zeros((len(lines),len(vocab))))   // this also works
for values in range (len(vocab)):
    for line in lines:
        a=line.count(vocab[1])
        print(a)

But when I try to write a loop that goes through the whole dict, nothing works! Could you please tell me how I can fill the whole matrix? Thank you very much in advance!

A few careless errors: the re.split(...) line needs a closing parenthesis, and // does not start a comment in Python (use # instead).

Looking at your code, I can't tell what your overall algorithm is, even for building just a basic word count dictionary. So I propose this much shorter code:

import re
import sys

def get_vocabulary (filename):
  vocab_dict = {}

  with open (filename, 'r') as file_obj:
    for line in file_obj:
      for word in re.findall(r'[a-z]+',line.lower()):
        if word in vocab_dict:   # see below for an interesting alternative
          vocab_dict[word] += 1
        else:
          vocab_dict[word] = 1
  return vocab_dict

if len(sys.argv) > 1:
  vocab = get_vocabulary (sys.argv[1])
  for word in vocab:
    print (word, '->', str(vocab[word]))

Note I replaced your own

line_split=re.split('[^a-z]',line.lower().strip())

with the reverse

re.findall(r'[a-z]+',line.lower())

because yours can return empty elements and mine will not. Originally I had to add a test if word: before inserting it into the dictionary, to avoid adding lots of empty strings. Since findall with [a-z]+ only ever returns non-empty matches, that test is no longer necessary.
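To make the difference concrete, here is a small sketch that runs both calls on the same line. The sample sentence is made up for illustration and is not from the original sentences.txt.

```python
import re

# Hypothetical sample line, chosen to contain adjacent separators.
line = "Hello, world!  It's 2 cats."
lowered = line.lower()

# Splitting on every non-letter leaves an empty string between
# any two adjacent separators (and after a trailing one).
split_words = re.split('[^a-z]', lowered.strip())

# Matching runs of letters returns only the non-empty words.
found_words = re.findall(r'[a-z]+', lowered)

print(split_words)
print(found_words)
```

Filtering the empty strings out of the re.split result gives the same list that re.findall returns directly.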

(Fun with Python: The alternative for an if..else looks like this single line:

vocab_dict[word] = 1 if word not in vocab_dict else vocab_dict[word]+1

It is slightly less efficient because vocab_dict[word] has to be retrieved twice – you can't say .. + 1 on its own. Still, it's a nice line to read.)
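Two other common ways to write the same counting step, shown here as a sketch on a made-up word list: dict.get with a default of 0, and collections.Counter from the standard library.

```python
from collections import Counter

# Hypothetical word list standing in for the words of one file.
words = ['the', 'cat', 'saw', 'the', 'dog']

# Option 1: dict.get returns 0 for a word that is not in the dict yet.
vocab_dict = {}
for word in words:
    vocab_dict[word] = vocab_dict.get(word, 0) + 1

# Option 2: Counter counts the items of any iterable directly.
vocab_counter = Counter(words)

print(vocab_dict)
print(vocab_counter)
```

Both hold the same counts; Counter is a dict subclass, so it can be used wherever the plain dictionary is used above.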

Converting the dictionary to a 'matrix' (actually a simple array of [count, word] rows suffices) can be done with a list comprehension:

matrix = [[vocab[word], word] for word in sorted(vocab)]
for row in matrix:
  print (row)
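If what you need is the sentence-by-word matrix from the question (one row per sentence, one column per distinct word), the following is a minimal sketch of one way to fill it. It rests on some assumptions: the function name build_count_matrix and the sample sentences are made up for illustration, the dictionary maps each word to its column index (the inverse of the number-to-word dict in the question, which is what makes the lookup direct), and plain lists are used instead of NumPy.

```python
import re

def build_count_matrix(sentences):
    """Return (vocab, matrix) where matrix[i][j] counts word j in sentence i."""
    # Tokenise each sentence into lowercase words.
    lines = [re.findall(r'[a-z]+', s.lower()) for s in sentences]

    # Map each distinct word to a column index, in order of first appearance.
    vocab = {}
    for words in lines:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab)

    # One row per sentence, one column per vocabulary word, all zeros at first.
    matrix = [[0] * len(vocab) for _ in lines]
    for row, words in enumerate(lines):
        for word in words:
            matrix[row][vocab[word]] += 1
    return vocab, matrix

# Hypothetical input; in the question these lines would come from sentences.txt.
sentences = ["The cat saw the dog.", "The dog ran."]
vocab, matrix = build_count_matrix(sentences)
print(vocab)
for row in matrix:
    print(row)
```

Because each word is looked up by name to find its column, no loop over the dictionary is needed while filling the rows. If a NumPy array is required afterwards, passing the nested list to np.array should convert it.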

