
How to iterate through a defaultdict(list) in Python?

How do I iterate through a defaultdict(list) in Python? Is there a better way of having a dictionary of lists in Python? I've tried the normal iter(dict), but I got this error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

The main script:

import para
para.print_doc('./foo/bar/para-lines.txt')

The para.py module:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if separator(line):
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

An example of ./foo/bar/para-lines.txt looks like this:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main script should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

The problem you have with the line

for para in iter(doc):

is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope and is lost. So you need to do two things:

  1. Save the doc created in the __init__ method as an instance variable (self.doc, for example).

  2. Either make Paragraphs itself iterable (by adding an __iter__ method), or let the calling code access the created doc object.
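
Those two steps might look roughly like the sketch below. It is only an illustration, not the original recipe: it assumes Python 3, and it assumes a simplified Paragraphs that takes any iterable of lines instead of a filename (a hypothetical change, made so the example runs without a file on disk). A blank line is assumed to be the paragraph separator.

```python
from collections import defaultdict


class Paragraphs:
    """Group lines into paragraphs keyed by a 1-based paragraph number."""

    def __init__(self, lines, separator=None):
        if separator is None:
            # Assumption: a separator is a line containing only a newline.
            def separator(line):
                return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Step 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 1
        for line in lines:
            if separator(line):
                para_index += 1
            else:
                self.doc[para_index].append(line)

    def __iter__(self):
        # Step 2: make the object iterable by delegating to the dictionary.
        return iter(self.doc.items())


# Hypothetical sample input standing in for the lines of a text file.
sample = ["first line\n", "second line\n", "\n", "next paragraph\n"]
text = Paragraphs(sample)
for para_num, sentences in text:
    for sent_num, sentence in enumerate(sentences, start=1):
        print("Para#%d,Sent#%d: %s" % (para_num, sent_num, sentence.rstrip('\n')))
```

With __iter__ in place, both `for ... in text` and `for ... in text.doc.items()` work, so either of the two options in step 2 is covered.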

The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools like itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby:

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            sys.stdout.write('\n')
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))

The problem seems to be that you're iterating over your Paragraphs instance, not the dictionary. Also, instead of iterating over keys and then accessing the dictionary entry, consider using

for (key, value) in d.items():
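
As a rough illustration of that suggestion, here is a small sketch assuming Python 3. The dictionary contents are made-up sample data, and the output format simply mirrors the one asked for in the question. It also uses enumerate for the sentence number, since list.index() returns the position of the first match and would misnumber a repeated sentence.

```python
from collections import defaultdict

# Hypothetical sample data standing in for paragraphs read from a file.
d = defaultdict(list)
d[1].append("This is a start of a paragraph.")
d[1].append("This is the end.")
d[2].append("This is the start of next para.")

lines_out = []
for key, value in d.items():
    # enumerate numbers the sentences from 1 without searching the list.
    for n, sentence in enumerate(value, start=1):
        lines_out.append("Para#%d,Sent#%d: %s" % (key, n, sentence))

print("\n".join(lines_out))
```

On Python 3.7 and later a dict keeps insertion order; on older versions, iterating over sorted(d.items()) would give the same paragraph order.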

It's failing because you don't have __iter__() defined in your Paragraphs class and then try to call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class has to define __iter__(), which returns an iterator. See the Python documentation on iterator types.
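
A tiny sketch of the difference, assuming Python 3. Both class names are hypothetical and exist only for this demonstration:

```python
class WithoutIter:
    """Holds data but defines neither __iter__ nor __getitem__."""

    def __init__(self, items):
        self.items = items


class WithIter:
    """Same data, but iterable because __iter__ returns an iterator."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        # Reuse the iterator of the underlying list.
        return iter(self.items)


try:
    iter(WithoutIter([1, 2, 3]))
    failed = False
except TypeError:
    # iter() refuses objects that do not support iteration.
    failed = True

collected = list(WithIter([1, 2, 3]))
print(failed, collected)  # prints: True [1, 2, 3]
```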

I can't think of any reason why you're using a dict here, let alone a defaultdict. A list of lists would be much simpler.

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
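
For completeness, here is one way the list-of-lists idea could be wrapped up and printed in the requested format. This is a sketch assuming Python 3; read_paragraphs and is_separator are hypothetical names, the separator is assumed to be a blank line, and io.StringIO stands in for an open file. Unlike the snippet above, it skips empty paragraphs so that consecutive blank lines do not produce empty entries.

```python
import io


def read_paragraphs(lines, is_separator=lambda line: line == '\n'):
    """Return a list of paragraphs, each a list of lines without newlines."""
    doc = []
    para = []
    for line in lines:
        if is_separator(line):
            if para:  # ignore empty paragraphs from repeated blank lines
                doc.append(para)
            para = []
        else:
            para.append(line.rstrip('\n'))
    if para:  # keep the last paragraph when the input has no trailing blank line
        doc.append(para)
    return doc


# io.StringIO behaves like a file opened for reading.
sample = io.StringIO("first\nsecond\n\nthird\n")
doc = read_paragraphs(sample)
for i, para in enumerate(doc, start=1):
    for n, sentence in enumerate(para, start=1):
        print("Para#%d,Sent#%d: %s" % (i, n, sentence))
```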
