
lxml.etree.iterparse closes input file handler?

filterous uses iterparse to parse a simple XML StringIO object in a unit test. However, when I try to access the StringIO object afterwards, Python exits with a "ValueError: I/O operation on closed file" message. According to the iterparse documentation, "Starting with lxml 2.3, the .close() method will also be called in the error case," but I get no error message or exception from iterparse. My IO-foo is obviously not up to speed, so does anyone have suggestions?

The command and (hopefully) relevant code:

$ python2.6 setup.py test

setup.py:

from setuptools import setup
from filterous import filterous as package

setup(
    ...
    test_suite = 'tests.tests',

tests/tests.py:

from cStringIO import StringIO
import unittest

from filterous import filterous

XML = '''<posts tag="" total="3" ...'''

class TestSearch(unittest.TestCase):
    def setUp(self):
        self.xml = StringIO(XML)
        self.result = StringIO()
    ...
    def test_empty_tag_not(self):
        """Empty tag; should get N results."""
        filterous.search(
            self.xml,
            self.result,
            {'ntag': [u'']},
            ['href'],
            False)
        self.assertEqual(
            len(self.result.getvalue().splitlines()),
            self.xml.getvalue().count('<post '))

filterous/filterous.py:

from lxml import etree
...
def search(file_pointer, out, terms, includes, human_readable = True):
    ...
    context = etree.iterparse(file_pointer, tag='posts')

Traceback:

ERROR: Empty tag; should get N results.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/victor/dev/filterous/tests/tests.py", line 149, in test_empty_tag_not
    self.xml.getvalue().count('<post '))
ValueError: I/O operation on closed file

PS: The tests all ran fine on 2010-07-27.

Seems to work fine with StringIO; try using that instead of cStringIO. No idea why it's getting closed.

Docs-fu is the problem. What you quoted, "Starting with lxml 2.3, the .close() method will also be called in the error case," has nothing to do with iterparse. It appears on your linked page before the section on iterparse, as part of the docs for the target parser interface. It refers to the close() method of the target (output!) object and has nothing to do with your StringIO. In any case, you also seem to have ignored that little word "also". Before 2.3, lxml closed the target object only if the parse was successful. Now it also closes it upon error.
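To make the distinction concrete, here is a minimal sketch of a parser target whose close() method is the one that sentence in the docs is talking about. It uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made so the snippet runs without lxml installed; lxml's target parser interface follows the same start/end/data/close shape). The class name CountingTarget and the sample XML are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


class CountingTarget(object):
    """Hypothetical parser target: counts <post> elements."""

    def __init__(self):
        self.posts = 0
        self.closed = False

    def start(self, tag, attrib):
        # Called for every opening tag.
        if tag == 'post':
            self.posts += 1

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # This is the target's close(), called when parsing finishes.
        # It has nothing to do with the input stream's close().
        self.closed = True
        return self.posts


# Made-up sample input, standing in for the XML in the question.
source = io.BytesIO(b'<posts><post href="a"/><post href="b"/></posts>')

target = CountingTarget()
parser = ET.XMLParser(target=target)
parser.feed(source.read())
result = parser.close()  # returns whatever target.close() returned

print(result, target.closed, source.closed)
```

The parser's close() hands back whatever the target's close() returned, and the input stream is a separate object that this call never touches.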

Why do you want to "access" the StringIO object after parsing has finished?

Update: By "trying to access the StringIO object afterwards", do you mean all those self.xml.getvalue() calls in your tests? [Show the ferschlugginer traceback in your question so we don't need to guess!] If that's causing the problem (it does count as an I/O operation), forget getvalue(): if it were to work, wouldn't it just return the (unconventionally named, invariant) XML?
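A rough sketch of that suggestion: count against the XML constant instead of calling getvalue() on a stream the parser may have closed. The function consume_and_close below is a hypothetical stand-in for whatever closes the input (it is not lxml code), and the sample XML is invented; io.BytesIO is used in place of cStringIO so the snippet is not tied to Python 2.

```python
import io

# Invented sample data; the real XML in the question is elided.
XML = b'<posts><post href="a"/><post href="b"/><post href="c"/></posts>'


def consume_and_close(stream):
    """Hypothetical stand-in for a parser that closes its input."""
    data = stream.read()
    stream.close()
    return data


stream = io.BytesIO(XML)
consume_and_close(stream)

# Calling getvalue() on the closed stream fails, as in the traceback.
try:
    stream.getvalue()
    raised = False
except ValueError:
    raised = True

# The constant is still available, so the test can count against it
# without touching the stream at all.
expected = XML.count(b'<post ')

print(raised, expected)
```

In the test from the question, this would mean replacing self.xml.getvalue().count('<post ') with XML.count('<post '), which gives the same number whether or not the stream is still open.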
