I'm basically trying to do this (pseudo code, not valid python):
limit = 10
results = [xml_to_dict(artist) for artist in xml.findall('artist') while limit--]
So how could I code this in a concise and efficient way? The XML file can contain anything between 0 and 50 artists, and I can't control how many to get at a time, and AFAIK, there's no XPATH expression to say something like "get me up to 10 nodes".
Thanks!
Are you using lxml
? You could use XPath to limit the items in the query level, eg
>>> from lxml import etree
>>> from io import StringIO
>>> xml = etree.parse(StringIO('<foo><bar>1</bar><bar>2</bar><bar>4</bar><bar>8</bar></foo>'))
>>> [bar.text for bar in xml.xpath('bar[position()<=3]')]
['1', '2', '4']
You could also use itertools.islice
to limit any iterable , eg
>>> from itertools import islice
>>> [bar.text for bar in islice(xml.iterfind('bar'), 3)]
['1', '2', '4']
>>> [bar.text for bar in islice(xml.iterfind('bar'), 5)]
['1', '2', '4', '8']
Assuming that xml
is an ElementTree
object, the findall()
method returns a list, so just slice that list:
limit = 10
limited_artists = xml.findall('artist')[:limit]
results = [xml_to_dict(artist) for artist in limited_artists]
For everyone else who found this question because they were trying to limit items returned from an infinite generator:
from itertools import takewhile
ltd = takewhile(lambda x: x[0] < MY_LIMIT, enumerate( MY_INFINITE_GENERATOR ))
# ^ This is still an iterator.
# If you want to materialize the items, e.g. in a list, do:
ltd_m = list( ltd )
# If you don't want the enumeration indices, you can strip them as usual:
ltd_no_enum = [ v for i,v in ltd_m ]
EDIT: Actually, islice
is a much better option.
limit = 10
limited_artists = [artist in xml.findall('artist')][:limit]
results = [xml_to_dict(artist) for limited_artists]
This avoids the issues of slicing: it doesn't change the order of operations, and doesn't construct a new list, which can matter for large lists if you're filtering the list comprehension.
def first(it, count):
it = iter(it)
for i in xrange(0, count):
yield next(it)
raise StopIteration
print [i for i in first(range(1000), 5)]
It also works properly with generator expressions, where slicing will fall over due to memory use:
exp = (i for i in first(xrange(1000000000), 10000000))
for i in exp:
print i
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.