How to extract text from lxml.etree tags based on value of sibling tags

Question

My objective is to pull URLs from an XML document (linked) and put them in a list: https://www.valuespreadsheet.com/iedgar/results.php?stock=NFLX&output=xml

I imported etree from lxml and created a list comprehension that pulls the text from all <instanceUrl> tags.

from urllib.request import urlopen
from lxml import etree

url = 'https://valuespreadsheet.com/iedgar/results.php?stock=NFLX&output=xml'
et = etree.fromstring(urlopen(url).read())
urls = [_.find('instanceUrl').text for _ in et.find('filings')]

Now I want to restrict the list so that it only pulls the text from <instanceUrl> tags whose sibling <formType> is 10-K.

How can I achieve this?

Answer 1

You need an XPath expression and the xpath() method:

[url.text for url in et.xpath('//filing[formType = "10-K"]/instanceUrl')]

Here we are filtering the filing nodes that contain formType child nodes with 10-K text, then getting the instanceUrl child.
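To try the expression without hitting the network, here is a minimal sketch run against an inline sample. The sample XML, the example.com URLs, the root element name and the helper annual_report_urls are all made up for illustration; only the filings/filing/formType/instanceUrl nesting is taken from the question. It also passes the form type as an XPath variable instead of hard-coding it in the expression.

```python
from lxml import etree

# Hypothetical sample that mirrors the nesting described in the question;
# the real feed may have a different root element and extra fields.
SAMPLE = b"""<results>
  <filings>
    <filing>
      <formType>10-K</formType>
      <instanceUrl>https://example.com/nflx-10k-2016.xml</instanceUrl>
    </filing>
    <filing>
      <formType>10-Q</formType>
      <instanceUrl>https://example.com/nflx-10q-2016.xml</instanceUrl>
    </filing>
    <filing>
      <formType>10-K</formType>
      <instanceUrl>https://example.com/nflx-10k-2015.xml</instanceUrl>
    </filing>
  </filings>
</results>"""


def annual_report_urls(xml_bytes, form_type="10-K"):
    """Return the instanceUrl text of every filing with the given formType."""
    root = etree.fromstring(xml_bytes)
    # $form is an XPath variable; lxml binds it from the keyword argument.
    nodes = root.xpath('//filing[formType = $form]/instanceUrl', form=form_type)
    return [node.text for node in nodes]


urls = annual_report_urls(SAMPLE)
print(urls)
```

If you only need the strings, appending /text() to the expression should let xpath() return them directly, so the list comprehension becomes unnecessary.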

Note that the _ variable name is conventionally reserved for throw-away variables: ones that have to be defined but are never actually used (e.g. during unpacking). In your case you do use it, so a descriptive name would be clearer.
