
How to get text of selected elements in XPath?

Original post 2018-06-08 15:09:57 8 1 xml/ xpath/ web-scraping/ scrapy

I am trying to extract several forum posts using a standard XPath expression:

response.xpath('.//div[contains(@class, "Message userContent")]')

This returns the complete list of comments, as intended.

But once I add //text() or string(...), the length of the list jumps to 100 or 150 items, which makes it impossible to iterate over the list and join it with other data such as the author or the date.

normalize-space(...) only returns the first comment.

It has something to do with all the newlines and line breaks in the HTML code, but at this stage I have no idea how to handle them.

Would string-join(...[normalize-space()]) be an option here?

1 answer

Realize what each XPath is selecting:

.//div[contains(@class, "Message userContent")] selects div elements.
.//div[contains(@class, "Message userContent")]//text() selects all text node descendants of those div elements.
normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 1.0 takes the space-normalized string value of the first such div element.
normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 2.0 is an error when more than one such div is selected, because normalize-space() accepts at most a single item, not a sequence.
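
The difference between the first three selections can be seen on a small document. The following is only a minimal sketch: it uses Python's standard-library xml.etree.ElementTree rather than Scrapy, and the sample markup is made up. ElementTree supports only a subset of XPath (no contains() or text()), so an exact class match and itertext() stand in for the original expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the forum in the question.
SAMPLE = """<html><body>
<div class="Message userContent">
  First line<br/>
  second <b>bold</b> line
</div>
<div class="Message userContent">
  Another
  comment
</div>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Element selection: one item per comment.
divs = root.findall(".//div[@class='Message userContent']")

# Text-node descendants: several items per comment (stand-in for //text()).
text_nodes = [t for div in divs for t in div.itertext()]

# String value of the first div only, whitespace-normalized; this mirrors
# what XPath 1.0 normalize-space() does when handed a node-set.
first_only = " ".join("".join(divs[0].itertext()).split())

print(len(divs), len(text_nodes))
print(first_only)
```

The list of text nodes is longer than the list of elements because every line break or inline tag inside a comment splits its text into separate nodes, which is why the count jumps once //text() is added.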

If you want to get the string values of each such div:

XPath 1.0: Iterate over the selected div elements in the hosting language and take the string value of each one separately.
XPath 2.0: Append /string() to the XPath.
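
The XPath 1.0 approach (iterate in the hosting language) can be sketched as follows. This is an illustration under assumptions, not the asker's actual setup: it uses the standard-library xml.etree.ElementTree instead of Scrapy, and the "Comment" and "Author" class names and the string_value helper are hypothetical. In Scrapy, the per-element step would typically be something like div.xpath('normalize-space(.)').get() inside a loop over the selected div elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the "Comment" and "Author" class names are assumptions.
SAMPLE = """<ul>
<li class="Comment"><span class="Author">alice</span>
  <div class="Message userContent">Hello<br/>
     world</div></li>
<li class="Comment"><span class="Author">bob</span>
  <div class="Message userContent">
     Second <i>post</i>
  </div></li>
</ul>"""


def string_value(element):
    """Concatenate all descendant text and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


root = ET.fromstring(SAMPLE)
posts = []
# Iterate per comment so each message stays paired with its own author.
for comment in root.findall(".//li[@class='Comment']"):
    author = comment.find("./span[@class='Author']")
    message = comment.find("./div[@class='Message userContent']")
    posts.append({"author": string_value(author), "text": string_value(message)})

for post in posts:
    print(post)
```

Because the string value is computed once per element, the result has exactly one entry per comment and can be joined with the author, the date, or other per-post fields.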


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
