
How to get text of selected elements in XPath?

I am trying to extract several forum posts using a standard XPath expression:

response.xpath('.//div[contains(@class, "Message userContent")]')

That returns the complete list of comments, as expected.

But once I include //text() or string(...), the length of the list jumps to 100 or 150 items, which makes it impossible to iterate over the list and join each comment with other data such as the author or the date.

normalize-space(...) only returns the first comment.

It must have something to do with all the newlines and line breaks in the HTML, but at this stage I have no idea how to handle them.

Would string-join(...[normalize-space()]) be an option here?

Realize what each XPath is selecting:

  1. .//div[contains(@class, "Message userContent")] selects div elements.
  2. .//div[contains(@class, "Message userContent")]//text() selects all text node descendants of those div elements.
  3. normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 1.0 takes the space-normalized string value of the first such div element.
  4. normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 2.0 is an error whenever more than one such div matches, because normalize-space() accepts at most a single item, not a longer sequence.
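
The difference between selecting elements and selecting text nodes can be seen in a minimal sketch. It uses Python's standard-library xml.etree.ElementTree rather than Scrapy so that it runs on its own. The PAGE markup is a made-up stand-in for the forum page, and because ElementTree's XPath subset has no contains(), the sketch matches the class attribute exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the forum page (not the real markup).
PAGE = """<html><body>
<div class="Message userContent">
  First line<br/>
  second line
</div>
<div class="Message userContent">
  Only <b>one</b> line
</div>
</body></html>"""

root = ET.fromstring(PAGE)

# Selecting elements: one item per comment.
# ElementTree has no contains(), so the class is matched exactly here.
divs = root.findall(".//div[@class='Message userContent']")

# Selecting descendant text: one item per text fragment, so every <br/>
# or inline tag splits a comment into several items.
text_nodes = [text for div in divs for text in div.itertext()]

print(len(divs))        # 2
print(len(text_nodes))  # 5
```

Two comments become five text fragments, which is the same effect that inflates the list in the question.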

If you want to get the string value of each such div:

  • XPath 1.0: Iterate over the selected div elements in the hosting language and separately take the string value.
  • XPath 2.0: Append /string() to the XPath.
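
The XPath 1.0 approach, iterating in the hosting language, might look like the following sketch. It again uses the standard-library xml.etree.ElementTree instead of Scrapy so that it is self-contained. PAGE and the string_value helper are hypothetical names introduced for illustration, and the class attribute is matched exactly because ElementTree does not support contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the forum page (not the real markup).
PAGE = """<html><body>
<div class="Message userContent">
  First line<br/>
  second line
</div>
<div class="Message userContent">
  Only <b>one</b> line
</div>
</body></html>"""


def string_value(element):
    """Join all descendant text of one element, then collapse runs of
    whitespace, roughly what normalize-space(string(.)) does in XPath."""
    return " ".join("".join(element.itertext()).split())


root = ET.fromstring(PAGE)

# One string per div, so the list stays aligned with the list of comments.
comments = [
    string_value(div)
    for div in root.findall(".//div[@class='Message userContent']")
]

print(comments)  # ['First line second line', 'Only one line']
```

In Scrapy the same idea would presumably be a loop over the selectors returned by response.xpath(...), calling .xpath('normalize-space(.)').get() on each one, so that the result has exactly one string per comment and can be paired with the author and date.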


 