I am trying to extract several forum posts using the standard XPath method:
response.xpath('.//div[contains(@class, "Message userContent")]')
That returns a complete list of comments, as desired.
But once I include //text() or string(...), the length of the list jumps to 100 or 150 items, which makes it impossible to make sense of the list or to iterate over it and join it with other data such as the author or the date.
normalize-space(...) only returns the first comment.
It has something to do with all the newlines and line breaks in the HTML, but at this stage I have no idea how to handle them.
Would string-join(...[normalize-space()]) be an option here?
Realize what each XPath is selecting:
.//div[contains(@class, "Message userContent")] selects div elements.
.//div[contains(@class, "Message userContent")]//text() selects all text node descendants of those div elements, which is why the list grows far beyond the number of comments.
normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 1.0 takes the space-normalized string value of the first such div element.
normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 2.0 is a runtime error, because normalize-space() is passed a sequence of more than one item.
If you want to get the string value of each such div:
XPath 1.0: Iterate over the div elements in the hosting language and take the string value of each one separately.
XPath 2.0: Append /string() to the XPath.
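As a rough illustration of the XPath 1.0 route (iterate in the hosting language, one string per div), here is a minimal Python sketch. It is not the asker's Scrapy code: the markup in SAMPLE is invented for the example, the helper normalize_space is a hypothetical stand-in for XPath's normalize-space(), and the standard library's xml.etree.ElementTree is used so the snippet runs on its own. ElementTree supports only a subset of XPath, so the class attribute is matched exactly rather than with contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the forum page (not from the original post).
SAMPLE = """
<html><body>
  <div class="Message userContent">First   comment,<br/>
     split over <b>several</b> text nodes.</div>
  <div class="Message userContent">
     Second comment.
  </div>
</body></html>
"""


def normalize_space(text):
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(text.split())


root = ET.fromstring(SAMPLE)

# Select the div elements themselves: one item per comment.
divs = root.findall(".//div[@class='Message userContent']")

# Every descendant text node, roughly what //text() returns: more items than divs.
text_nodes = [t for d in divs for t in d.itertext()]

# One string per div: concatenate its text nodes, then normalize whitespace.
comments = [normalize_space("".join(d.itertext())) for d in divs]

print(len(divs), len(text_nodes))
for comment in comments:
    print(comment)
```

In Scrapy the same pattern would presumably look like [d.xpath('normalize-space(.)').get() for d in response.xpath('.//div[contains(@class, "Message userContent")]')], since each selected div is itself a selector that can evaluate a relative XPath. Scrapy's selectors are built on lxml, which implements XPath 1.0, so the XPath 2.0 options such as string-join() or a trailing /string() step are not available there.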