
XPath: take all the text, not just the first line

I have this HTML:

    <td colspan="2" align="justify" class="inPage">
                <p>
                    2 bedroom + maids +balcony in Tiara Residence - Diamond type
                    <br>1700 sq.ft, furnished with kitchen equipment
                    <br>Sea view/ Atlantis view
                    <br>Selling Price: 4 million
                </p>
    </td>

My XPath is:

normalize-space(.//div[@class='section']/table/tr[7]/td/p/text())

The result is just: 2 bedroom + maids +balcony in Tiara Residence - Diamond type

I need the other text inside the p tag as well.

I am using Scrapy 0.20 with Python 2.7.

You can simply use

normalize-space(.//div[@class='section']/table/tr[7]/td/p)

but this concatenates all the text nodes into a single string, without any newline characters.
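If the lines need to stay separate, one option is to take every text node under p and clean each one individually. The following is only a sketch of that idea using Python's standard library rather than Scrapy; the helper name text_lines is made up for illustration, and the `<br>` tags are written as `<br/>` so the XML parser accepts the snippet. In Scrapy the equivalent would presumably be to select all p/text() nodes and extract the whole list instead of a single string.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, with <br> written as <br/> so that the
# standard-library XML parser accepts it (an adjustment made for this sketch).
HTML = """
<td colspan="2" align="justify" class="inPage">
    <p>
        2 bedroom + maids +balcony in Tiara Residence - Diamond type
        <br/>1700 sq.ft, furnished with kitchen equipment
        <br/>Sea view/ Atlantis view
        <br/>Selling Price: 4 million
    </p>
</td>
"""


def text_lines(p):
    """Return each non-empty text node under p with its whitespace collapsed.

    text_lines is a hypothetical helper, not part of Scrapy or ElementTree.
    """
    lines = []
    for chunk in p.itertext():  # text nodes in document order
        cleaned = " ".join(chunk.split())
        if cleaned:
            lines.append(cleaned)
    return lines


p = ET.fromstring(HTML).find("p")
lines = text_lines(p)
print("\n".join(lines))
```

Each entry in lines corresponds to one of the `<br>`-separated lines, so they can be joined with whatever separator is needed.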

normalize-space(), as with other XPath string functions that expect a string argument, will convert the input node p to its string-value. Quoting the XPath 1.0 specification:

For every type of node, there is a way of determining a string-value for a node of that type. For some types of node, the string-value is part of the node; for other types of node, the string-value is computed from the string-value of descendant nodes.
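To make the difference concrete, here is a rough emulation in plain Python of what happens in the two cases. It is a sketch, not how Scrapy or lxml implement it: string_value and normalize_space are illustrative helpers, and the markup is a shortened, XML-friendly version of the snippet in the question. When p/text() is passed, the node-set is converted to a string by taking only its first node in document order, which is why just the first line came back; when p itself is passed, its string-value is the concatenation of all descendant text nodes.

```python
import re
import xml.etree.ElementTree as ET

# Shortened version of the question's markup; <br/> instead of <br> so that
# the standard-library XML parser accepts it (assumption for this sketch).
P = ET.fromstring(
    "<p>\n  2 bedroom + maids\n  <br/>1700 sq.ft\n  <br/>Sea view\n</p>"
)


def string_value(element):
    """String-value of an element: all descendant text, in document order."""
    return "".join(element.itertext())


def normalize_space(s):
    """Emulate XPath normalize-space(): trim, then collapse whitespace runs.

    XPath whitespace is limited to space, tab, CR and LF.
    """
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


# normalize-space(p/text()): only the first text node of the node-set is used.
first_only = normalize_space(P.text)

# normalize-space(p): the whole string-value of p is used.
whole = normalize_space(string_value(P))

print(first_only)
print(whole)
```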

