How to concatenate the strings around a br in Python XPath?

Question

I am trying to concatenate the strings on either side of a <br> inside an element, but my attempt is not working.

Here is the code:

<li class="attr">
    <span>
        Size:L
        <br>
        Color:RED
    </span>
</li>

I tried using this, but it is not working:

color_and_size = row.xpath('.//li[@class="attr"][1]/span[1]/text()')[0]

Answer 1

Your markup is not well-formed XML because the <br> tag is never closed. If you use lxml, try its soupparser, which relies on BeautifulSoup. Alternatively, you can use BeautifulSoup on its own, as below:

from bs4 import BeautifulSoup
s = """<li class="attr">
    <span>
        Size:L
        <br>
        Color:RED
    </span>
</li>
"""
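# Name the parser explicitly; "lxml" assumes lxml is installed
soup = BeautifulSoup(s, "lxml")

# Strip each span's text and remove the newlines inside it
print([span.text.strip().replace("\n", "") for span in soup.find_all('span')])

Prints:

['Size:L                Color:RED']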
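
The soupparser route mentioned above could look roughly like the sketch below. It assumes both lxml and BeautifulSoup 4 are installed. The variable names and the whitespace-stripping step are illustrative additions, not part of the original answer.

```python
from lxml.html import soupparser

s = """<li class="attr">
    <span>
        Size:L
        <br>
        Color:RED
    </span>
</li>
"""

# soupparser lets BeautifulSoup repair the markup, then returns an lxml tree
root = soupparser.fromstring(s)

# Collect the text nodes that are direct children of the span
texts = root.xpath('.//li[@class="attr"]/span/text()')

# Strip surrounding whitespace and drop any pieces that end up empty
parts = [t.strip() for t in texts if t.strip()]

# Join the remaining pieces into a single string
color_and_size = " ".join(parts)
print(color_and_size)
```

Because the result is an ordinary lxml tree, the usual xpath() calls work on it even though the input had an unclosed <br>.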

Note: BeautifulSoup repairs the markup internally, so if you want a well-formed version of your malformed markup, try:

print(soup.prettify())

Prints:

<html>
 <body>
  <li class="attr">
   <span>
    Size:L
    <br/>
    Color:RED
   </span>
  </li>
 </body>
</html>

If your XML were valid, the following XPath would work:

//li[@class='attr']/span/text()[preceding-sibling::br or following-sibling::br]
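
As a minimal sketch of that expression in use, assuming lxml is installed and the <br> has been closed as <br/> so the input is well-formed XML (the variable names here are hypothetical):

```python
from lxml import etree

# Well-formed variant of the snippet: the br is self-closed
xml = """<li class="attr">
    <span>
        Size:L
        <br/>
        Color:RED
    </span>
</li>"""

root = etree.fromstring(xml)

# Select the span's text nodes that sit directly before or after a br
nodes = root.xpath(
    "//li[@class='attr']/span/text()"
    "[preceding-sibling::br or following-sibling::br]"
)

# Strip each piece and concatenate them with a single space
size_and_color = " ".join(n.strip() for n in nodes)
print(size_and_color)
```

The XPath only selects the text nodes. The concatenation itself happens in Python with str.join.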


Answer 2

You can combine Python string methods with lxml's XPath return values:

>>> import lxml.html
>>> text = '''<html>
... <li class="attr">
...     <span>
...         Size:L
...         <br>
...         Color:RED
...     </span>
... </li>
... </html>'''
>>> doc = lxml.html.fromstring(text)
>>>
>>> # text nodes can contain leading and trailing whitespace characters
>>> doc.xpath('.//li[@class="attr"]/span[1]/text()')
['\n        Size:L\n        ', '\n        Color:RED\n    ']
>>> 
>>> # you can use Python's strip() method
>>> [t.strip() for t in doc.xpath('.//li[@class="attr"]/span[1]/text()')]
['Size:L', 'Color:RED']

You can also select the <span> only if it contains a <br> (span[br] instead of span[1]):

>>> doc.xpath('.//li[@class="attr"]/span[br]/text()')
['\n        Size:L\n        ', '\n        Color:RED\n    ']
>>> [t.strip() for t in doc.xpath('.//li[@class="attr"]/span[br]/text()')]
['Size:L', 'Color:RED']
>>>
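
To get one concatenated string rather than a list, the stripped pieces can be joined, or XPath's normalize-space() can collapse the whitespace directly. The sketch below assumes the same markup as above. The separator and the variable names are illustrative choices.

```python
import lxml.html

text = '''<html>
<li class="attr">
    <span>
        Size:L
        <br>
        Color:RED
    </span>
</li>
</html>'''
doc = lxml.html.fromstring(text)

# Option 1: strip each text node in Python, then join with a chosen separator
parts = [t.strip() for t in doc.xpath('.//li[@class="attr"]/span[1]/text()')]
joined = ", ".join(parts)

# Option 2: let XPath take the span's string value and collapse its whitespace
normalized = doc.xpath('normalize-space(.//li[@class="attr"]/span[1])')

print(joined)
print(normalized)
```

Option 1 keeps control over the separator. Option 2 always separates the pieces with a single space, because normalize-space() collapses every run of whitespace.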

Answer 1
1 2015-11-20 12:10:50

Answer 2
1 2015-11-20 13:35:17
