I am trying to concatenate the strings on either side of a <br> inside a <span>, but it is not working.
Here is the HTML:
<li class="attr">
<span>
Size:L
<br>
Color:RED
</span>
</li>
I tried this, but it does not work:
color_and_size = row.xpath('.//li[@class="attr"][1]/span[1]/text()')[0]
Your markup is not well-formed XML because the <br> tag is never closed. If you use lxml, try its soupparser module, which uses BeautifulSoup. Or you can use standalone BeautifulSoup, as below:
from bs4 import BeautifulSoup
s = """<li class="attr">
<span>
Size:L
<br>
Color:RED
</span>
</li>
"""
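# Name the parser explicitly; "lxml" wraps the fragment in <html><body>.
soup = BeautifulSoup(s, "lxml")
# stripped_strings yields each text node with surrounding whitespace removed.
print([" ".join(span.stripped_strings) for span in soup.find_all("span")])
Prints:
['Size:L Color:RED']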
NB: BeautifulSoup repairs the markup internally. For example, if you want a well-formed version of your malformed markup, try:
print(soup.prettify())
Prints:
<html>
<body>
<li class="attr">
<span>
Size:L
<br/>
Color:RED
</span>
</li>
</body>
</html>
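If you would rather stay with lxml and XPath, the soupparser route mentioned above might look like the sketch below. It assumes both lxml and bs4 are installed; the variable names root and texts are only illustrative.

```python
from lxml.html import soupparser

s = """<li class="attr">
<span>
Size:L
<br>
Color:RED
</span>
</li>
"""

# soupparser lets BeautifulSoup parse the markup, then hands back an lxml tree.
root = soupparser.fromstring(s)

# Ordinary lxml XPath works on the result; strip each text node and drop empties.
texts = [t.strip()
         for t in root.xpath('.//li[@class="attr"]/span/text()')
         if t.strip()]
print(texts)
```

The result is a normal lxml element, so anything else you do with XPath elsewhere in your code should carry over unchanged.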
If your XML were well-formed, the XPath below would work:
//li[@class='attr']/span/text()[preceding-sibling::br or following-sibling::br]
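A quick way to check this expression is to run it with lxml.etree against a well-formed copy of the snippet. In the sketch below the <ul> wrapper and the self-closing <br/> are my own assumptions, added only so that the input parses as XML.

```python
from lxml import etree

# Well-formed version of the snippet: <br/> is self-closing,
# and a hypothetical <ul> root wraps the <li>.
xml = """<ul>
  <li class="attr">
    <span>
      Size:L
      <br/>
      Color:RED
    </span>
  </li>
</ul>"""

root = etree.fromstring(xml)

# Select the text nodes that sit directly before or after a <br>.
nodes = root.xpath(
    "//li[@class='attr']/span/text()"
    "[preceding-sibling::br or following-sibling::br]"
)
parts = [n.strip() for n in nodes]
print(parts)
```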
You can combine Python string methods with lxml's XPath return values:
>>> import lxml.html
>>> text = '''<html>
... <li class="attr">
... <span>
... Size:L
... <br>
... Color:RED
... </span>
... </li>
... </html>'''
>>> doc = lxml.html.fromstring(text)
>>>
>>> # text nodes can contain leading and trailing whitespace characters
>>> doc.xpath('.//li[@class="attr"]/span[1]/text()')
['\n Size:L\n ', '\n Color:RED\n ']
>>>
>>> # you can use Python's strip() method
>>> [t.strip() for t in doc.xpath('.//li[@class="attr"]/span[1]/text()')]
['Size:L', 'Color:RED']
You can also select the <span> by testing whether it contains a <br> (span[br] instead of span[1]):
>>> doc.xpath('.//li[@class="attr"]/span[br]/text()')
['\n Size:L\n ', '\n Color:RED\n ']
>>> [t.strip() for t in doc.xpath('.//li[@class="attr"]/span[br]/text()')]
['Size:L', 'Color:RED']
>>>
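To get the single concatenated string the question asks for, the stripped pieces can be joined. This is a minimal sketch: the name color_and_size comes from the question, and the single-space separator is an assumption you may want to change.

```python
import lxml.html

text = '''<html>
<li class="attr">
<span>
Size:L
<br>
Color:RED
</span>
</li>
</html>'''

doc = lxml.html.fromstring(text)

# Collect the text nodes on both sides of the <br>.
pieces = doc.xpath('.//li[@class="attr"]/span[br]/text()')

# Strip each piece, skip whitespace-only nodes, and join with a space.
color_and_size = " ".join(p.strip() for p in pieces if p.strip())
print(color_and_size)
```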