How do I parse text from HTML using lxml?

Question

<p>
    Glassware veteran
    <strong>Corning </strong>
    (
    <span class="ticker">
      NYSE:
      <a class="qsAdd qs-source-isssitthv0000001" href="http://caps.fool.com/Ticker/GLW.aspx?source=isssitthv0000001" data-id="203758">GLW</a>
    </span>
    <a class="addToWatchListIcon qsAdd qs-source-iwlsitbut0000010" href="http://my.fool.com/watchlist/add?ticker=&source=iwlsitbut0000010" title="Add to My Watchlist"> </a>
    ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?
</p>

I want to get "Glassware veteran" and "has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?"

Using the code

tnode = root.xpath("/p")[0]  # xpath() returns a list; take the <p> element
content = tnode.text

I can only get "Glassware veteran". Why?

Answer 1

Something like this might get you what you want:

>>> tnode = root.xpath('/p')[0]  # xpath() returns a list; take the <p> element
>>> content = tnode.xpath('text()')  # text nodes that are direct children of <p>
>>> print(''.join(content))

Glassware veteran

(


) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?
>>>
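
As for the "why": in lxml's element model, `.text` holds only the text that comes before the first child element. Text that follows a child is stored on that child as `.tail`. Here is a minimal sketch of this, using a simplified stand-in for the `<p>` element from the question (the shortened markup is an assumption made for illustration):

```python
from lxml import etree

# Simplified stand-in for the <p> element in the question (illustrative only).
root = etree.fromstring(
    "<p>Glassware veteran <strong>Corning</strong> "
    "(<span>NYSE: GLW</span>) has fallen on hard times lately.</p>"
)

# .text is only the text before the first child element.
print(repr(root.text))

# Text after each child lives in that child's .tail, not in the parent's .text.
for child in root:
    print(child.tag, repr(child.text), repr(child.tail))

# The XPath text() step collects exactly these direct text nodes of <p>:
# the parent's .text followed by each child's .tail.
direct_text = root.xpath("text()")
print(direct_text)
```

So `tnode.text` stops at `<strong>`, while `text()` also picks up the tails that follow `<strong>` and `<span>`.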

If you want all of the text nodes, including those inside child elements, just use //text() instead of text():

>>> print(' '.join([x.strip() for x in root.xpath('//text()')]))
Glassware veteran Corning ( NYSE: GLW    ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?
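
For completeness, here is a self-contained sketch of both approaches that also collapses the leftover whitespace. It assumes the snippet is parsed with `lxml.html.fromstring` (the question does not show how `root` was built), and the markup is a shortened copy of the question's HTML. `text_content()` is used as an alternative to `//text()`; `.//text()` is the relative form that stays inside the element it is called on.

```python
import lxml.html

# Shortened copy of the question's markup (illustrative; attributes trimmed).
html = """<p>
    Glassware veteran
    <strong>Corning </strong>
    (
    <span class="ticker">
      NYSE:
      <a href="http://caps.fool.com/Ticker/GLW.aspx">GLW</a>
    </span>
    ) has fallen on hard times lately.
</p>"""

# For a single-element fragment, fromstring() returns that element.
p = lxml.html.fromstring(html)


def squeeze(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


# All text inside <p>, including text nested in child elements.
full_text = squeeze(p.text_content())

# Same result via XPath, scoped to this element with a leading dot.
full_text_xpath = squeeze("".join(p.xpath(".//text()")))

# Only the text nodes that are direct children of <p>.
direct_text = squeeze(" ".join(p.xpath("text()")))

print(full_text)
print(direct_text)
```

Splitting on whitespace and rejoining avoids the runs of spaces that `strip()` plus `' '.join(...)` leaves behind when some text nodes are whitespace-only.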
