Beautiful Soup HTML extraction

Question

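I'm struggling to get the data I want, and I'm sure it's very simple if you know how to use BS. I've read the documentation and have been trying to get this right for hours.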

Currently my Python code outputs:

[<td>0.32%</td>, <td><span class="neg color ">&gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, <td><span class="neu">0.00</span></td>]

How can I isolate the contents of the td tags that do not contain other tags?

That is, I only want to see 0.32%, 0.29%, 0.38%.

Thanks.

import urllib2
from bs4 import BeautifulSoup

fturl = 'http://markets.ft.com/research/Markets/Bonds'
ftcontent = urllib2.urlopen(fturl).read()
soup = BeautifulSoup(ftcontent)

ftdata = soup.find(name="div", attrs={'class':'wsodModuleContent'}).find_all(name="td", attrs={'class':''})

Answer 1

Here is a solution that should work for you:

html_txt = """<td>0.32%</td>, <td><span class="neg color">
    &gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, 
    <td><span class="neu">0.00</span></td>
    """
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_txt)
print [tag.text for tag in soup.find_all('td') if tag.text.strip().endswith("%")]

The output is:

[u'0.32%', u'0.29%', u'0.38%']
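
The filter above keys on the "%" suffix, which works for this data but would also keep a percentage that happened to sit inside a span. As a possible alternative that matches the question more literally (keep only td cells with no child tags), here is a minimal sketch. It assumes Python 3 and the built-in "html.parser"; the helper name plain_cell_texts is made up for illustration and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Same sample markup as in the answer above.
html_txt = """<td>0.32%</td>, <td><span class="neg color">
    &gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>,
    <td><span class="neu">0.00</span></td>
    """


def plain_cell_texts(html):
    """Return the text of every <td> that has no child tags.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        td.get_text(strip=True)
        for td in soup.find_all("td")
        # find(True) returns the first descendant tag, or None if the
        # cell contains only text.
        if td.find(True) is None
    ]


print(plain_cell_texts(html_txt))
```

On the sample markup this prints ['0.32%', '0.29%', '0.38%']. On the live page, the same check could presumably be applied to the td list already collected in ftdata.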

Answer 1
2 Accepted 2013-05-24 10:16:35
