How to filter the text under the strong tag?
I have this code:
url = 'http://www.topsoftzone.com/program/12721/Windows_Phone_7.html'
pageurl = urllib.urlopen(url)
soup = BeautifulSoup(pageurl)
print soup.find('table',{'class':'download_tab'}).find('td',{'width':'55%'}).find('strong').text
I should get output like this: 09/29/2011 (Submitted: 09/08/2011)
But the code outputs: Updated:
I think you are missing the table row (tr) between table and td.
In any case, consider using lxml together with XPath:
from lxml import etree
tree = etree.parse(url, etree.HTMLParser())
l = tree.xpath('//table[@class="download_tab"]/tr/td[@width="55%"]/text()')
print l[1]
09/29/2011 (Submitted: 09/08/2011)
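The reason the original code prints only the label is probably that the date is not inside the strong element at all; it is text that follows the closing tag. The sketch below illustrates that distinction with the standard library's xml.etree.ElementTree, which uses the same text/tail model as lxml. The SAMPLE markup is a hypothetical, simplified stand-in for the real page (assumed structure, not copied from the site), and the snippet uses Python 3 syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure is assumed, not verified.
SAMPLE = """
<html><body>
<table class="download_tab">
  <tr>
    <td width="45%"><strong>License:</strong> Freeware</td>
    <td width="55%"><strong>Updated:</strong> 09/29/2011 (Submitted: 09/08/2011)</td>
  </tr>
</table>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Note the explicit tr step between table and td.
cell = root.find(".//table[@class='download_tab']/tr/td[@width='55%']")
strong = cell.find("strong")

label = strong.text                   # text inside <strong>...</strong>
value = (strong.tail or "").strip()   # text after </strong>, still inside the td

print(label)
print(value)
```

With lxml the same idea applies: the td/text() XPath step returns the text nodes that sit directly in the cell, which includes the tail after the strong element.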
Edit: without lxml, as requested:
soup = BeautifulSoup(pageurl)
l = soup.find('table',{'class':'download_tab'}).find('tr').find('td',{'width':'55%'}).findAll(text=True)
print l[2]
09/29/2011 (Submitted: 09/08/2011)
You need more error checking, but this works:
import lxml.html
import urllib
import sys
link = "http://www.topsoftzone.com/program/12721/Windows_Phone_7.html"
page = urllib.urlopen(link).read()
doc = lxml.html.document_fromstring(page)
doc.make_links_absolute(link)
found_text = doc.xpath(u".//table[@class='download_tab']/tr/td[@width='55%']/text()")
try:
    # The date is expected in the second text node of the cell.
    print found_text[1].strip()
except IndexError:
    print "Not found"
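As one possible form of the extra error checking mentioned above, the sketch below avoids the hard-coded index by taking the first text node that is not just whitespace. The helper name pick_text and its default argument are hypothetical, introduced only for illustration, and the snippet uses Python 3 syntax. It assumes the wanted value is the first non-blank text node in the cell, which may not hold for every page layout.

```python
def pick_text(nodes, default="Not found"):
    """Return the first non-blank string in nodes, stripped, or default."""
    for node in nodes:
        cleaned = node.strip()
        if cleaned:
            return cleaned
    return default


# Example input shaped like the list an XPath text() query might return
# (the values here are made up for the demonstration).
sample_nodes = ["\n    ", " 09/29/2011 (Submitted: 09/08/2011)\n"]
print(pick_text(sample_nodes))
print(pick_text([]))
```

The result of the xpath call could be passed to pick_text in place of indexing with [1], so an empty or reshuffled result falls back to the default instead of raising.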