How to parse with xml.etree? Python
Python 3.5
See the code:
import urllib.request
from xml.etree import ElementTree as ET
url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
def conectar(url):
    page = urllib.request.urlopen(url)
    return page.read()
root = ET.fromstring(conectar(url))
s = root.findall("//*[contains(.,'21/')]")
I need to extract the entries containing '21/', but this returns the following error:
Error:
Traceback (most recent call last):
  File "crawler.py", line 11, in <module>
    root = ET.fromstring(conectar(url))
  File "/home/rg3915/.pyenv/versions/3.5.0/lib/python3.5/xml/etree/ElementTree.py", line 1321, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: unbound prefix: line 146, column 8
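But I don't know how to solve this error.
Although the document you are trying to parse claims to be XHTML, it is not valid XML because it contains an unbound prefix: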
<gcse:search></gcse:search>
The gcse namespace prefix is never defined for the document.
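To make the cause concrete, here is a minimal sketch that reproduces the same ParseError on a tiny document and shows that declaring the prefix makes it go away. The two inline documents and the namespace URI `http://example.com/gcse` are made up for illustration; they are not taken from the real page.

```python
from xml.etree import ElementTree as ET

# Hypothetical snippet imitating the page: the gcse prefix is used
# but no xmlns:gcse declaration exists anywhere in the document.
unbound = "<html><body><gcse:search></gcse:search></body></html>"

try:
    ET.fromstring(unbound)
    error_message = None
except ET.ParseError as exc:
    # expat rejects the document because the prefix has no namespace
    error_message = str(exc)

print(error_message)

# Same markup, but with the prefix bound to a (made-up) namespace URI.
bound = ('<html xmlns:gcse="http://example.com/gcse"><body>'
         '<gcse:search></gcse:search></body></html>')
root = ET.fromstring(bound)

# ElementTree expands prefixed names to {namespace-uri}localname.
tags = [el.tag for el in root.iter()]
print(tags)
```

Since you do not control the markup served by the site, adding the declaration yourself is not really an option here, which is why a more tolerant parser is the practical choice.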
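BeautifulSoup may be a better fit for what you are trying to do, since it is not picky about the document being 100% valid.
You could start with something like this: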
import urllib2
from bs4 import BeautifulSoup
url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
response = urllib2.urlopen(url)
html = response.read()
dom = BeautifulSoup(html, 'html.parser')
tables = dom.find_all("table")
if len(tables):
    table = tables[0]
    print table
(Tested in Python 2.7.)
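Since the question uses Python 3.5, here is a rough Python 3 sketch of the same idea that goes one step further and picks out the text containing '21/'. Note that it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, which is also tolerant of markup like `<gcse:search>`. The `TextFinder` class, the `find_text` helper, and the sample HTML (including its dates and rates) are invented for illustration and do not reflect the real page's structure.

```python
from html.parser import HTMLParser


class TextFinder(HTMLParser):
    """Hypothetical helper: collect text nodes that contain a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_data(self, data):
        # Called for the text between tags; keep it if it contains the needle.
        text = data.strip()
        if self.needle in text:
            self.matches.append(text)


def find_text(html, needle):
    """Return every text node in html that contains needle."""
    finder = TextFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.matches


# Made-up sample standing in for the downloaded page, including the
# undeclared gcse prefix that made ElementTree fail.
sample = """
<html><body>
<gcse:search></gcse:search>
<table>
<tr><td>20/11/2015</td><td>16.70</td></tr>
<tr><td>21/11/2015</td><td>16.60</td></tr>
</table>
</body></html>
"""

matches = find_text(sample, "21/")
print(matches)
```

For the real page you would pass the decoded response instead of `sample`, for example the result of `urllib.request.urlopen(url).read()` decoded with the page's encoding (assumed, not checked here, to be UTF-8).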