How to parse with xml.etree? Python

Question

Python 3.5

Here is the code:

import urllib.request
from xml.etree import ElementTree as ET

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'


def conectar(url):
    page = urllib.request.urlopen(url)
    return page.read()

root = ET.fromstring(conectar(url))
s = root.findall("//*[contains(.,'21/')]")

I need to extract the text containing '21/', but the code fails with this error:

Error:

Traceback (most recent call last):
  File "crawler.py", line 11, in <module>
    root = ET.fromstring(conectar(url))
  File "/home/rg3915/.pyenv/versions/3.5.0/lib/python3.5/xml/etree/ElementTree.py", line 1321, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: unbound prefix: line 146, column 8

I don't know how to resolve this error.

Answer 1

Although the document you are trying to parse claims to be XHTML, it is not valid XML because it contains an unbound prefix:

<gcse:search></gcse:search>

The gcse namespace prefix is never declared in the document.
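The error can be reproduced on a tiny document, which may make the cause easier to see. The sketch below is only an illustration: the two sample strings are hypothetical stand-ins for the real page, and the namespace URI http://example.com/gcse is a placeholder chosen for the example, not the URI the site actually intends.

```python
from xml.etree import ElementTree as ET

# Hypothetical minimal documents; the real page is much larger.
# BROKEN uses the gcse prefix without declaring it.
BROKEN = "<html><body><gcse:search></gcse:search></body></html>"

# FIXED declares the prefix. The URI is a placeholder (assumption).
FIXED = (
    '<html xmlns:gcse="http://example.com/gcse">'
    "<body><gcse:search></gcse:search></body></html>"
)


def try_parse(text):
    """Return the parsed root element, or the ParseError message as a string."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)


broken_result = try_parse(BROKEN)  # error message text, parsing fails
fixed_result = try_parse(FIXED)    # an Element, parsing succeeds
```

Since the page is served by someone else, declaring the prefix yourself is rarely practical, which is why a more tolerant parser tends to be the easier route.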

BeautifulSoup is probably a better fit for what you are trying to do, because it does not insist on the document being 100% valid.
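Separately from the prefix problem, note that xml.etree.ElementTree implements only a subset of XPath, and that subset does not include contains(). So even on a well-formed document the findall() expression from the question would not work as written. A minimal sketch of one alternative, assuming the input is well-formed XML: walk the tree with iter() and filter on each element's text. The SAMPLE string and its cell values are hypothetical, and elements_containing is a helper name introduced here for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical well-formed sample standing in for the real page.
SAMPLE = """<html><body><table>
<tr><td>20/12/2015</td><td>x</td></tr>
<tr><td>21/12/2015</td><td>y</td></tr>
</table></body></html>"""


def elements_containing(root, needle):
    """Return the elements whose own text contains the given substring."""
    return [el for el in root.iter() if el.text and needle in el.text]


root = ET.fromstring(SAMPLE)
matches = elements_containing(root, "21/")
```

This only inspects the text directly inside each element. If the text may be split across child elements, joining el.itertext() before testing would be one way to cover that case.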

Answer 2

You can start with something like this:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
response = urllib2.urlopen(url)
html = response.read()
dom = BeautifulSoup(html, 'html.parser')

tables = dom.find_all("table")
if len(tables):
    table = tables[0]
    print table

(Tested in Python 2.7.)
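Since the question targets Python 3.5, a rough Python 3 equivalent of the snippet above might look like the following. It is a sketch, not a tested scraper for this particular page: it assumes the third-party beautifulsoup4 package is installed, and the helper names fetch, first_table and strings_containing are introduced here for illustration.

```python
import urllib.request

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

URL = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'


def fetch(url):
    """Download the page and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def first_table(html):
    """Return the first <table> element in the markup, or None if there is none."""
    dom = BeautifulSoup(html, 'html.parser')
    return dom.find('table')


def strings_containing(html, needle):
    """Return every stripped text fragment that contains the given substring."""
    dom = BeautifulSoup(html, 'html.parser')
    return [text for text in dom.stripped_strings if needle in text]
```

Calling first_table(fetch(URL)) corresponds to what the Python 2.7 snippet prints, and strings_containing(fetch(URL), '21/') is one way to get at the '21/' text the question asks for. The undeclared gcse prefix is not a problem here, because html.parser does not enforce XML namespace rules.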

Answer 1
1 2015-12-22 15:01:15

Answer 2
1 Accepted 2015-12-23 22:50:34
