How to parse with xml.etree? Python

Python 3.5

Here is the code:

import urllib.request
from xml.etree import ElementTree as ET

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'


def conectar(url):
    page = urllib.request.urlopen(url)
    return page.read()

root = ET.fromstring(conectar(url))
s = root.findall("//*[contains(.,'21/')]")

I need to extract '21/', but it returns this error:

Error:

Traceback (most recent call last):
  File "crawler.py", line 11, in <module>
    root = ET.fromstring(conectar(url))
  File "/home/rg3915/.pyenv/versions/3.5.0/lib/python3.5/xml/etree/ElementTree.py", line 1321, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: unbound prefix: line 146, column 8

But I don't know how to fix this error.

Although the document you are trying to parse claims to be XHTML, it is not valid XML because it uses an unbound prefix:

<gcse:search></gcse:search>

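The gcse namespace prefix is never declared in the document (there is no xmlns:gcse attribute), so a strict XML parser such as xml.etree rejects it.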
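To see the failure in isolation, here is a minimal sketch that reproduces the error on a tiny string instead of the real page. The markup and the namespace URI `http://example.com/gcse` are made up for illustration, and `try_parse` is a hypothetical helper, not part of xml.etree.

```python
from xml.etree import ElementTree as ET

# The prefix "gcse" is used but never declared with xmlns:gcse,
# so this string is not well-formed XML.
BROKEN = "<html><body><gcse:search></gcse:search></body></html>"

# Same markup with the prefix bound to a namespace URI.
# The URI is a placeholder invented for this example.
FIXED = (
    '<html xmlns:gcse="http://example.com/gcse">'
    "<body><gcse:search></gcse:search></body></html>"
)


def try_parse(text):
    """Return the parsed root element, or the ParseError message as a string."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)


broken_result = try_parse(BROKEN)   # a string starting with "unbound prefix"
fixed_result = try_parse(FIXED)     # an Element, because the prefix is now bound

print(broken_result)
print(fixed_result.find("body")[0].tag)
```

Since you do not control the markup served by the site, declaring the prefix yourself is not really an option here; the sketch only shows why the strict parser stops.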

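BeautifulSoup is probably a better fit for what you are trying to do, because it does not insist on the document being 100% valid.

You could start with something like this: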

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
response = urllib2.urlopen(url)
html = response.read()
dom = BeautifulSoup(html, 'html.parser')

tables = dom.find_all("table")
if len(tables):
    table = tables[0]
    print table

(Tested in Python 2.7. On Python 3, replace urllib2 with urllib.request and write print(table).)
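For Python 3 (the version in the question), and if you would rather avoid a third-party dependency, the same lenient approach can be sketched with html.parser from the standard library, which is the backend that the 'html.parser' argument selects in BeautifulSoup. This is only a sketch: `CellCollector`, `cells_containing`, `fetch`, and the `SAMPLE` markup are invented for illustration, the UTF-8 decoding is an assumption about the site, and nested tables are not handled. Note also that the `contains()` expression from the question is not part of the limited XPath subset that xml.etree supports, so the filtering is done in plain Python.

```python
import urllib.request
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell. Unknown tags such as
    <gcse:search> are passed over instead of raising an error."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts, in document order
        self._parts = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._parts is not None:
            self.cells.append("".join(self._parts).strip())
            self._parts = None


def cells_containing(html, needle):
    """Return the text of all table cells whose text contains `needle`."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    return [text for text in collector.cells if needle in text]


def fetch(url):
    """Download a page and decode it (UTF-8 is an assumption about the site)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented sample markup, so the sketch runs without network access.
SAMPLE = """
<html><body>
<gcse:search></gcse:search>
<table>
  <tr><th>Fecha</th><th>Valor</th></tr>
  <tr><td>20/01/2016</td><td>18.10</td></tr>
  <tr><td>21/01/2016</td><td>18.25</td></tr>
</table>
</body></html>
"""

matches = cells_containing(SAMPLE, "21/")
print(matches)
```

Against the real page you would call `cells_containing(fetch(url), '21/')` with the `url` from the question; whether the value you need actually sits in a table cell is something to check against the page source.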

