
xml.etree.ElementTree.ParseError issue when trying to extract data from XML using PY3

I am having an issue trying to extract the email from an XML file using Python 3.

My code is:

import xml.etree.ElementTree as ET
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

data = '''<row>
    <row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
        <codice_regionale>MI1604</codice_regionale>
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <indirizzo>VIA VARESINA, 121</indirizzo>
        <localita>Milano</localita>
        <telefono>3480813398</telefono>
        <email>silvana.toschi@gmail.com</email>
        <caratterizzazione>urbana</caratterizzazione>
        <esenzioni>true</esenzioni>
        <location latitude="45.500881" longitude="9.141339"/>
</row>'''

tree = ET.fromstring(data) #standard ET
results = tree.findall('email') #find the count section in xml

print(results.text)

The error I get is:

Traceback (most recent call last):
  File "farmacie.py", line 25, in <module>
    tree = ET.fromstring(data) #standard ET
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 1321, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 12, column 6

How can I solve this?

It looks like the row element is opened twice (an outer <row> and an inner <row ...>) but closed only once, so the parser reaches the end of the input while still waiting for a closing tag. That is what raises the ParseError. Either remove the extra opening tag or add the missing end tag. The second issue is that findall() returns a list, so you need to pick one element from it or print them all:

import xml.etree.ElementTree as ET

data = '''<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
        <codice_regionale>MI1604</codice_regionale>
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <indirizzo>VIA VARESINA, 121</indirizzo>
        <localita>Milano</localita>
        <telefono>3480813398</telefono>
        <email>silvana.toschi@gmail.com</email>
        <caratterizzazione>urbana</caratterizzazione>
        <esenzioni>true</esenzioni>
        <location latitude="45.500881" longitude="9.141339"/>
</row>'''

tree = ET.fromstring(data)  # parse the string; returns the root <row> element
results = tree.findall('email')  # list of all <email> children of the root

print(results[0].text)

Or:

for r in results:
    print(r.text)
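
To see the difference between the two strings, here is a minimal sketch. The shortened XML samples and the helper name extract_emails are assumptions made for illustration and are not part of the original code; ET.ParseError, findall() and findtext() are standard ElementTree APIs.

```python
import xml.etree.ElementTree as ET

# Shortened sample data (an assumption for illustration).
# BROKEN opens <row> twice but closes it only once, as in the question.
BROKEN = """<row>
    <row _id="row-1">
        <email>silvana.toschi@gmail.com</email>
</row>"""

# FIXED has exactly one <row> element, opened and closed once.
FIXED = """<row _id="row-1">
    <email>silvana.toschi@gmail.com</email>
</row>"""


def extract_emails(xml_text):
    """Return the text of every <email> child of the root, or None if parsing fails."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Raised, for example, when an element is still open at the end of input.
        print(f"Could not parse XML: {err}")
        return None
    return [el.text for el in root.findall("email")]


broken_result = extract_emails(BROKEN)
fixed_result = extract_emails(FIXED)
print(fixed_result)

# findtext() returns the text of the first match directly (or None if absent),
# which avoids indexing into the list returned by findall().
first_email = ET.fromstring(FIXED).findtext("email")
print(first_email)
```

Catching ParseError is optional here, but it may be useful if the XML comes from a source that is not fully under your control.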

Update:

With the full dataset, where each record is a row nested inside an outer row element, the correct way to get all of the emails would be:

import xml.etree.ElementTree as ET
import requests

data = requests.get('https://www.dati.lombardia.it/api/views/5dq5-xs9z/rows.xml').content

tree = ET.fromstring(data)
results = tree.findall("./row/row/email")

for r in results:
    print(r.text)

Results (2,684 rows):

silvana.toschi@gmail.com
farmacia.manelli@hotmail.com
badobruno@hotmail.com
giovannibrambilla@msn.com
...
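
The path ./row/row/email works because the records sit two levels below the document root. The sketch below reproduces that nesting offline with a small inline sample. The root tag name response, the _position values and the variable names are assumptions for illustration, not taken from the real feed.

```python
import xml.etree.ElementTree as ET

# Small inline stand-in for the downloaded document (structure assumed).
SAMPLE = """<response>
  <row>
    <row _position="0">
      <email>silvana.toschi@gmail.com</email>
    </row>
    <row _position="1">
      <email>farmacia.manelli@hotmail.com</email>
    </row>
  </row>
</response>"""

root = ET.fromstring(SAMPLE)

# Explicit path: root -> outer <row> -> inner <row> -> <email>.
by_path = [el.text for el in root.findall("./row/row/email")]

# iter() walks the whole tree, so it finds <email> at any depth.
by_iter = [el.text for el in root.iter("email")]

# findall("email") only looks at direct children of the root,
# so it finds nothing in this nested layout.
direct_only = root.findall("email")

print(by_path)
print(by_iter)
print(len(direct_only))
```

If the exact nesting is not known in advance, iter("email") is the more forgiving choice, while the explicit path makes it clear which level the values come from.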


