xml.etree.ElementTree.ParseError issue when trying to extract data from XML using PY3
I am having an issue trying to extract the email from an XML file using Python 3.
My code is:
import xml.etree.ElementTree as ET
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
data = '''<row>
<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
<codice_regionale>MI1604</codice_regionale>
<denom_farmacia>Farmacia Varesina</denom_farmacia>
<indirizzo>VIA VARESINA, 121</indirizzo>
<localita>Milano</localita>
<telefono>3480813398</telefono>
<email>silvana.toschi@gmail.com</email>
<caratterizzazione>urbana</caratterizzazione>
<esenzioni>true</esenzioni>
<location latitude="45.500881" longitude="9.141339"/>
</row>'''
tree = ET.fromstring(data) #standard ET
results = tree.findall('email') #find the count section in xml
print(results.text)
The error I get is:
Traceback (most recent call last):
File "farmacie.py", line 25, in <module>
tree = ET.fromstring(data) #standard ET
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 1321, in XML
return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 12, column 6
How can I solve this?
It looks like the row element is opened twice (or you are missing the extra end tag), which causes the first issue: the outer <row> is never closed, so the parser reaches the end of the input while still expecting a closing tag, hence "no element found". The next issue is that findall() returns a list, so you need to pick one item or print them all:
import xml.etree.ElementTree as ET
data = '''<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
<codice_regionale>MI1604</codice_regionale>
<denom_farmacia>Farmacia Varesina</denom_farmacia>
<indirizzo>VIA VARESINA, 121</indirizzo>
<localita>Milano</localita>
<telefono>3480813398</telefono>
<email>silvana.toschi@gmail.com</email>
<caratterizzazione>urbana</caratterizzazione>
<esenzioni>true</esenzioni>
<location latitude="45.500881" longitude="9.141339"/>
</row>'''
tree = ET.fromstring(data)  # parse the string; returns the root <row> element
results = tree.findall('email')  # list of all direct <email> children of the root
print(results[0].text)  # text of the first match
Or:
for r in results:
    print(r.text)
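If you would rather keep the outer wrapper, the other fix mentioned above is to add the missing end tag. The following is a minimal sketch, assuming a shortened copy of the sample data (only two fields are kept for brevity). With the nesting intact, email is no longer a direct child of the root, so the path has to go through the inner row; findtext() is a convenient alternative when only the first value is needed:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data, with the outer <row> closed properly.
data = '''<row>
<row _id="row-jyi7-56ru_b7km" _position="0">
<denom_farmacia>Farmacia Varesina</denom_farmacia>
<email>silvana.toschi@gmail.com</email>
</row>
</row>'''

root = ET.fromstring(data)

# <email> is a grandchild of the root, so go through the inner <row>.
emails = [e.text for e in root.findall('./row/email')]
print(emails)

# findtext() returns the text of the first match, or the default if none.
first_email = root.findtext('./row/email', default='')
print(first_email)
```

Which variant to use depends on the real file: if it always has the wrapper, keep it and adjust the path rather than editing the data.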
Update:
After getting the full dataset, the correct way to get all of the emails would be:
import xml.etree.ElementTree as ET
import requests
data = requests.get('https://www.dati.lombardia.it/api/views/5dq5-xs9z/rows.xml').content
tree = ET.fromstring(data)
results = tree.findall("./row/row/email")
for r in results:
    print(r.text)
Results (2,684 rows):
silvana.toschi@gmail.com
farmacia.manelli@hotmail.com
badobruno@hotmail.com
giovannibrambilla@msn.com
...
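One thing to watch for with a loop like the one above: r.text is None for an empty <email/> element, and rows without an email child are simply skipped by the path. Whether the real dataset contains such rows is not verified here. As a hedged sketch, the helper below (extract_emails is a hypothetical name, and the inline sample is made up to mimic the root > row > row layout implied by the path "./row/row/email") collects only non-empty addresses:

```python
import xml.etree.ElementTree as ET


def extract_emails(xml_bytes):
    """Return the non-empty <email> values found under ./row/row."""
    root = ET.fromstring(xml_bytes)
    emails = []
    for row in root.findall('./row/row'):
        # findtext() returns None when the row has no <email> child,
        # and '' when the element exists but has no text.
        email = row.findtext('email')
        if email and email.strip():
            emails.append(email.strip())
    return emails


# Hypothetical sample (not real data) with one valid, one missing,
# and one empty email.
sample = b'''<response>
<row>
<row><denom_farmacia>A</denom_farmacia><email>a@example.com</email></row>
<row><denom_farmacia>B</denom_farmacia></row>
<row><denom_farmacia>C</denom_farmacia><email></email></row>
</row>
</response>'''

print(extract_emails(sample))
```

To use it with the live data, pass the .content of the requests.get() response from the update above instead of the sample bytes.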