Python: XML retrieve from a URL to CSV
I am trying to write a Python script that dynamically reads XML data from a URL (e.g. http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72).
The format of the XML is as follows:
<station id="KCQT" name="Los Angeles / USC Campus Downtown" elev="179" lat="34.02355" lon="-118.29122" provider="NWS/FAA">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 7:00 pm" utime="1507168800">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 6:55 pm" utime="1507168500">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
</station>
I only want to retrieve the timestamp and the decimal temperature ("Temp") for all the dates available (there are more than the 4 I included).
The output should be a CSV-formatted text file in which the timestamps and temperature values are printed one pair per line.
Below is my attempt at the code (which is terrible and did not work at all):
import requests
weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")
import xml.etree.ElementTree as ET
import csv
tree = ET.parse(weatherXML)
root = tree.getroot()
# open file for writing
Time_Temp = open('timestamp_temp.csv', 'w')
#csv writer object
csvwriter = csv.writer(Time_Temp)
time_temp = []
count = 0
for member in root.findall('ob'):
    if count == 0:
        temperature = member.find('T').var
        time_temp.append(temperature)
        csvwriter.writerow(time_temp)
        count = count + 1
    temperature = member.find('T').text
    time_temp.append(temperature)
Time_Temp.close()
Please help.
You can iterate over the ob elements, get the time attribute of each ob, find the variable element whose var attribute is T, and read its value attribute for the temperature. Append both to a list and write it to the CSV file:
import xml.etree.ElementTree as ET
import csv
# parse a local copy of the XML saved from the URL
tree = ET.parse('getobextXml.php.xml')
root = tree.getroot()
# open file for writing (on Python 2, use open('timestamp_temp.csv', 'wb') instead)
with open('timestamp_temp.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Time", "Temp"])
    for ob in root.iter('ob'):
        timestamp = ob.get('time')  # the time attribute of the ob element
        # find the variable element whose var is T and read its value attribute
        temp = ob.find("./variable[@var='T']").get('value')
        csvwriter.writerow([timestamp, temp])
After that, timestamp_temp.csv will contain the result:
Time,Temp
04 Oct 8:47 pm,68
04 Oct 7:47 pm,68
04 Oct 6:47 pm,70
04 Oct 5:47 pm,74
04 Oct 4:47 pm,75
04 Oct 3:47 pm,75
04 Oct 2:47 pm,77
04 Oct 1:47 pm,78
04 Oct 12:47 pm,78
04 Oct 11:47 am,76
04 Oct 10:47 am,74
04 Oct 9:47 am,72
...
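The attribute-predicate lookup used above can be tried without touching the network or the file system. The following is only a minimal sketch: SAMPLE_XML is a hypothetical inline sample shaped like the feed, extract_rows is an illustrative helper name, and the check for a missing T variable is an added assumption about how incomplete observations should be handled (here they are skipped).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical inline sample mirroring the structure of the feed.
SAMPLE_XML = """<station id="KCQT">
  <ob time="04 Oct 7:10 pm" utime="1507169400">
    <variable var="T" description="Temp" unit="F" value="61"/>
    <variable var="TD" description="Dewp" unit="F" value="39"/>
  </ob>
  <ob time="04 Oct 7:05 pm" utime="1507169100">
    <variable var="TD" description="Dewp" unit="F" value="39"/>
  </ob>
</station>"""


def extract_rows(xml_text):
    """Return (time, temp) pairs, skipping observations without a T variable."""
    root = ET.fromstring(xml_text)
    rows = []
    for ob in root.iter("ob"):
        temp_el = ob.find("./variable[@var='T']")
        if temp_el is None:  # find() returns None when nothing matches
            continue
        rows.append((ob.get("time"), temp_el.get("value")))
    return rows


rows = extract_rows(SAMPLE_XML)

# Write to an in-memory buffer instead of a file so the result is easy to inspect.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Time", "Temp"])
writer.writerows(rows)
print(buffer.getvalue())
```

The second observation in the sample has no T variable, so only one data row is written. Without the None check, the chained .get('value') call would raise an AttributeError on such an observation.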
Assuming Python 3, this will work. I noted the Python 2 difference in case it is needed:
import xml.etree.ElementTree as ET
import requests
import csv
weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")
root = ET.fromstring(weatherXML.text)
# Use this with Python 2
# with open('timestamp_temp.csv','wb') as Time_Temp:
with open('timestamp_temp.csv','w',newline='') as Time_Temp:
    csvwriter = csv.writer(Time_Temp)
    csvwriter.writerow(['Time','Temp'])
    for member in root.iterfind('ob'):
        date = member.attrib['time']
        temp = member.find("variable[@var='T']").attrib['value']
        csvwriter.writerow([date,temp])
Output:
Time,Temp
04 Oct 11:47 pm,65
04 Oct 10:47 pm,66
04 Oct 9:47 pm,68
04 Oct 8:47 pm,68
04 Oct 7:47 pm,68
04 Oct 6:47 pm,70
04 Oct 5:47 pm,74
04 Oct 4:47 pm,75
.
.
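The time attribute carries no year, so if an unambiguous timestamp and a numeric temperature are wanted, one option is to read the utime attribute instead. This is a hedged sketch, not part of either answer: it assumes utime is Unix epoch seconds (the sample values differ by 300 between observations taken five minutes apart, which is consistent with that reading), and SAMPLE_XML and iso_rows are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical inline sample; one observation copied from the question.
SAMPLE_XML = """<station id="KCQT">
  <ob time="04 Oct 7:10 pm" utime="1507169400">
    <variable var="T" description="Temp" unit="F" value="61"/>
  </ob>
</station>"""


def iso_rows(xml_text):
    """Yield (ISO 8601 UTC timestamp, temperature as float) per observation."""
    root = ET.fromstring(xml_text)
    for ob in root.iterfind("ob"):
        temp_el = ob.find("variable[@var='T']")
        if temp_el is None:  # no temperature reported for this observation
            continue
        # Assumption: utime holds seconds since the Unix epoch.
        when = datetime.fromtimestamp(int(ob.attrib["utime"]), tz=timezone.utc)
        yield when.isoformat(), float(temp_el.attrib["value"])


result = list(iso_rows(SAMPLE_XML))
print(result)
```

The resulting pairs can be passed to csvwriter.writerow() exactly like the [date, temp] lists in the answer above.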