
Python: XML retrieve from a URL to CSV

I am trying to write a Python script that dynamically reads XML data from a URL (e.g. http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72).

The format of the XML is as follows:

<station id="KCQT" name="Los Angeles / USC Campus Downtown" elev="179" lat="34.02355" lon="-118.29122" provider="NWS/FAA">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 7:00 pm" utime="1507168800">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 6:55 pm" utime="1507168500">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
</station>

I only want to retrieve the timestamp and the decimal temperature ("Temp") for every observation available (there are more than the 4 I included).

The output should be a CSV-formatted text file in which the timestamps and temperature values are printed one pair per line.

Below is my attempt at the code (which is terrible and did not work at all):

import requests

weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")

import xml.etree.ElementTree as ET
import csv

tree = ET.parse(weatherXML)
root = tree.getroot()

# open file for writing
Time_Temp = open('timestamp_temp.csv', 'w')

#csv writer object
csvwriter = csv.writer(Time_Temp)
time_temp = []

count = 0
for member in root.findall('ob'):
    if count == 0:
        temperature = member.find('T').var
        time_temp.append(temperature)
        csvwriter.writerow(time_temp)
        count = count + 1

    temperature = member.find('T').text
    time_temp.append(temperature)

Time_Temp.close()

Please help.

You can iterate over the ob elements first, get the time attribute of each ob, find the variable element whose var is T and read its value attribute for the temperature, append both to a list, and write the list to the CSV file:

import xml.etree.ElementTree as ET
import csv
tree = ET.parse('getobextXml.php.xml')
root = tree.getroot()
# open file for writing
with open('timestamp_temp.csv', 'w', newline='') as csvfile:  # Python 3; on Python 2 use open('timestamp_temp.csv', 'wb')
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Time", "Temp"])
    for ob in root.iter('ob'):
        timestamp = ob.get('time')  # the time attribute of the ob element
        # find the variable element whose var is T and read its value attribute
        temp = ob.find("./variable[@var='T']").get('value')
        csvwriter.writerow([timestamp, temp])

After that, timestamp_temp.csv will contain the result:

Time,Temp
04 Oct 8:47 pm,68
04 Oct 7:47 pm,68
04 Oct 6:47 pm,70
04 Oct 5:47 pm,74
04 Oct 4:47 pm,75
04 Oct 3:47 pm,75
04 Oct 2:47 pm,77
04 Oct 1:47 pm,78
04 Oct 12:47 pm,78
04 Oct 11:47 am,76
04 Oct 10:47 am,74
04 Oct 9:47 am,72
...
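
If you want to try the traversal without saving the XML to disk first, here is a minimal sketch that parses an abbreviated copy of the sample from the question with ET.fromstring and writes the CSV to an in-memory buffer. The names SAMPLE and obs_to_rows are my own, not part of any library, and the two-observation document is just a cut-down version of the one posted above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Abbreviated copy of the XML shown in the question (two observations only).
SAMPLE = """<station id="KCQT" name="Los Angeles / USC Campus Downtown">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
</station>"""


def obs_to_rows(xml_text):
    """Return a list of [time, temperature] pairs, one per <ob> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for ob in root.iter('ob'):
        # No element is named "T": the temperature lives in a <variable>
        # element whose var attribute equals "T", so ob.find('T') would
        # return None. The XPath predicate below selects by attribute.
        temp = ob.find("./variable[@var='T']").get('value')
        rows.append([ob.get('time'), temp])
    return rows


# Write to an in-memory buffer instead of a file so nothing touches disk.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['Time', 'Temp'])
writer.writerows(obs_to_rows(SAMPLE))
csv_text = buffer.getvalue()
print(csv_text)
```

Swapping the StringIO buffer for a file opened with open(..., 'w', newline='') should give the same rows in a real file.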

Assuming Python 3, this will work. I have noted the Python 2 difference in case you need it:

import xml.etree.ElementTree as ET
import requests
import csv

weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")
root = ET.fromstring(weatherXML.text)

# Use this with Python 2
# with open('timestamp_temp.csv','wb') as Time_Temp:

with open('timestamp_temp.csv','w',newline='') as Time_Temp:
    csvwriter = csv.writer(Time_Temp)
    csvwriter.writerow(['Time','Temp'])
    for member in root.iterfind('ob'):
        date = member.attrib['time']
        temp = member.find("variable[@var='T']").attrib['value']
        csvwriter.writerow([date,temp])

Output:

Time,Temp
04 Oct 11:47 pm,65
04 Oct 10:47 pm,66
04 Oct 9:47 pm,68
04 Oct 8:47 pm,68
04 Oct 7:47 pm,68
04 Oct 6:47 pm,70
04 Oct 5:47 pm,74
04 Oct 4:47 pm,75
   .
   .
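
One possible refinement, offered as a sketch rather than something the question asked for: the time attribute carries no year or time zone, while utime looks like a Unix epoch value (that is my assumption; 1507169400 does correspond to 5 Oct 2017 02:10 UTC, which is 7:10 pm Pacific Daylight Time on 4 Oct). The hypothetical helper iter_time_temp below converts utime to an ISO 8601 UTC string and skips any observation that has no T variable, so a missing reading does not raise an AttributeError. The small demo document is made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def iter_time_temp(root):
    """Yield (ISO 8601 UTC timestamp, temperature string) per observation."""
    for ob in root.iterfind('ob'):
        var = ob.find("variable[@var='T']")
        if var is None:
            continue  # this observation carries no temperature reading
        # Assumption: utime is seconds since the Unix epoch.
        stamp = datetime.fromtimestamp(int(ob.attrib['utime']), tz=timezone.utc)
        yield stamp.isoformat(), var.attrib['value']


# Hypothetical two-observation document; the second one lacks a "T" variable.
demo = ET.fromstring(
    '<station>'
    '<ob time="04 Oct 7:10 pm" utime="1507169400">'
    '<variable var="T" value="61"/></ob>'
    '<ob time="04 Oct 7:05 pm" utime="1507169100">'
    '<variable var="RH" value="45"/></ob>'
    '</station>'
)
rows = list(iter_time_temp(demo))
print(rows)
```

The pairs it yields can be passed straight to csvwriter.writerows in the script above in place of the per-row loop.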


 