
XML Parsing to .txt file Python

I need to parse this XML document, convert the date and time to %Y-%m-%d %H:%M:%S format, and write them, together with the hourly-qpf and probability-of-precipitation variables, as columns in a tab-delimited .txt file.

All I have managed to do is read in the XML file using this code:

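import urllib2  # Python 2 module; Python 3 uses urllib.request instead

# Download the hourly forecast XML and save a local copy
page = urllib2.urlopen('http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML')
page_content = page.read()
with open('KBFI.xml', 'w') as fid:
    fid.write(page_content)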

I am at a loss after this. I've only parsed one XML doc before, and it looked completely different from this.

EDIT

Sorry for not having anything to give you guys before, but I wasn't sure what module to use, as I only have experience with minidom and it didn't seem like the right choice. I've been messing around with Element Tree and I have come up with this:

data = []
import xml.etree.ElementTree as ET
tree = ET.parse('KBFI.xml')
root = tree.getroot()
for data in root.findall('data'):
    for time-layout in root.findall('time-layout'):
        start-valid-time = time-layout.find('start-valid-time')
        time = datetime.datetime.strptime(start-valid-time, '%Y-%m-%dT%H:%M:%S')
    for parameters in root.findall('parameters'):
        for probability-of-precipitation in root.findall('probability-of-precipitation'):
            value = probability-of-precipitation.find('value')
    for hourly-qpf in root.findall('hourly-qpf'):
            value2 = hourly-qpf.find('value')
data = data.append([time,
                    value,
                    value2])
with open('KBFI.txt','w') as file:
    file.writelines('\t'.join(map(str,i)) + '\n' for i in data)

However, there is a problem because the variables are hyphenated and I do not know how to change them to underscores or remove them. Also, because of this, I have no idea if my code is any good!

You can use Python's built-in xml library:

https://docs.python.org/2/library/xml.etree.elementtree.html

import urllib2
import xml.etree.ElementTree as ET
page = urllib2.urlopen('http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML')
page_content = page.read()
root = ET.fromstring(page_content)
for text in root.itertext():
    # Do your formatting here; each item is the text content of one node
    print(text)
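As a rough sketch of how the hyphenated tags from the question can be handled with ElementTree: tag names are passed as ordinary strings, so the Python variables holding the results can use underscores. The `SAMPLE` string below is a made-up, cut-down stand-in for the real DWML feed (an assumption for illustration), and `extract_columns` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened stand-in for the DWML document (assumption: the real
# feed uses the same tag names and nesting, with many more entries).
SAMPLE = """<dwml>
  <data>
    <time-layout>
      <start-valid-time>2015-07-31T16:00:00-07:00</start-valid-time>
      <end-valid-time>2015-07-31T17:00:00-07:00</end-valid-time>
      <start-valid-time>2015-07-31T17:00:00-07:00</start-valid-time>
      <end-valid-time>2015-07-31T18:00:00-07:00</end-valid-time>
    </time-layout>
    <parameters>
      <probability-of-precipitation><value>0</value><value>5</value></probability-of-precipitation>
      <hourly-qpf><value>0</value><value>0.01</value></hourly-qpf>
    </parameters>
  </data>
</dwml>"""


def extract_columns(xml_text):
    """Return three parallel lists: start times, PoP values, hourly QPF values."""
    root = ET.fromstring(xml_text)
    # Hyphenated tag names are just strings; only the Python variable
    # names need to avoid hyphens.
    start_times = [e.text or '' for e in root.iter('start-valid-time')]
    pop = [v.text or '' for v in
           root.findall('./data/parameters/probability-of-precipitation/value')]
    hourly_qpf = [v.text or '' for v in
                  root.findall('./data/parameters/hourly-qpf/value')]
    return start_times, pop, hourly_qpf


start_times, pop, hourly_qpf = extract_columns(SAMPLE)
for row in zip(start_times, pop, hourly_qpf):
    print('\t'.join(row))
```

Empty elements are mapped to an empty string here so that joining a row never fails; whether that is the right treatment for missing forecast values depends on how the file will be used.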

I suggest using xmltodict for parsing and extracting data from XML. It is straightforward to use because it converts XML to Python dicts with the same nesting as the XML source. For those familiar with Python syntax, using it is natural, and Python dicts are fully versatile: they can express heterogeneous and nested data structures. For example, the Pickling Tools Library relies on Python dicts for Python, C++ and Java data interoperability and provides tools for converting XML to dict. The advantages of xmltodict are that it is small, fast, and a standalone module just for converting XML to dict.

As an example of xmltodict usage, the following script downloads this XML document and extracts its creation-date and lists of probability-of-precipitation and hourly-qpf values:

import requests
import xmltodict

url = 'http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML'
r = requests.get(url)

result = xmltodict.parse(r.text)
cd = result['dwml']['head']['product']['creation-date']['#text']
print("creation-date =",cd)
pop = result['dwml']['data']['parameters']['probability-of-precipitation']['value']
print("\nprobability-of-precipitation =", pop)
hqpf = result['dwml']['data']['parameters']['hourly-qpf']['value']
print("\nhourly-qpf =", hqpf)

Here is the output from running this script (on 20150730):

creation-date = 2015-07-30T08:53:12-07:00

probability-of-precipitation

hourly-qpf

xmltodict can be installed with 'pip install xmltodict'. It was developed by Martin Blech, and its GitHub project is at https://github.com/martinblech/xmltodict.

To access start-valid-time and end-valid-time, it helps to know their data structures as well as their locations. Since each is a series of values enclosed in identical tags, intuitively each series should appear as a separate list stored under a key with its tag name, just like probability-of-precipitation and hourly-qpf. This can be confirmed by pretty-printing the entire result dict (import pprint, then run pprint.pprint(result)) and inspecting how start-valid-time and end-valid-time appear in it. For this XML document, pretty-printing the equivalent dict generates over 2000 lines; however, start-valid-time begins on line 26 and its value is clearly a list:

{'dwml': {'@version': '1.0',
          '@xmlns:xsd': 'http://www.w3.org/2001/XMLSchema',
          '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
          '@xsi:noNamespaceSchemaLocation': 'http://graphical.weather.gov/xml/DWMLgen/schema/DWML.xsd',
          'head': {'product': {'@concise-name': 'tabular-digital',
                               '@operational-mode': 'developmental',
                               '@srsName': 'WGS 1984',
                               'creation-date': {'@refresh-frequency': 'PT1H',
                                                 '#text': '2015-07-31T14:20:30-07:00'}},
                   'source': {'production-center': 'Seattle, WA',
                              'credit': 'http://www.wrh.noaa.gov/sew',
                              'more-information': 'http://www.nws.noaa.gov/forecasts/xml/'}},
          'data': {'location': {'location-key': 'point1',
                                'description': 'Downtown Seattle WA, WA',
                                'point': {'@latitude': '47.61',
                                          '@longitude': '-122.32'},
                                'city': {'@state': 'WA',
                                         '#text': 'Downtown Seattle WA'},
                                'height': {'@datum': 'mean sea level',
                                           '#text': '240'}},
                   'moreWeatherInformation': {'@applicable-location': 'point1',
                                              '#text': 'http://forecast.weather.gov/MapClick.php?lat=47.61&lon=-122.32&FcstType=digital'},
                   'time-layout': {'@time-coordinate': 'local',
                                   '@summarization': 'none',
                                   'layout-key': 'k-p1h-n1-0',
                                   'start-valid-time': ['2015-07-31T16:00:00-07:00',
                                                        '2015-07-31T17:00:00-07:00',
                                                        '2015-07-31T18:00:00-07:00',
...
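Instead of scanning thousands of pretty-printed lines by eye, a small recursive walk can report where the lists live in the nested dict. This is only a sketch: `list_paths` is a hypothetical helper, and `sample` is a hand-made fragment shaped like the dict above rather than real xmltodict output.

```python
def list_paths(node, path=()):
    """Yield (path, length) for every list found inside a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Recurse, extending the path with the current key.
            yield from list_paths(value, path + (key,))
    elif isinstance(node, list):
        # Report the list and do not descend into its items.
        yield path, len(node)


# Hand-made fragment with the same nesting as the pretty-printed dict
# (values shortened; the hourly-qpf entries are placeholders).
sample = {
    'dwml': {
        'data': {
            'time-layout': {
                'layout-key': 'k-p1h-n1-0',
                'start-valid-time': ['2015-07-31T16:00:00-07:00',
                                     '2015-07-31T17:00:00-07:00'],
                'end-valid-time': ['2015-07-31T17:00:00-07:00',
                                   '2015-07-31T18:00:00-07:00'],
            },
            'parameters': {
                'hourly-qpf': {'value': ['0', '0']},
            },
        }
    }
}

for path, length in list_paths(sample):
    print(' -> '.join(path), '(%d items)' % length)
```

Each printed path can be read off directly as the chain of keys to index with, for example result['dwml']['data']['time-layout']['start-valid-time'].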

Here is a script that extracts creation-date as a scalar value and the start-valid-time, end-valid-time, probability-of-precipitation and hourly-qpf values as four lists. It prints each of them along with the length of each list:

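import requests
import xmltodict

# Fetch the document again so this script runs on its own
url = 'http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML'
r = requests.get(url)
result = xmltodict.parse(r.text)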

cd = result['dwml']['head']['product']['creation-date']['#text']
print("creation-date =",cd)

svt = result['dwml']['data']['time-layout']['start-valid-time']
print("\nstart-valid-time =", svt)
print("number of start-valid-time entries =", len(svt))

evt = result['dwml']['data']['time-layout']['end-valid-time']
print("\nend-valid-time =", evt)
print("number of end-valid-time entries =", len(evt))

pop = result['dwml']['data']['parameters']['probability-of-precipitation']['value']
print("\nprobability-of-precipitation =", pop)
print("number of probability-of-precipitation entries =", len(pop))

hqpf = result['dwml']['data']['parameters']['hourly-qpf']['value']
print("\nhourly-qpf =", hqpf)
print("number of hourly-qpf entries =", len(hqpf))

Here is the output from running this script (on 20150731):

creation-date = 2015-07-31T14:20:30-07:00

start-valid-time = ['2015-07-31T16:00:00-07:00', '2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00', '2015-07-31T19:00:00-07:00', '2015-07-31T20:00:00-07:00', '2015-07-31T21:00:00-07:00', '2015-07-31T22:00:00-07:00', '2015-07-31T23:00:00-07:00', '2015-08-01T00:00:00-07:00', '2015-08-01T01:00:00-07:00', '2015-08-01T02:00:00-07:00', '2015-08-01T03:00:00-07:00', '2015-08-01T04:00:00-07:00', '2015-08-01T05:00:00-07:00', '2015-08-01T06:00:00-07:00', '2015-08-01T07:00:00-07:00', '2015-08-01T08:00:00-07:00', '2015-08-01T09:00:00-07:00', '2015-08-01T10:00:00-07:00', '2015-08-01T11:00:00-07:00', '2015-08-01T12:00:00-07:00', '2015-08-01T13:00:00-07:00', '2015-08-01T14:00:00-07:00', '2015-08-01T15:00:00-07:00', '2015-08-01T16:00:00-07:00', '2015-08-01T17:00:00-07:00', '2015-08-01T18:00:00-07:00', '2015-08-01T19:00:00-07:00', '2015-08-01T20:00:00-07:00', '2015-08-01T21:00:00-07:00', '2015-08-01T22:00:00-07:00', '2015-08-01T23:00:00-07:00', '2015-08-02T00:00:00-07:00', '2015-08-02T01:00:00-07:00', '2015-08-02T02:00:00-07:00', '2015-08-02T03:00:00-07:00', '2015-08-02T04:00:00-07:00', '2015-08-02T05:00:00-07:00', '2015-08-02T06:00:00-07:00', '2015-08-02T07:00:00-07:00', '2015-08-02T08:00:00-07:00', '2015-08-02T09:00:00-07:00', '2015-08-02T10:00:00-07:00', '2015-08-02T11:00:00-07:00', '2015-08-02T12:00:00-07:00', '2015-08-02T13:00:00-07:00', '2015-08-02T14:00:00-07:00', '2015-08-02T15:00:00-07:00', '2015-08-02T16:00:00-07:00', '2015-08-02T17:00:00-07:00', '2015-08-02T18:00:00-07:00', '2015-08-02T19:00:00-07:00', '2015-08-02T20:00:00-07:00', '2015-08-02T21:00:00-07:00', '2015-08-02T22:00:00-07:00', '2015-08-02T23:00:00-07:00', '2015-08-03T00:00:00-07:00', '2015-08-03T01:00:00-07:00', '2015-08-03T02:00:00-07:00', '2015-08-03T03:00:00-07:00', '2015-08-03T04:00:00-07:00', '2015-08-03T05:00:00-07:00', '2015-08-03T06:00:00-07:00', '2015-08-03T07:00:00-07:00', '2015-08-03T08:00:00-07:00', '2015-08-03T09:00:00-07:00', '2015-08-03T10:00:00-07:00', '2015-08-03T11:00:00-07:00', 
'2015-08-03T12:00:00-07:00', '2015-08-03T13:00:00-07:00', '2015-08-03T14:00:00-07:00', '2015-08-03T15:00:00-07:00', '2015-08-03T16:00:00-07:00', '2015-08-03T17:00:00-07:00', '2015-08-03T18:00:00-07:00', '2015-08-03T19:00:00-07:00', '2015-08-03T20:00:00-07:00', '2015-08-03T21:00:00-07:00', '2015-08-03T22:00:00-07:00', '2015-08-03T23:00:00-07:00', '2015-08-04T00:00:00-07:00', '2015-08-04T01:00:00-07:00', '2015-08-04T02:00:00-07:00', '2015-08-04T03:00:00-07:00', '2015-08-04T04:00:00-07:00', '2015-08-04T05:00:00-07:00', '2015-08-04T06:00:00-07:00', '2015-08-04T07:00:00-07:00', '2015-08-04T08:00:00-07:00', '2015-08-04T09:00:00-07:00', '2015-08-04T10:00:00-07:00', '2015-08-04T11:00:00-07:00', '2015-08-04T12:00:00-07:00', '2015-08-04T13:00:00-07:00', '2015-08-04T14:00:00-07:00', '2015-08-04T15:00:00-07:00', '2015-08-04T16:00:00-07:00', '2015-08-04T17:00:00-07:00', '2015-08-04T18:00:00-07:00', '2015-08-04T19:00:00-07:00', '2015-08-04T20:00:00-07:00', '2015-08-04T21:00:00-07:00', '2015-08-04T22:00:00-07:00', '2015-08-04T23:00:00-07:00', '2015-08-05T00:00:00-07:00', '2015-08-05T01:00:00-07:00', '2015-08-05T02:00:00-07:00', '2015-08-05T03:00:00-07:00', '2015-08-05T04:00:00-07:00', '2015-08-05T05:00:00-07:00', '2015-08-05T06:00:00-07:00', '2015-08-05T07:00:00-07:00', '2015-08-05T08:00:00-07:00', '2015-08-05T09:00:00-07:00', '2015-08-05T10:00:00-07:00', '2015-08-05T11:00:00-07:00', '2015-08-05T12:00:00-07:00', '2015-08-05T13:00:00-07:00', '2015-08-05T14:00:00-07:00', '2015-08-05T15:00:00-07:00', '2015-08-05T16:00:00-07:00', '2015-08-05T17:00:00-07:00', '2015-08-05T18:00:00-07:00', '2015-08-05T19:00:00-07:00', '2015-08-05T20:00:00-07:00', '2015-08-05T21:00:00-07:00', '2015-08-05T22:00:00-07:00', '2015-08-05T23:00:00-07:00', '2015-08-06T00:00:00-07:00', '2015-08-06T01:00:00-07:00', '2015-08-06T02:00:00-07:00', '2015-08-06T03:00:00-07:00', '2015-08-06T04:00:00-07:00', '2015-08-06T05:00:00-07:00', '2015-08-06T06:00:00-07:00', '2015-08-06T07:00:00-07:00', 
'2015-08-06T08:00:00-07:00', '2015-08-06T09:00:00-07:00', '2015-08-06T10:00:00-07:00', '2015-08-06T11:00:00-07:00', '2015-08-06T12:00:00-07:00', '2015-08-06T13:00:00-07:00', '2015-08-06T14:00:00-07:00', '2015-08-06T15:00:00-07:00', '2015-08-06T16:00:00-07:00', '2015-08-06T17:00:00-07:00', '2015-08-06T18:00:00-07:00', '2015-08-06T19:00:00-07:00', '2015-08-06T20:00:00-07:00', '2015-08-06T21:00:00-07:00', '2015-08-06T22:00:00-07:00', '2015-08-06T23:00:00-07:00', '2015-08-07T00:00:00-07:00', '2015-08-07T01:00:00-07:00', '2015-08-07T02:00:00-07:00', '2015-08-07T03:00:00-07:00', '2015-08-07T04:00:00-07:00', '2015-08-07T05:00:00-07:00', '2015-08-07T06:00:00-07:00', '2015-08-07T07:00:00-07:00', '2015-08-07T08:00:00-07:00', '2015-08-07T09:00:00-07:00', '2015-08-07T10:00:00-07:00', '2015-08-07T11:00:00-07:00', '2015-08-07T12:00:00-07:00', '2015-08-07T13:00:00-07:00', '2015-08-07T14:00:00-07:00', '2015-08-07T15:00:00-07:00']
number of start-valid-time entries = 168

end-valid-time = ['2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00', '2015-07-31T19:00:00-07:00', '2015-07-31T20:00:00-07:00', '2015-07-31T21:00:00-07:00', '2015-07-31T22:00:00-07:00', '2015-07-31T23:00:00-07:00', '2015-08-01T00:00:00-07:00', '2015-08-01T01:00:00-07:00', '2015-08-01T02:00:00-07:00', '2015-08-01T03:00:00-07:00', '2015-08-01T04:00:00-07:00', '2015-08-01T05:00:00-07:00', '2015-08-01T06:00:00-07:00', '2015-08-01T07:00:00-07:00', '2015-08-01T08:00:00-07:00', '2015-08-01T09:00:00-07:00', '2015-08-01T10:00:00-07:00', '2015-08-01T11:00:00-07:00', '2015-08-01T12:00:00-07:00', '2015-08-01T13:00:00-07:00', '2015-08-01T14:00:00-07:00', '2015-08-01T15:00:00-07:00', '2015-08-01T16:00:00-07:00', '2015-08-01T17:00:00-07:00', '2015-08-01T18:00:00-07:00', '2015-08-01T19:00:00-07:00', '2015-08-01T20:00:00-07:00', '2015-08-01T21:00:00-07:00', '2015-08-01T22:00:00-07:00', '2015-08-01T23:00:00-07:00', '2015-08-02T00:00:00-07:00', '2015-08-02T01:00:00-07:00', '2015-08-02T02:00:00-07:00', '2015-08-02T03:00:00-07:00', '2015-08-02T04:00:00-07:00', '2015-08-02T05:00:00-07:00', '2015-08-02T06:00:00-07:00', '2015-08-02T07:00:00-07:00', '2015-08-02T08:00:00-07:00', '2015-08-02T09:00:00-07:00', '2015-08-02T10:00:00-07:00', '2015-08-02T11:00:00-07:00', '2015-08-02T12:00:00-07:00', '2015-08-02T13:00:00-07:00', '2015-08-02T14:00:00-07:00', '2015-08-02T15:00:00-07:00', '2015-08-02T16:00:00-07:00', '2015-08-02T17:00:00-07:00', '2015-08-02T18:00:00-07:00', '2015-08-02T19:00:00-07:00', '2015-08-02T20:00:00-07:00', '2015-08-02T21:00:00-07:00', '2015-08-02T22:00:00-07:00', '2015-08-02T23:00:00-07:00', '2015-08-03T00:00:00-07:00', '2015-08-03T01:00:00-07:00', '2015-08-03T02:00:00-07:00', '2015-08-03T03:00:00-07:00', '2015-08-03T04:00:00-07:00', '2015-08-03T05:00:00-07:00', '2015-08-03T06:00:00-07:00', '2015-08-03T07:00:00-07:00', '2015-08-03T08:00:00-07:00', '2015-08-03T09:00:00-07:00', '2015-08-03T10:00:00-07:00', '2015-08-03T11:00:00-07:00', '2015-08-03T12:00:00-07:00', 
'2015-08-03T13:00:00-07:00', '2015-08-03T14:00:00-07:00', '2015-08-03T15:00:00-07:00', '2015-08-03T16:00:00-07:00', '2015-08-03T17:00:00-07:00', '2015-08-03T18:00:00-07:00', '2015-08-03T19:00:00-07:00', '2015-08-03T20:00:00-07:00', '2015-08-03T21:00:00-07:00', '2015-08-03T22:00:00-07:00', '2015-08-03T23:00:00-07:00', '2015-08-04T00:00:00-07:00', '2015-08-04T01:00:00-07:00', '2015-08-04T02:00:00-07:00', '2015-08-04T03:00:00-07:00', '2015-08-04T04:00:00-07:00', '2015-08-04T05:00:00-07:00', '2015-08-04T06:00:00-07:00', '2015-08-04T07:00:00-07:00', '2015-08-04T08:00:00-07:00', '2015-08-04T09:00:00-07:00', '2015-08-04T10:00:00-07:00', '2015-08-04T11:00:00-07:00', '2015-08-04T12:00:00-07:00', '2015-08-04T13:00:00-07:00', '2015-08-04T14:00:00-07:00', '2015-08-04T15:00:00-07:00', '2015-08-04T16:00:00-07:00', '2015-08-04T17:00:00-07:00', '2015-08-04T18:00:00-07:00', '2015-08-04T19:00:00-07:00', '2015-08-04T20:00:00-07:00', '2015-08-04T21:00:00-07:00', '2015-08-04T22:00:00-07:00', '2015-08-04T23:00:00-07:00', '2015-08-05T00:00:00-07:00', '2015-08-05T01:00:00-07:00', '2015-08-05T02:00:00-07:00', '2015-08-05T03:00:00-07:00', '2015-08-05T04:00:00-07:00', '2015-08-05T05:00:00-07:00', '2015-08-05T06:00:00-07:00', '2015-08-05T07:00:00-07:00', '2015-08-05T08:00:00-07:00', '2015-08-05T09:00:00-07:00', '2015-08-05T10:00:00-07:00', '2015-08-05T11:00:00-07:00', '2015-08-05T12:00:00-07:00', '2015-08-05T13:00:00-07:00', '2015-08-05T14:00:00-07:00', '2015-08-05T15:00:00-07:00', '2015-08-05T16:00:00-07:00', '2015-08-05T17:00:00-07:00', '2015-08-05T18:00:00-07:00', '2015-08-05T19:00:00-07:00', '2015-08-05T20:00:00-07:00', '2015-08-05T21:00:00-07:00', '2015-08-05T22:00:00-07:00', '2015-08-05T23:00:00-07:00', '2015-08-06T00:00:00-07:00', '2015-08-06T01:00:00-07:00', '2015-08-06T02:00:00-07:00', '2015-08-06T03:00:00-07:00', '2015-08-06T04:00:00-07:00', '2015-08-06T05:00:00-07:00', '2015-08-06T06:00:00-07:00', '2015-08-06T07:00:00-07:00', '2015-08-06T08:00:00-07:00', 
'2015-08-06T09:00:00-07:00', '2015-08-06T10:00:00-07:00', '2015-08-06T11:00:00-07:00', '2015-08-06T12:00:00-07:00', '2015-08-06T13:00:00-07:00', '2015-08-06T14:00:00-07:00', '2015-08-06T15:00:00-07:00', '2015-08-06T16:00:00-07:00', '2015-08-06T17:00:00-07:00', '2015-08-06T18:00:00-07:00', '2015-08-06T19:00:00-07:00', '2015-08-06T20:00:00-07:00', '2015-08-06T21:00:00-07:00', '2015-08-06T22:00:00-07:00', '2015-08-06T23:00:00-07:00', '2015-08-07T00:00:00-07:00', '2015-08-07T01:00:00-07:00', '2015-08-07T02:00:00-07:00', '2015-08-07T03:00:00-07:00', '2015-08-07T04:00:00-07:00', '2015-08-07T05:00:00-07:00', '2015-08-07T06:00:00-07:00', '2015-08-07T07:00:00-07:00', '2015-08-07T08:00:00-07:00', '2015-08-07T09:00:00-07:00', '2015-08-07T10:00:00-07:00', '2015-08-07T11:00:00-07:00', '2015-08-07T12:00:00-07:00', '2015-08-07T13:00:00-07:00', '2015-08-07T14:00:00-07:00', '2015-08-07T15:00:00-07:00', '2015-08-07T16:00:00-07:00']
number of end-valid-time entries = 168

probability-of-precipitation = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '34', '34', '34', '34', '34', '34', '34', '34', '34', '34', '34']
number of probability-of-precipitation entries = 168

hourly-qpf
number of hourly-qpf entries = 168
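Since the lists have the same length, they can be zipped into rows to produce the tab-delimited text the question asks for. The following Python 3 sketch assumes the timestamps always have the form shown above and simply drops the UTC offset; `forecast_to_tsv`, `cell`, the column headers and the sample values are all hypothetical.

```python
from datetime import datetime


def cell(value):
    # Assumption: anything that is not a plain string (for example a
    # missing value) is written as an empty cell.
    return value if isinstance(value, str) else ''


def forecast_to_tsv(start_times, pop, hourly_qpf):
    """Build tab-delimited text with one row per forecast hour."""
    lines = ['time\tpop\thourly_qpf']  # header names are made up
    for ts, p, q in zip(start_times, pop, hourly_qpf):
        # Keep the local wall-clock time and drop the offset ("-07:00").
        local = datetime.strptime(ts[:19], '%Y-%m-%dT%H:%M:%S')
        lines.append('\t'.join([local.strftime('%Y-%m-%d %H:%M:%S'),
                                cell(p), cell(q)]))
    return '\n'.join(lines) + '\n'


# Short made-up lists standing in for svt, pop and hqpf from the script above.
svt = ['2015-07-31T16:00:00-07:00', '2015-07-31T17:00:00-07:00']
pop = ['0', '5']
hqpf = ['0', '0.01']

text = forecast_to_tsv(svt, pop, hqpf)
print(text, end='')
```

To save the result, the returned string can be written out with something like `with open('KBFI.txt', 'w') as fid: fid.write(text)`. Building the text first keeps the formatting step separate from file handling, which makes it easier to check before anything is written to disk.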
