XML Parsing with ElementTree and Requests

Question

I am trying to work with the Yahoo Weather API, but I am having a few issues parsing the XML that the API responds with. I am using Python 3.4. Here's the code I am working with:

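import requests
from xml.etree.ElementTree import parse

# Namespace URI of the yweather:* elements in the Yahoo Weather RSS feed
weather_ns = 'http://xml.weather.yahoo.com/ns/rss/1.0'
weather_url = 'http://weather.yahooapis.com/forecastrss?w=%s&u=%s'


def yahoo_conditions(zip_code, units):
    url = weather_url % (zip_code, units)

    # Stream the response and hand the raw urllib3 object straight to ElementTree
    rss = parse(requests.get(url, stream=True).raw).getroot()

    conditions = rss.find('channel/item/{%s}condition' % weather_ns)

    return {
        'current_condition': conditions.get('text'),
        'current_temp': conditions.get('temp'),
        'title': rss.findtext('channel/title')
    }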

Here's the stack trace that I am getting:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/jonathan/PycharmProjects/pyweather/pyweather/pyweather.py", line 42, in yahoo_conditions
    rss = parse(requests.get(url, stream=True).raw).getroot()
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
  File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

It seems that the xml.etree.ElementTree parse function doesn't like the raw object returned by the requests library. Looking into it a little deeper, the raw object resolves to:

>>> r = requests.get('http://weather.yahooapis.com/forecastrss?w=2502265', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x7f32c24f9e48>

I referenced this solution, but it still leads to the same issue. Why doesn't the approach above work? Is the urllib3 response object not supported by the ElementTree.parse function? I have read all of the docs, but they haven't enlightened me at all.
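A likely cause, though the thread never confirms it: r.raw is the undecoded byte stream from urllib3, so if the server answered with "Content-Encoding: gzip", ElementTree receives compressed bytes. The first byte of gzip data (0x1f) is not legal XML, which fits the "invalid token: line 1, column 0" error. The offline sketch below simulates this with a hypothetical sample document instead of the live feed; SAMPLE_XML and try_parse are illustrative names only.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for the Yahoo RSS body.
SAMPLE_XML = b'<rss version="2.0"><channel><title>Demo feed</title></channel></rss>'

# What r.raw would yield if the server sent "Content-Encoding: gzip":
# the bytes are still compressed when ElementTree reads them.
compressed_body = gzip.compress(SAMPLE_XML)


def try_parse(stream):
    """Return the root tag, or a 'ParseError: ...' string if parsing fails."""
    try:
        return ET.parse(stream).getroot().tag
    except ET.ParseError as exc:
        return 'ParseError: %s' % exc


# Feeding the compressed bytes straight to the parser fails right at the start.
raw_result = try_parse(io.BytesIO(compressed_body))

# Decompressing on the fly (roughly what decode_content=True asks urllib3
# to do) lets ElementTree see ordinary XML.
decoded_result = try_parse(gzip.GzipFile(fileobj=io.BytesIO(compressed_body)))

print(raw_result)
print(decoded_result)
```

With requests itself, the corresponding change would be to set r.raw.decode_content = True (an attribute of urllib3's HTTPResponse) before calling parse(r.raw), so the stream is decompressed as ElementTree reads it.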

The doc list is here:

Edit: After more experimentation, I still haven't found a solution to the problem outlined above. However, I have found a workaround: if you use ElementTree's fromstring function on the response content, everything works fine.

import requests
import xml.etree.ElementTree as ET


def fetch_xml(url):
    """
    Fetch a url and parse the document's XML.

    :param url: the URL that the XML is located at.
    :return: the root element of the XML.
    :raises:
        :requests.exceptions.RequestException: Requests could not open the URL.
        :xml.etree.ElementTree.ParseError: xml.etree.ElementTree failed to parse the XML document.
    """

    # .content is the fully downloaded, already-decompressed body as bytes
    return ET.fromstring(requests.get(url).content)

I guess the downside to this approach is that it uses more memory, since the whole response body is held in memory before parsing starts. What do you think? I'd like to get the community's opinion.
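If memory is the worry, ElementTree.iterparse can consume a file-like byte stream incrementally and stop as soon as the wanted element is complete. A minimal sketch, assuming the yweather namespace URI http://xml.weather.yahoo.com/ns/rss/1.0 and using a hypothetical inline sample; first_condition is an illustrative helper, not part of any library. With requests, the stream could presumably be r.raw after setting r.raw.decode_content = True.

```python
import io
import xml.etree.ElementTree as ET

# yweather namespace URI used by the Yahoo Weather RSS feed (assumed).
WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'


def first_condition(stream, ns=WEATHER_NS):
    """Read an XML byte stream incrementally and return the attributes of the
    first {ns}condition element as a dict, or None if there is none."""
    target = '{%s}condition' % ns
    for _event, elem in ET.iterparse(stream, events=('end',)):
        if elem.tag == target:
            # Stop as soon as the element is complete instead of
            # building the whole tree.
            return dict(elem.attrib)
        # Drop children of elements we are done with to keep memory low.
        elem.clear()
    return None


# Hypothetical sample feed standing in for the live response.
SAMPLE = b'''<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <title>Demo feed</title>
    <item><yweather:condition text="Cloudy" temp="28"/></item>
  </channel>
</rss>'''

print(first_condition(io.BytesIO(SAMPLE)))
```

For a feed this small the savings are negligible; the pattern matters mainly for large documents.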

Answer 1

Why are you using streaming with requests to download some RSS XML data? Do you want to keep a connection open all the time? Weather hardly changes that quickly, so why not just poll the service every 5 minutes instead?

Below is the complete code for fetching and parsing the feed using BeautifulSoup and requests. Short and sweet.

import requests
from bs4 import BeautifulSoup

r = requests.get('http://weather.yahooapis.com/forecastrss?w=%s&u=%s' % (2459115, "c"))
if r.status_code == 200:
    # Name the parser explicitly; html.parser keeps "yweather:condition" as the tag name
    soup = BeautifulSoup(r.text, "html.parser")
    print("Current condition: ", soup.find("description").string)
    print("Temperature: ", soup.find('yweather:condition')['temp'])
    print("Title: ", soup.find("title").string)
else:
    # Raises requests.HTTPError for 4xx/5xx responses
    r.raise_for_status()

Output:

Current condition:  Yahoo! Weather for New York, NY
Temperature:  28
Title:  Yahoo! Weather - New York, NY

There is a lot more you can do with BeautifulSoup. Look up its excellent documentation.
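The 5-minute polling suggested in this answer could be wrapped in a small loop. A minimal sketch: poll, fetch and handle are hypothetical names, fetch would be something like the request-and-parse code above and handle whatever displays the result, and sleep is injectable so the loop can be exercised without actually waiting.

```python
import time


def poll(fetch, handle, interval_seconds=300, max_polls=None, sleep=time.sleep):
    """Call fetch() every interval_seconds and pass each result to handle().

    max_polls=None polls forever. An exception raised by fetch() or handle()
    ends the loop.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        handle(fetch())
        polls += 1
        if max_polls is not None and polls >= max_polls:
            break  # no point sleeping after the final poll
        sleep(interval_seconds)


# Usage with stand-ins for the real request/parse and display steps:
seen = []
waits = []
poll(fetch=lambda: 'feed snapshot', handle=seen.append,
     max_polls=3, sleep=waits.append)
print(seen, waits)
```

If transient network errors should not stop the loop, fetch() can catch requests.exceptions.RequestException itself and return None.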

Answer 2

If you use ElementTree's fromstring function on the XML content, everything works fine.

import requests
import xml.etree.ElementTree as ET


def fetch_xml(url):
    """
    Fetch a url and parse the document's XML.

    :param url: the URL that the XML is located at.
    :return: the root element of the XML.
    :raises:
        :requests.exceptions.RequestException: Requests could not open the URL.
        :xml.etree.ElementTree.ParseError: xml.etree.ElementTree failed to parse the XML document.
    """

    # .content is the fully downloaded, already-decompressed body as bytes
    return ET.fromstring(requests.get(url).content)

I guess the downside to this approach is that it uses more memory, since the whole response body is loaded before parsing.
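The fields the question was after can then be read from the root that fetch_xml returns. A minimal sketch, assuming the yweather namespace URI http://xml.weather.yahoo.com/ns/rss/1.0 and using a hypothetical inline sample in place of a live request; extract_conditions is an illustrative helper name. It passes a prefix-to-URI mapping to find() rather than formatting {uri} into the path by hand.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the yweather namespace (assumed URI).
NAMESPACES = {'yweather': 'http://xml.weather.yahoo.com/ns/rss/1.0'}


def extract_conditions(root):
    """Pull the question's fields out of an already-parsed RSS root element."""
    condition = root.find('channel/item/yweather:condition', NAMESPACES)
    if condition is None:
        raise ValueError('feed has no yweather:condition element')
    return {
        'current_condition': condition.get('text'),
        'current_temp': condition.get('temp'),
        'title': root.findtext('channel/title'),
    }


# Hypothetical sample document standing in for fetch_xml(url).
SAMPLE = b'''<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <title>Demo feed</title>
    <item><yweather:condition text="Cloudy" temp="28"/></item>
  </channel>
</rss>'''

print(extract_conditions(ET.fromstring(SAMPLE)))
```

Raising ValueError when the element is missing avoids the AttributeError the original code would hit on conditions.get if the feed came back without a condition.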

2 answers

solution1
1 2014-08-01 18:37:39

solution2
1 ACCEPTED 2014-09-06 22:24:42
