BeautifulSoup4 not recognizing xml tag

Question

I'm using BeautifulSoup4 (with lxml parser) to parse xml that looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<data>
<metadata id="8735180"  name="Dauphin Island" lat="30.2500" lon="-88.0750"/>
<observations>
<wl t="2013-12-14 00:00"  v="0.725" s="0.059" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:06"  v="0.771" s="0.066" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:12"  v="0.764" s="0.085" f="0,0,0,0" q="v" />

....etc

The Python code looks like this:

obs_soup = BeautifulSoup(urllib2.urlopen('http://tidesandcurrents.noaa.gov/api/datagetter?product=water_level&application=NOS.COOPS.TAC.WL&begin_date=20131214&end_date=20131216&datum=MSL&station=8735180&time_zone=GMT&units=english&interval=&format=xml'), 'lxml')

for l in obs_soup.findall('wl'):
    obs.append(l['v'])

I keep getting the error:

for l in obs_soup.findall('wl'):
TypeError: 'NoneType' object is not callable

I tried the solution here (except instead of looking for 'html', I looked for 'data'), but that didn't work. Any suggestions?

Answer 1

There are two problems here.

First, there is no such method as findall in BeautifulSoup. Change that to:

for l in obs_soup.find_all('wl'):
    obs.append(l['v'])

… and it will work.

So why are you getting TypeError: 'NoneType' object is not callable instead of the more usual AttributeError? Because of BeautifulSoup's magic attribute lookup: the same thing that lets you write obs_soup.wl as a shortcut for finding a <wl> also treats obs_soup.findall as a shortcut for finding a <findall>. Because there is no <findall> node, it returns None. You then try to call that None object as a function, which raises the TypeError.
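To see the lookup behaviour in isolation, here is a minimal sketch. It assumes only that bs4 is installed, and it uses the built-in "html.parser" on a shortened inline copy of the sample data (rather than fetching the URL) so that nothing else is required:

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the sample data from the question.
doc = (
    '<data><observations>'
    '<wl t="2013-12-14 00:00" v="0.725"/>'
    '<wl t="2013-12-14 00:06" v="0.771"/>'
    '</observations></data>'
)
soup = BeautifulSoup(doc, "html.parser")

# Attribute access on a soup is a shortcut for find(): this is the first <wl>.
first_wl = soup.wl

# The same shortcut applies to any other name, so this looks for a <findall>
# tag, finds none, and evaluates to None instead of raising AttributeError.
missing = soup.findall

# Calling None is what produces the TypeError from the question.
try:
    missing("wl")
except TypeError as exc:
    error_message = str(exc)

# The real method is find_all (with an underscore).
values = [wl["v"] for wl in soup.find_all("wl")]
```

Here missing is None, error_message is "'NoneType' object is not callable", and values holds the two v attributes as strings.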

Also, if you had actually copied and pasted the code from here as you claimed, you wouldn't have had this problem. That code uses findAll, with a capital "A", which is a deprecated synonym for find_all. (You shouldn't use the deprecated synonyms, of course.)

Second, you're explicitly asking for lxml's HTML parser instead of its XML parser. Don't do that. See the docs:

BeautifulSoup(markup, ["lxml", "xml"])

Since you didn't give us a complete XML document, I don't know whether this will affect you, or whether you'll happen to get lucky. But you shouldn't rely on happening to get lucky when it's so easy to actually do things right.
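One visible difference between the two parsers can be sketched as follows. This assumes both bs4 and lxml are installed, and again uses a shortened inline copy of the sample data instead of the live URL:

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the sample data from the question.
doc = (
    '<data>'
    '<metadata id="8735180" name="Dauphin Island"/>'
    '<observations>'
    '<wl t="2013-12-14 00:00" v="0.725"/>'
    '<wl t="2013-12-14 00:06" v="0.771"/>'
    '</observations>'
    '</data>'
)

as_html = BeautifulSoup(doc, "lxml")  # lxml's HTML parser
as_xml = BeautifulSoup(doc, "xml")    # lxml's XML parser

# The HTML parser treats the input as an HTML fragment and wraps it in
# <html><body>...</body></html>; the XML parser keeps <data> as the root.
html_has_wrapper = as_html.find("html") is not None
xml_has_wrapper = as_xml.find("html") is not None

# With the XML parser, the observations are read straight from the document.
values = [wl["v"] for wl in as_xml.find_all("wl")]
```

For this particular snippet both parsers happen to find the wl tags, which is the "getting lucky" case. The HTML parser also lowercases tag names, so a document with mixed-case tags would not be so fortunate.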

solution1
1 ACCEPTED 2014-01-02 22:14:50
