
BeautifulSoup4 not recognizing xml tag

I'm using BeautifulSoup4 (with the lxml parser) to parse XML that looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<data>
<metadata id="8735180"  name="Dauphin Island" lat="30.2500" lon="-88.0750"/>
<observations>
<wl t="2013-12-14 00:00"  v="0.725" s="0.059" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:06"  v="0.771" s="0.066" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:12"  v="0.764" s="0.085" f="0,0,0,0" q="v" />

... etc.

The Python code looks like this:

import urllib2
from bs4 import BeautifulSoup

obs = []
obs_soup = BeautifulSoup(urllib2.urlopen('http://tidesandcurrents.noaa.gov/api/datagetter?product=water_level&application=NOS.COOPS.TAC.WL&begin_date=20131214&end_date=20131216&datum=MSL&station=8735180&time_zone=GMT&units=english&interval=&format=xml'), 'lxml')

for l in obs_soup.findall('wl'):
    obs.append(l['v'])

I keep getting the error:

for l in obs_soup.findall('wl'):
TypeError: 'NoneType' object is not callable

I tried the solution here (except instead of looking for 'html', I looked for 'data'), but that didn't work. Any suggestions?

There are two problems here.


First, there is no such method as findall in BeautifulSoup. Change that to:

for l in obs_soup.find_all('wl'):
    obs.append(l['v'])

… and it will work.
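A minimal offline sketch of the corrected loop, assuming Python 3 with bs4 and lxml installed. It parses an inline copy of the question's sample data (closing tags added so the document is complete) instead of calling the NOAA API, and it already uses lxml's XML parser rather than its HTML parser. The helper name water_levels is hypothetical.

```python
from bs4 import BeautifulSoup

# Inline copy of the start of the response from the question, as bytes
# (urlopen(...).read() also returns bytes). Closing tags added by hand.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<data>
<metadata id="8735180" name="Dauphin Island" lat="30.2500" lon="-88.0750"/>
<observations>
<wl t="2013-12-14 00:00" v="0.725" s="0.059" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:06" v="0.771" s="0.066" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:12" v="0.764" s="0.085" f="0,0,0,0" q="v" />
</observations>
</data>"""


def water_levels(markup):
    """Return the v attribute of every <wl> element, in document order."""
    obs_soup = BeautifulSoup(markup, "xml")  # lxml's XML parser
    obs = []
    for l in obs_soup.find_all("wl"):  # find_all, not findall
        obs.append(l["v"])
    return obs


print(water_levels(SAMPLE))  # ['0.725', '0.771', '0.764']

# Against the live API in Python 3, urllib2.urlopen lives in urllib.request:
# from urllib.request import urlopen
# print(water_levels(urlopen(url).read()))
```

Attribute values come back as strings; convert them with float() if you need numbers.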


So why are you getting TypeError: 'NoneType' object is not callable instead of the more usual AttributeError? Because of BeautifulSoup's magic lookup: the same thing that lets you write obs_soup.wl as a shortcut for finding a <wl> also makes obs_soup.findall a shortcut for finding a <findall>. Because there is no <findall> node, that lookup returns None. Then you try to call that None object as a function, which of course is nonsense.
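A quick way to watch the magic lookup happen, using a trimmed, made-up two-observation document (assumes bs4 and lxml are installed). Attribute access on the soup is treated as find() with that tag name, so a misspelled method name silently becomes a search for a tag that does not exist.

```python
from bs4 import BeautifulSoup

markup = b"""<data><observations>
<wl t="2013-12-14 00:00" v="0.725"/>
<wl t="2013-12-14 00:06" v="0.771"/>
</observations></data>"""
obs_soup = BeautifulSoup(markup, "xml")

# obs_soup.wl is shorthand for obs_soup.find("wl"): the first <wl> element.
first = obs_soup.wl
print(first["v"])  # 0.725

# obs_soup.findall is likewise shorthand for obs_soup.find("findall").
# There is no <findall> element, so the lookup yields None...
print(obs_soup.findall)  # None

# ...and calling None raises the TypeError from the question.
try:
    obs_soup.findall("wl")
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable
```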


Also, if you had actually copied and pasted the code from here as you claimed, you wouldn't have had this problem. That code uses findAll, with a capital "A", which is a deprecated synonym for find_all. (You shouldn't use the deprecated synonyms, of course.)
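To confirm that findAll is just an older spelling of find_all, this small sketch (hypothetical inline data, bs4 and lxml assumed installed) compares the two. Depending on your bs4 version, the findAll call may also emit a DeprecationWarning, which is one more reason to prefer find_all.

```python
import warnings

from bs4 import BeautifulSoup

markup = b'<data><wl v="0.725"/><wl v="0.771"/></data>'
soup = BeautifulSoup(markup, "xml")

new_style = [wl["v"] for wl in soup.find_all("wl")]

# findAll is the deprecated camelCase alias; silence any warning it raises
# so the comparison output stays clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    old_style = [wl["v"] for wl in soup.findAll("wl")]

print(new_style == old_style)  # True
print(new_style)  # ['0.725', '0.771']
```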


Second, you're explicitly asking for lxml's HTML parser instead of its XML parser. Don't do that. See the docs:

BeautifulSoup(markup, ["lxml", "xml"])

Since you didn't give us a complete XML document, I don't know whether this will affect you, or whether you'll happen to get lucky. But you shouldn't rely on happening to get lucky when it's so easy to actually do things right.
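As a sketch of how the feature strings pick a parser (assuming lxml is installed; without it bs4 raises FeatureNotFound): every soup carries an is_xml flag, so you can check which kind of tree builder a given feature list actually selected. The docs also accept "xml" as shorthand for the same lxml XML builder that ["lxml", "xml"] selects.

```python
from bs4 import BeautifulSoup

markup = b"""<?xml version="1.0" encoding="UTF-8" ?>
<data><observations><wl t="2013-12-14 00:00" v="0.725"/></observations></data>"""

# The features argument tells bs4 which registered tree builder to use.
as_html = BeautifulSoup(markup, "lxml")          # lxml's HTML parser (what the question used)
as_xml = BeautifulSoup(markup, ["lxml", "xml"])  # lxml's XML parser, as in the docs
as_xml_short = BeautifulSoup(markup, "xml")      # shorthand for the same XML builder

for label, soup in [("lxml", as_html), ('["lxml", "xml"]', as_xml), ("xml", as_xml_short)]:
    print(label, "-> is_xml =", soup.is_xml)
```

Only the plain "lxml" soup reports is_xml = False; that is the HTML tree builder the question was accidentally using.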

