BeautifulSoup unable to extract data using attrs=class

I am extracting data for a research project, and I have successfully used findAll('div', attrs={'class':'someClassName'}) on many websites, but this particular website,

WebSite Link

doesn't return any values when I use the attrs option. When I don't use the attrs option, I get the entire HTML DOM.

Here is the simple code that I started with to test it out:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
    print(div)
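Because the question hinges on how attrs={'class': ...} matches, here is a minimal offline sketch of that matching in bs4 (BeautifulSoup 4). The HTML snippet is made up for illustration; the real page's markup is not shown in the question. In bs4, class is treated as a multi-valued attribute, so a single class name matches any element whose class list contains it, while a multi-word value is compared against the whole attribute string. The legacy BeautifulSoup 3 used elsewhere in this thread treated class as one plain string, which may explain missing matches when an element carries more than one class.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (not shown in the question).
html = """
<div class="data">plain</div>
<div class="data result">multi-class</div>
<span class="data">not a div</span>
<div class="database">different class</div>
"""

soup = BeautifulSoup(html, "html.parser")

# bs4 treats "class" as multi-valued: this matches any <div> whose class list contains "data".
by_attrs = soup.find_all("div", attrs={"class": "data"})

# The same search using the class_ keyword argument.
by_kwarg = soup.find_all("div", class_="data")

# The same search written as a CSS selector.
by_css = soup.select("div.data")

# A multi-word value is compared against the whole class attribute string instead.
exact = soup.find_all("div", attrs={"class": "data result"})

for label, found in [("attrs", by_attrs), ("class_", by_kwarg), ("select", by_css), ("exact", exact)]:
    print(label, [tag.get_text() for tag in found])
```

Note that "database" is not matched by "data": class names are compared as whole tokens, not substrings.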

This code works fine with requests:

import requests
# bs4 replaces the legacy BeautifulSoup 3 import: from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup as bs

# grab the HTML
r = requests.get(r'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585')
html = r.text
# parse the HTML with an explicit parser
soup = bs(html, 'html.parser')

results = soup.find_all('div', attrs={'class': 'data'})

print(results)

If you, or anyone reading this question, would like to know why the code you gave (copied below) wasn't able to find the elements with that attrs value:

soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
    print(div)

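The issue arose when you created the BeautifulSoup object with soup = bs(urlopen(url)), because urlopen(url) returns a response object, not the HTML itself.

Any issues you encountered could most likely have been resolved more easily by using bs(urlopen(url).read()) instead. Note, however, that BeautifulSoup also accepts an open file-like object and calls .read() on it itself, so if the two calls still behave differently, comparing the raw HTML returned by urlopen and by requests is a sensible next step.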
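To check the file-object versus string question directly, here is a small offline sketch, assuming bs4 is installed. It uses io.BytesIO as a stand-in for the urlopen(url) response (both are objects with a .read() method that returns bytes) and a made-up one-line page. It parses the handle directly and also parses the bytes read from it first, as in bs(urlopen(url).read()), then compares the two trees.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical page bytes; urlopen(url).read() also returns bytes.
page = b'<div class="data">42</div>'

# io.BytesIO stands in for the urlopen(url) response: any object with .read() works.
from_handle = BeautifulSoup(io.BytesIO(page), "html.parser")

# The explicit form: read the bytes first, then parse them.
from_bytes = BeautifulSoup(io.BytesIO(page).read(), "html.parser")

# Both objects serialize to the same markup and find the same <div>.
print(str(from_handle) == str(from_bytes))
print(from_handle.find_all("div", attrs={"class": "data"}))
```

If both forms agree, as they do here, a difference between the urlopen and requests versions more likely comes from the HTML each client actually receives than from how it is passed to BeautifulSoup.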

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.