I have a set of URLs stored in a list (e.g. http://www.amazon.com/b/ref=s9_al_bw_brwse_a_v?_encoding=UTF8&node=9097393011&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-4&pf_rd_r=10RXCP9TZPW3BP73EKHA&pf_rd_t=101&pf_rd_p=1818119462&pf_rd_i=2858778011). At the bottom of each URL's page there is the number of pages for that category.
There is a span element from which I am trying to extract a URL, but I am getting a NoneType error.
The code I have tried so far:
for links in full_links:
    mech = Browser()
    mech.set_handle_robots(False)
    mech.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    url = links
    page = mech.open(url)
    html = page.read()
    soup = BeautifulSoup(html)
    no_pages = soup.find('div', id="pagn")
    a = no_pages.find('span', {'class': 'pagnLink'})
    for aes in a:
        print aes.text
    for a in elm.findAll('a'):
        link = 'http://www.amazon.com' + a['href']
        print link
You failed to include a full traceback, but presumably it is the soup.find('div') call that returned None; for that page the element you tried to find is not present.
If an element is not found, Element.find() returns None, and a subsequent line that tries to use that None value will fail. In this case no_pages is None, so no_pages.find() then fails.
For element searches like these, it is much easier to use CSS selectors:
for page_link in soup.select('div#pagn span.pagnLink a[href]'):
link = 'http://www.amazon.com' + page_link['href']
print link
This will find all links within the <div id="pagn"> -> <span class="pagnLink"> element tree, provided they have an href attribute.
This search will simply produce an empty loop if either the div or the span is not present.
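One further refinement: rather than concatenating 'http://www.amazon.com' with the scraped href, the standard library's urljoin handles both relative and already-absolute href values correctly. A small sketch (the import location differs between Python 2 and 3; the example paths are made up):

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

base = 'http://www.amazon.com'
# A relative href is resolved against the base:
print(urljoin(base, '/b/ref=s9_al?page=2'))
# An absolute href is left untouched, avoiding a doubled host:
print(urljoin(base, 'http://www.amazon.com/gp/example'))
```

With plain string concatenation, an absolute href would produce a malformed URL such as 'http://www.amazon.comhttp://...'; urljoin avoids that class of bug.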