
Web scraping with BS4: 'NoneType' object has no attribute 'find'

I'm not sure why my code isn't working. I get AttributeError: 'NoneType' object has no attribute 'find'

My code is as follows:

import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://urj.org/urj-congregations?congregation=&distance_address_field=&distance_num_miles=5.0&worship_services=All&community=All&urj_camp_affiliations=All&page=0"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')

paging = soup.find("nav",{"aria-label":"pagination-heading-3"}).find("li",{"class":"page-item"}).find_all("a")
start_page = paging[1].text
last_page = paging[len(paging)-2].text


outfile = open('congregationlookup.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["Name", "Address", "Phone"])


pages = list(range(1,int(last_page)+1))
for page in pages:
    url = 'https://urj.org/urj-congregations?congregation=&distance_address_field=&distance_num_miles=5.0&worship_services=All&community=All&urj_camp_affiliations=All&page=%s' %(page)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')

    #print(soup.prettify())
    print ('Processing page: %s' %(page))

    name_list = soup.findAll("div",{"class":"views-field views-field-congregation"})
    for element in name_list:
        name = element.find('h3').text
        address = element.find('field-content mb-2').text.strip()
        phone = element.find("i",{"class":"fa fa-phone mr-1"}).text.strip()

        writer.writerow([name, address, phone])

outfile.close()
print ('Done') 

I'm trying to scrape the name, address, and phone number from the URJ Congregations website.

Thank you

Final code

import csv
import requests
from bs4 import BeautifulSoup

# root_url = "https://urj.org/urj-congregations?congregation=&distance_address_field=&distance_num_miles=5.0&worship_services=All&community=All&urj_camp_affiliations=All&page=0"
# html = requests.get(root_url)
# soup = BeautifulSoup(html.text, 'html.parser')
# paging = soup.find("nav", {"aria-label": "pagination-heading--3"}).find("ul", {"class": "pagination"}).find_all("a")
# start_page = paging[1].text
# last_page = paging[len(paging) - 3].text

# Drupal-style pagers are zero-based: page=0 (as in root_url above) is the first page.
with open('congregationlookup.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Address", "Phone"])

    for page in range(0, 1000):
        url = 'https://urj.org/urj-congregations?congregation=&distance_address_field=&distance_num_miles=5.0&worship_services=All&community=All&urj_camp_affiliations=All&page=%s' % page
        html = requests.get(url)
        soup = BeautifulSoup(html.text, 'html.parser')

        print('Processing page: %s' % page)
        elements = soup.find_all("div", {"class": "views-row"})
        if len(elements) == 0:
            # An empty page means we have gone past the last one.
            break
        for element in elements:
            name = element.find("div", {"class": "views-field views-field-congregation"}).text.strip()
            address = element.find("div", {"class": "views-field views-field-country"}).text.strip()
            # The phone number is taken from the first line of the "website" field.
            phone = element.find("div", {"class": "views-field views-field-website"}).text.strip().split("\n")[0]
            writer.writerow([name, address, phone])

print('Done')
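The final code keeps requesting pages until one comes back with no `views-row` elements. That stop-on-empty pattern can be sketched on its own, independent of the site. Here `iter_pages` and `fake_site` are hypothetical names, and the fake dictionary only stands in for "download page N and `find_all` its rows"; this is an illustration, not the poster's code.

```python
from itertools import count
from typing import Callable, Iterator, List


def iter_pages(fetch_rows: Callable[[int], List[str]], start: int = 0) -> Iterator[List[str]]:
    """Yield each page's rows, stopping at the first page that has none."""
    for page in count(start):
        rows = fetch_rows(page)
        if not rows:
            # An empty result means we have run past the last page.
            break
        yield rows


# Hypothetical stand-in for the site: page number -> rows found on that page.
fake_site = {0: ["A", "B"], 1: ["C"], 2: []}
pages = list(iter_pages(lambda n: fake_site.get(n, [])))
print(pages)  # [['A', 'B'], ['C']]
```

Using `itertools.count` removes the arbitrary upper bound of 1000; the loop ends only when the site runs out of rows.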

Most likely, one of your find() calls returned None (find() returns None when nothing on the page matches). So when you chain another .find() onto that result, you are calling a method on None, hence your error.

https://docs.python.org/3/library/stdtypes.html#str.find
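A small sketch, using made-up pager markup rather than the real URJ page, of how `find()` returns `None` when nothing matches and how chaining another `.find()` onto it produces this exact error:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; only the aria-label mirrors the question.
html = """
<nav aria-label="pagination-heading--3">
  <ul class="pagination">
    <li class="page-item"><a href="?page=0">1</a></li>
    <li class="page-item"><a href="?page=1">2</a></li>
  </ul>
</nav>
"""
soup = BeautifulSoup(html, "html.parser")

# Nothing has this label, so find() returns None instead of raising.
nav = soup.find("nav", {"aria-label": "pagination-heading-3"})
print(nav)  # None

# Calling .find() on that None is what raises the AttributeError.
try:
    nav.find("li")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'find'

# Checking the result before chaining avoids the crash.
nav = soup.find("nav", {"aria-label": "pagination-heading--3"})
links = nav.find_all("a") if nav is not None else []
print([a.text for a in links])  # ['1', '2']
```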

Also, as an FYI, findAll() is BS3 syntax; you should use find_all(). See: Difference between "findAll" and "find_all" in BeautifulSoup.
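For comparison, the `findAll` line from the question written with `find_all()`. The HTML is a hypothetical stand-in that only reuses the question's class names:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names used in the question.
html = """
<div class="views-field views-field-congregation"><h3>Temple A</h3></div>
<div class="views-field views-field-congregation"><h3>Temple B</h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() is the BS4 spelling; the dict value matches the exact class string.
name_list = soup.find_all("div", {"class": "views-field views-field-congregation"})
names = [div.find("h3").text for div in name_list]
print(names)  # ['Temple A', 'Temple B']
```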

There are a lot of problems.

The first problem is

"pagination-heading--3"

instead of

"pagination-heading-3"

Next, I changed

paging = soup.find("nav",{"aria-label":"pagination-heading-3"}).find("li",{"class":"page-item"}).find_all("a")

To

paging = soup.find("nav", {"aria-label": "pagination-heading--3"}).find("ul", {"class": "pagination"}).find_all("a")

This is the line where I swapped in the corrected string. I also changed the second search to find the ul: you were finding a single li and searching inside it, which would have produced an empty list.

Next, I changed

last_page = paging[len(paging) - 3].text

since you are trying to get the third element from the end.
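To illustrate both changes on simplified, hypothetical pager markup (the real page's links and their order may differ): `find()` on `li` stops at the first item, searching the whole `ul` collects every link, and `paging[len(paging) - 3]` is just `paging[-3]`.

```python
from bs4 import BeautifulSoup

# Hypothetical pager: numbered links followed by "Next" and "Last".
html = """
<nav aria-label="pagination-heading--3">
  <ul class="pagination">
    <li class="page-item"><a href="?page=0">1</a></li>
    <li class="page-item"><a href="?page=1">2</a></li>
    <li class="page-item"><a href="?page=41">42</a></li>
    <li class="page-item"><a href="?page=1">Next</a></li>
    <li class="page-item"><a href="?page=41">Last</a></li>
  </ul>
</nav>
"""
soup = BeautifulSoup(html, "html.parser")
nav = soup.find("nav", {"aria-label": "pagination-heading--3"})

# find() returns only the FIRST matching <li>, so this sees a single link.
first_li_links = nav.find("li", {"class": "page-item"}).find_all("a")
print(len(first_li_links))  # 1

# Searching the whole <ul> collects every pager link.
paging = nav.find("ul", {"class": "pagination"}).find_all("a")
print([a.text for a in paging])  # ['1', '2', '42', 'Next', 'Last']

# len(paging) - 3 counts three places back from the end, same as paging[-3].
last_page = paging[len(paging) - 3].text
print(last_page)  # 42
```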

It still doesn't work; I will keep updating this answer.
