
Scraping with BeautifulSoup for text — NoneType error

I am trying to get table data from Wikipedia, but I keep getting this error:

AttributeError: 'NoneType' object has no attribute 'findAll'

Here is my code.

from bs4 import BeautifulSoup
import urllib
import urllib.request



wiki = "https://en.wikipedia.org/wiki/List_of_current_United_States_Senators"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, "lxml")

name = ""
party = ""
state = ""
picture = ""
link = ""
district = ""

table = soup.find("table", { "class" : "wikitable sortable" })

f = open('output.csv', 'w')

for row in table.findAll("tr"):
    cells = row.findAll("td")


    state = cells[0].find(text=True)
    picture = cells[2].findAll(text=True)
    name = cells[3].find(text=True)
    party = cells[4].find(text=True)


    write_to_file = name + "," + state + "," + party + "," + link + "," + picture + "," + district + "\n"
    print (write_to_file)
    f.write(write_to_file)

f.close()

Any help, even another way to do it (I thought about using the Wikipedia API, but I'm rather lost on what to use), would be appreciated.

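The main problem is that soup.find("table", { "class" : "wikitable sortable" }) returns None, so the later table.findAll call raises the AttributeError. When the class value contains a space, BeautifulSoup compares it against the element's exact class attribute string, and apparently no table on the page has exactly that value. There is a table with class "sortable wikitable sortable", though, and that is probably the element you want.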
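
To make the class matching concrete, here is a minimal sketch on a made-up HTML fragment (the markup is an assumption for illustration, not a copy of the Wikipedia page). It uses the built-in "html.parser" so it runs without lxml, and compares the exact-string lookup with two lookups that do not depend on the order of the classes.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics a table whose class attribute is
# "sortable wikitable sortable".
html = """
<table class="sortable wikitable sortable">
  <tr><th>State</th></tr>
  <tr><td>Alabama</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A class value with a space is matched against the whole attribute string,
# so this lookup finds nothing.
exact = soup.find("table", {"class": "wikitable sortable"})

# A single class name matches any element that carries that class.
by_class = soup.find("table", class_="wikitable")

# A CSS selector requires both classes but ignores their order.
by_css = soup.select_one("table.wikitable.sortable")

print(exact)
print(by_class is not None, by_css is not None)
```

Checking the result for None before calling findAll on it turns the AttributeError into an explicit "table not found" case.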

I fixed that and added an if and a few prints. It still doesn't work, but the remaining problem should be easier to fix. Now it's your turn :)

from bs4 import BeautifulSoup
import urllib
import urllib.request

wiki =  "https://en.wikipedia.org/wiki/List_of_current_United_States_Senators"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, "lxml")

name = ""
party = ""
state = ""
picture = ""
link = ""
district = ""

table = soup.find("table", { "class" : "sortable wikitable sortable" })

f = open('output.csv', 'w')

for row in table.findAll("tr"):
    cells = row.findAll("td")
    if cells:
        state = cells[0].find(text=True)
        picture = cells[2].findAll(text=True)
        name = cells[3].find(text=True)
        party = cells[4].find(text=True)

        print(state, type(state))
        print(picture, type(picture))
        print(name, type(name))
        print(party, type(party))
        write_to_file = name + "," + state + "," + party + "," + link + "," + picture + "," + district + "\n"
        print (write_to_file)
        f.write(write_to_file)
        f.flush()

f.close()
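
A likely reason the loop above still fails: picture comes from findAll(text=True), which returns a list, and Python does not allow adding a list to a string. The sketch below reproduces this with plain stand-in values (the sample data is hypothetical) and shows one possible fix, joining the list into a single string first.

```python
# Stand-ins for the values scraped in the loop (hypothetical sample data).
name = "Richard Shelby"
state = "Alabama"
picture = ["Shelby, Richard", "Richard Shelby"]  # findAll(text=True) gives a list

try:
    broken = name + "," + state + "," + picture + "\n"
except TypeError as exc:
    # str + list is not allowed, so the concatenation raises TypeError.
    error_message = str(exc)
    print("TypeError:", error_message)

# Joining the list into one string first makes the concatenation valid.
picture_text = " ".join(picture)
line = name + "," + state + "," + picture_text + "\n"
print(line, end="")
```

Using find(text=True) for picture, as the other fields do, would avoid the list as well, at the cost of keeping only the first text node.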
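
Another way to do it, with requests and csv.writer:

import csv

import bs4
import requests

base_url = 'https://en.wikipedia.org/wiki/List_of_current_United_States_Senators'
response = requests.get(base_url)
soup = bs4.BeautifulSoup(response.text, 'lxml')

# Assumption: the senators table is the first table carrying both classes.
# A CSS selector matches them in any order, unlike an exact class string.
table = soup.select_one('table.wikitable.sortable')

with open('out.txt', 'w', newline='') as out:
    writer = csv.writer(out)  # quotes fields that contain commas
    for row in table('tr'):
        # Header rows contain only <th> cells, so they produce an empty list.
        row_text = [td.get_text(strip=True) for td in row('td') if td.text]
        writer.writerow(row_text)
        print(row_text)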

This prints:

[]
['Alabama', '3', 'Shelby, RichardRichard Shelby', 'Republican', 'None', 'U.S. House,Alabama Senate', 'University of Alabama, Tuscaloosa(BA;LLB)Birmingham School of Law(JD)', 'January 3, 1987', '(1934-05-06)May 6, 1934(age\xa082)', '2022']
['Alabama', '2', 'Sessions, JeffJeff Sessions', 'Republican', 'Lawyer in private practice', 'Alabama Attorney General,U.S. Attorneyfor theSouthern District of Alabama', 'Huntingdon College(BA)University of Alabama, Tuscaloosa(JD)', 'January 3, 1997', '(1946-12-24)December 24, 1946(age\xa069)', '2020']
['Alaska', '3', 'Murkowski, LisaLisa Murkowski', 'Republican', 'Lawyer in private practice', 'Alaska House', 'Georgetown University(BA)Willamette University(JD)', 'December 20, 2002', '(1957-05-22)May 22, 1957(age\xa059)', '2022']
['Alaska', '2', 'Sullivan, DanDan Sullivan', 'Republican', 'Lawyer in private practice', 'Alaska Natural Resources Commissioner,Alaska Attorney General,U.S. Assistant Secretary of State for Economic and Business Affairs', 'Harvard University(BA)Georgetown University(MS;JD)', 'January 3, 2015', '(1964-11-13)November 13, 1964(age\xa052)', '2020']
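
Values such as 'Shelby, RichardRichard Shelby' look fused because get_text(strip=True) joins every text node in the cell with no separator. The sketch below uses a made-up cell (the hidden sort-key span is an assumption about how such a cell might be built) to show the separator argument of get_text and, alternatively, reading only the link text.

```python
from bs4 import BeautifulSoup

# Hypothetical cell: a hidden sort key followed by the visible link.
html = (
    '<table><tr><td>'
    '<span style="display:none">Shelby, Richard</span>'
    '<a href="#">Richard Shelby</a>'
    '</td></tr></table>'
)
cell = BeautifulSoup(html, "html.parser").find("td")

fused = cell.get_text(strip=True)            # text nodes joined with ""
spaced = cell.get_text(" ", strip=True)      # text nodes joined with " "
link_only = cell.find("a").get_text(strip=True)  # visible name only

print(fused)
print(spaced)
print(link_only)
```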

out.txt:

Alabama,3,"Shelby, RichardRichard Shelby",Republican,None,"U.S. House,Alabama Senate","University of Alabama, Tuscaloosa(BA;LLB)Birmingham School of Law(JD)","January 3, 1987","(1934-05-06)May 6, 1934(age 82)",2022
Alabama,2,"Sessions, JeffJeff Sessions",Republican,Lawyer in private practice,"Alabama Attorney General,U.S. Attorneyfor theSouthern District of Alabama","Huntingdon College(BA)University of Alabama, Tuscaloosa(JD)","January 3, 1997","(1946-12-24)December 24, 1946(age 69)",2020
Alaska,3,"Murkowski, LisaLisa Murkowski",Republican,Lawyer in private practice,Alaska House,Georgetown University(BA)Willamette University(JD),"December 20, 2002","(1957-05-22)May 22, 1957(age 59)",2022
Alaska,2,"Sullivan, DanDan Sullivan",Republican,Lawyer in private practice,"Alaska Natural Resources Commissioner,Alaska Attorney General,U.S. Assistant Secretary of State for Economic and Business Affairs",Harvard University(BA)Georgetown University(MS;JD),"January 3, 2015","(1964-11-13)November 13, 1964(age 52)",2020
Arizona,3,"McCain, JohnJohn McCain",Republican,None,"U.S. House,U.S. NavyCaptain",United States Naval Academy(BS),"January 3, 1987","(1936-08-29)August 29, 1936(age 80)",2022
Arizona,1,"Flake, JeffJeff Flake",Republican,Nonprofit director,U.S. House,"Brigham Young University, Utah(BA;MA)","January 3, 2013","(1962-12-31)December 31, 1962(age 53)",2018
Arkansas,3,"Boozman, JohnJohn Boozman",Republican,Optometrist,"Rogers Public School Board,U.S. House","University of Arkansas, Fayetteville(attended)Southern College of Optometry(OD)","January 3, 2011","(1950-12-10)December 10, 1950(age 66)",2022
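
The quotes in out.txt come from csv.writer, which by default quotes any field that contains a comma. That is the main advantage over building each line with + "," +, as this small stdlib-only sketch with a made-up row suggests.

```python
import csv
import io

row = ["Alabama", "3", "Shelby, Richard", "Republican"]

# Manual joining leaves the comma inside the name unprotected.
manual = ",".join(row) + "\n"
manual_fields = next(csv.reader(io.StringIO(manual)))

# csv.writer quotes the field, so a reader recovers the original four values.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
written = buffer.getvalue()
parsed = next(csv.reader(io.StringIO(written)))

print(len(manual_fields), manual_fields)
print(len(parsed), parsed)
```

With manual joining the row reads back as five fields instead of four, so every column after the name shifts by one.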
