I am having trouble returning all the desired data from a portion of a web page using BeautifulSoup. When I run the Python code below, the for loop only brings back the first record it finds, not the entire data set from the web page:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.ncsl.org/research/health/state-action-on-coronavirus-covid-19.aspx')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('tbody')
records = []
for result in results:
    state_name = result.find('td').text
    law_Name = result.find('a').text
    law_link = result.find('a').get('href')
    law_status = result.find('b').text
    law_descr = result.find('tr').text[16:-2]
    records.append((state_name, law_Name, law_link, law_status, law_descr))
Only one element ends up in the records list, even though I am using a for loop to go through the whole results object (which is a bs4.element.ResultSet):
[('Alabama',
'SJR 40',
'http://alisondb.legislature.state.al.us/ALISON/SearchableInstruments/2020RS/PrintFiles/SJR40-enr.pdf',
'Eligible for Governor.',
' Urges individuals to fist bump rather than shake hands. Eligible for Governor')]
Any assistance to fix my code would be greatly appreciated. Thank you!
There is only one <tbody> tag in the page source, so soup.find_all('tbody') returns a list with a single element and the loop body runs once. Inside the loop, result.find('td') returns only the first match it finds. I think you want a list of all the <tr> rows in the tbody; to get that, use soup.find_all('tbody')[0].find_all('tr').
Also, be careful with the structure: some entries don't have a <b> tag. I think this can help you:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.ncsl.org/research/health/state-action-on-coronavirus-covid-19.aspx')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('tbody')[0].find_all('tr')
records = []
for result in results:
    state_name = result.find('td').text
    if result.find('a'):
        law_Name = result.find('a').text
        law_link = result.find('a').get('href')
    else:
        law_Name = None
        law_link = None
    law_status = result.find('b').text if result.find('b') else None
    law_descr = result.find_all('td')[1].text[16:-2]
    records.append((state_name, law_Name, law_link, law_status, law_descr))
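To see the difference without hitting the live site, here is a minimal offline sketch of the same idea. The HTML in SAMPLE_HTML is made up for illustration (it is not the real NCSL markup), and parse_rows and its dictionary keys are hypothetical names, not part of the original code. It only assumes a table with one <tbody> and one <tr> per record, as described above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML shaped like the table described above (not the real page).
SAMPLE_HTML = """
<table>
  <tbody>
    <tr>
      <td>Alabama</td>
      <td><a href="https://example.com/sjr40">SJR 40</a> Urges fist bumps. <b>Eligible for Governor.</b></td>
    </tr>
    <tr>
      <td>Alaska</td>
      <td>No legislation listed.</td>
    </tr>
  </tbody>
</table>
"""


def parse_rows(html):
    """Return one dict per <tr> in the first <tbody> (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:
        return []
    records = []
    for row in tbody.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip rows that do not have both a state and a detail cell
        link = row.find("a")    # None when the row has no link
        status = row.find("b")  # None when the row has no bold status
        records.append({
            "state": cells[0].get_text(strip=True),
            "law_name": link.get_text(strip=True) if link else None,
            "law_link": link.get("href") if link else None,
            "law_status": status.get_text(strip=True) if status else None,
            "description": cells[1].get_text(" ", strip=True),
        })
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tbodies = soup.find_all("tbody")   # a single <tbody>, so looping here runs once
records = parse_rows(SAMPLE_HTML)  # looping over <tr> gives one record per row
print(len(tbodies), len(records))
for record in records:
    print(record)
```

Looking up each tag once and checking it for None avoids calling find() twice per field, and it keeps rows without a link or a status from raising AttributeError. The sketch keeps the whole text of the second cell as the description rather than slicing with fixed offsets such as [16:-2], since those offsets depend on the real page's exact text.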