Beautiful Soup not loading new page after Selenium click
The first page is loaded and parsed as expected, but after clicking Next, BS4 does not get the new page from driver.page_source.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

def parse_html(pagesource, count):
    # Parse the HTML passed in by the caller.
    soup = BeautifulSoup(pagesource, 'html.parser')
    tables = soup.findChildren('table')
    # This will get the first (and only) table. Your page may have more.
    my_table = tables[0]
    table_body = my_table.find('tbody')
    all_rows = table_body.find_all('tr')
    # print (all_rows[0])
    for row in all_rows:
        print(count)
        count += 1
        try:
            path_body = row.find("td", class_="views-field-company-name")
            path = path_body.find("a")['href']
            company_name = path_body.find("a").text
            company_name = company_name.strip()
            print(company_name)
            issue_datetime = row.find("td", class_="views-field-field-letter-issue-datetime")
            # print (type(issue_datetime.find("time")['datetime']))
            issue_recepient_office = row.find("td", class_="views-field-field-building").string
            issue_recepient_office = issue_recepient_office.strip()
            # print (issue_recepient_office)
            detailed_description = row.find("td", class_="views-field-field-detailed-description-2").string
            if detailed_description:
                detailed_description = detailed_description.strip()
            else:
                detailed_description = ""
            # print (detailed_description)
        except:
            # Note: this silently swallows every error raised while reading a row.
            pass

url = 'https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/compliance-actions-and-activities/warning-letters'
driver.get(url)
count = 1
parse_html(driver.page_source, count)
for i in range(0, 3):
    time.sleep(10)
    # print(driver.page_source.encode('utf-8'))
    WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#datatable_next a'))).click()
    time.sleep(30)
    parse_html(driver.page_source, count)
driver.quit()
Output:
1
Ruth Special Food Store LLC
Foreign Supplier Verification Program (FSVP)
2
EarthLab, Inc., dba Wise Woman Herbals
3
Big Olaf Creamery LLC dba Big Olaf
CGMP/Food/Prepared, Packed or Held Under Insanitary Conditions/Adulterated/L. monocytogenes
4
Bainbridge Beverage West, LLC
Juice HACCP/CGMP for Foods/Adulterated/Insanitary Conditions
5
VapeL1FE, LLC
Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded
6
Mike Millenkamp Dairy Cattle
7
Empowered Diagnostics LLC
Unapproved Products Related to the Coronavirus Disease 2019 (COVID-19)
8
RoyalVibe Health Ltd.
CGMP/QSR/Medical Devices/PMA/Adulterated/Misbranded
9
Land View, Inc.
CGMP/Medicated Feeds/Adulterated
10
Green Pharmaceuticals Inc.
1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
A non-Selenium solution:
import requests
from bs4 import BeautifulSoup
import pandas as pd

PAGE_LENGTH = 50  # rows requested per call

def get_letters(page: int):
    # Offset of the first row on the requested page.
    start = page * PAGE_LENGTH
    url = f"https://www.fda.gov/datatables/views/ajax?field_letter_issue_datetime=All&field_change_date_2=All&draw={page}&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=true&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=2&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=3&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=4&columns%5B4%5D%5Bsearchable%5D=true&columns%5B4%5D%5Borderable%5D=true&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=5&columns%5B5%5D%5Bsearchable%5D=true&columns%5B5%5D%5Borderable%5D=true&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=6&columns%5B6%5D%5Bsearchable%5D=true&columns%5B6%5D%5Borderable%5D=true&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B7%5D%5Bdata%5D=7&columns%5B7%5D%5Bname%5D=&columns%5B7%5D%5Bsearchable%5D=true&columns%5B7%5D%5Borderable%5D=false&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=false&start={start}&length={PAGE_LENGTH}&search%5Bregex%5D=false&_drupal_ajax=1&_wrapper_format=drupal_ajax&view_base_path=inspections-compliance-enforcement-and-criminal-investigations%2Fcompliance-actions-and-activities%2Fwarning-letters%2Fdatatables-data&view_display_id=warning_letter_solr_block&view_dom_id=4605f153788b3a17043d0e031eb733846503177581602cd9fd58ecd78629801b&view_name=warning_letter_solr_index&view_path=%2Finspections-compliance-enforcement-and-criminal-investigations%2Fcompliance-actions-and-activities%2Fwarning-letters&total_items=3433"
    letters = []
    # Each entry in 'data' is one table row; each cell is an HTML fragment.
    for letter in requests.get(url).json()['data']:
        letters.append([BeautifulSoup(row, 'lxml').get_text(strip=True) for row in letter])
    return letters

result = []
for i in range(0, 5):
    result += get_letters(i)
df = pd.DataFrame(result)
print(df)
Output:
0 1 2 ... 5 6 7
0 12/27/2022 11/07/2022 Land View, Inc. ...
1 12/27/2022 11/22/2022 MD Pharmaceutical Supply, LLC ...
2 12/27/2022 06/01/2022 Supreme Fruit Produce, Inc. ...
3 12/27/2022 10/06/2022 Empowered Diagnostics LLC ...
4 12/27/2022 11/18/2022 RoyalVibe Health Ltd. ...
.. ... ... ... ... .. .. ..
245 08/11/2022 08/11/2022 The Juice Bar ...
246 08/09/2022 06/16/2022 InfuTronix LLC ...
247 08/09/2022 07/12/2022 Zyno Medical LLC ...
248 08/09/2022 07/28/2022 Vitti Labs, LLC ...
249 08/09/2022 07/22/2022 Muscle Feast, LLC ...
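The loop above fetches only the first five pages (250 rows). As a rough sketch of how the remaining offsets could be computed, the snippet below derives every `start` value from a total row count. The helper name `page_offsets` is hypothetical, and the figure 3433 is simply the `total_items` value that appears in the captured URL; it may differ by the time you run this.

```python
import math
from typing import List

def page_offsets(total_items: int, page_length: int) -> List[int]:
    """Return the 'start' offset for every page (hypothetical helper)."""
    n_pages = math.ceil(total_items / page_length)
    return [page * page_length for page in range(n_pages)]

# 3433 is the total_items value taken from the captured request URL.
offsets = page_offsets(3433, 50)
print(len(offsets), offsets[0], offsets[-1])  # 69 0 3400
```

Each offset corresponds to one value of `start` in the request, so iterating over `range(len(offsets))` instead of `range(0, 5)` would cover the whole table, assuming the server keeps returning data for every page.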
UPDATE
To find the request, use the browser's developer tools (F12 by default in Chrome) and watch the Network tab while clicking Next on the page.
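Most of the captured URL is repeated per-column descriptors; the parts that change between pages are `draw`, `start`, and `length`. The following is only a sketch of how that part of the query string could be generated with the standard library instead of being pasted by hand. The helper `build_paging_query` is a hypothetical name, and it deliberately leaves out the `view_*` and other site-specific parameters, which would still have to be copied from the captured request.

```python
from urllib.parse import urlencode

def build_paging_query(page: int, page_length: int = 50, n_columns: int = 8) -> str:
    """Build the paging/column part of the query string (hypothetical helper)."""
    params = [("draw", page)]
    for i in range(n_columns):
        # Bracketed keys are percent-encoded by urlencode, matching the captured URL.
        params.append((f"columns[{i}][data]", i))
        params.append((f"columns[{i}][searchable]", "true"))
    params.append(("start", page * page_length))
    params.append(("length", page_length))
    return urlencode(params)

query = build_paging_query(page=2)
print(query)
```

Comparing the printed string with the request shown in the Network tab is a quick way to check that nothing the server expects is missing.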
Now we need to work out how to use this data. Each cell is a plain HTML fragment, and bs4 can parse it. If you need the link, change the letters.append call to:
letters.append({
    'Posted Date': BeautifulSoup(letter[0], 'lxml').get_text(strip=True),
    'Letter Issue Date': BeautifulSoup(letter[1], 'lxml').get_text(strip=True),
    'Company Name': BeautifulSoup(letter[2], 'lxml').get_text(strip=True),
    'Issuing Office': BeautifulSoup(letter[3], 'lxml').get_text(strip=True),
    'Subject': BeautifulSoup(letter[4], 'lxml').get_text(strip=True),
    'Link': 'https://www.fda.gov/' + BeautifulSoup(letter[2], 'lxml').find('a').get('href'),
})
The new output looks like:
Posted Date Letter Issue Date Company Name Issuing Office Subject Link
0 12/27/2022 11/07/2022 Land View, Inc. Division of Human and Animal Food Operations West VI CGMP/Medicated Feeds/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/land-view-inc-638704-11072022
1 12/27/2022 11/22/2022 MD Pharmaceutical Supply, LLC Division of Pharmaceutical Quality Operations I CGMP/Active Pharmaceutical Ingredient (API)/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/md-pharmaceutical-supply-llc-637815-11222022
2 12/27/2022 06/01/2022 Supreme Fruit Produce, Inc. Division of Southwest Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/supreme-fruit-produce-inc-631972-06012022
3 12/27/2022 10/06/2022 Empowered Diagnostics LLC Center for Devices and Radiological Health Unapproved Products Related to the Coronavirus Disease 2019 (COVID-19) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/empowered-diagnostics-llc-638164-10062022
4 12/27/2022 11/18/2022 RoyalVibe Health Ltd. Center for Devices and Radiological Health CGMP/QSR/Medical Devices/PMA/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/royalvibe-health-ltd-639553-11182022
5 12/27/2022 11/28/2022 Bainbridge Beverage West, LLC Division of Human and Animal Food Operations West V Juice HACCP/CGMP for Foods/Adulterated/Insanitary Conditions https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/bainbridge-beverage-west-llc-638942-11282022
6 12/27/2022 12/16/2022 Green Pharmaceuticals Inc. Division of Pharmaceutical Quality Operations IV Drug Product/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/green-pharmaceuticals-inc-635162-12162022
7 12/27/2022 12/16/2022 VapeL1FE, LLC Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/vapel1fe-llc-648624-12162022
8 12/27/2022 12/09/2022 Ruth Special Food Store LLC Division of Northeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/ruth-special-food-store-llc-644551-12092022
9 12/27/2022 11/28/2022 Mike Millenkamp Dairy Cattle Division of Human and Animal Food Operations West II New Animal Drug/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/mike-millenkamp-dairy-cattle-640782-11282022
10 12/27/2022 11/10/2022 EarthLab, Inc., dba Wise Woman Herbals Division of Human and Animal Food Operations West VI CGMP/Dietary Supplement/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/earthlab-inc-dba-wise-woman-herbals-634872-11102022
11 12/27/2022 12/09/2022 Big Olaf Creamery LLC dba Big Olaf Division of Human and Animal Food Operations East IV CGMP/Food/Prepared, Packed or Held Under Insanitary Conditions/Adulterated/L. monocytogenes https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/big-olaf-creamery-llc-dba-big-olaf-642758-12092022
12 12/22/2022 12/22/2022 BS Vapes LLC Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/bs-vapes-llc-647308-12222022
13 12/22/2022 12/22/2022 JP & SN Enterprises Inc. d/b/a eCigs International Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/jp-sn-enterprises-inc-dba-ecigs-international-647315-12222022
14 12/20/2022 11/08/2022 Dollar Tree, Inc. Office of Human and Animal Food Operations – West Division 3 Interstate Commerce/Food/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/dollar-tree-inc-629509-11082022
15 12/20/2022 07/27/2022 Sagent Pharmaceuticals, Inc. Division Pharmaceutical Quality Operations I CGMP/Drugs/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/sagent-pharmaceuticals-inc-636636-07272022
16 12/20/2022 11/21/2022 Nature’s Way Farms, LLC Division of Southwest Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/natures-way-farms-llc-641201-11212022
17 12/20/2022 12/08/2022 Nortec Quimica SA Center for Drug Evaluation and Research | CDER CGMP/Active Pharmaceutical Ingredient (API)/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/nortec-quimica-sa-639894-12082022
18 12/20/2022 11/30/2022 CHS Inc./CHS River Plains Division of Human and Animal Food Operations West I CGMP/Medicated Feeds/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/chs-incchs-river-plains-642790-11302022
19 12/20/2022 12/02/2022 DuPont Nutrition USA Inc. Division of Pharmaceutical Quality Operations I CGMP/Drug Products/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/dupont-nutrition-usa-inc-627211-12022022
20 12/20/2022 11/01/2022 Del Valle Import Corp. Division of Northeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/del-valle-import-corp-642784-11012022
21 12/20/2022 08/25/2022 Sree Nidhi Corp Center for Food Safety and Applied Nutrition (CFSAN) Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/sree-nidhi-corp-634266-08252022
22 12/20/2022 12/14/2022 Adarsh Daswani, M.D. Center for Drug Evaluation and Research | CDER Clinical Investigator https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/adarsh-daswani-md-648606-12142022
23 12/15/2022 12/15/2022 Vape King Inc. Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/vape-king-inc-646625-12152022
24 12/15/2022 12/15/2022 Vapor E-Cigarette, L.L.C. Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/vapor-e-cigarette-llc-646876-12152022
25 12/13/2022 12/02/2022 SV3, LLC d/b/a Mi-One Brands Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/sv3-llc-dba-mi-one-brands-647624-12022022
26 12/13/2022 12/07/2022 Centrient Pharmaceuticals India Private Limited Center for Drug Evaluation and Research | CDER CGMP/Active Pharmaceutical Ingredient (API)/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/centrient-pharmaceuticals-india-private-limited-640196-12072022
27 12/13/2022 11/22/2022 Cecilia Alvarez Division of Southwest Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/cecilia-alvarez-643706-11222022
28 12/13/2022 11/29/2022 Gobwa Exotic Imports Inc. Division of Northeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/gobwa-exotic-imports-inc-641031-11292022
29 12/13/2022 12/05/2022 Thriftmaster Texas, LLC. d/b/a ThriftMaster Global Holdings, Inc. and TM Global Biosciences, LLC Center for Drug Evaluation and Research | CDER Finished Pharmaceuticals/Unapproved New Drug/Misbranded/Adulterated Human Foods https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/thriftmaster-texas-llc-dba-thriftmaster-global-holdings-inc-and-tm-global-biosciences-llc-641057
30 12/13/2022 11/21/2022 Euphoria Fancy Food Inc. Division of Northeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/euphoria-fancy-food-inc-641801-11212022
31 12/08/2022 12/08/2022 Cloud House Vape Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/cloud-house-vape-647544-12082022
32 12/08/2022 12/08/2022 Vapors of Ohio Inc d/b/a Nostalgic Vapes Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/vapors-ohio-inc-dba-nostalgic-vapes-644739-12082022
33 12/06/2022 11/28/2022 AG Hair Limited Center for Drug Evaluation and Research | CDER CGMP/Finished Pharmaceuticals/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/ag-hair-limited-638646-11282022
34 12/06/2022 11/22/2022 Glenmark Pharmaceuticals Limited Center for Drug Evaluation and Research | CDER CGMP/Finished Pharmaceuticals/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/glenmark-pharmaceuticals-limited-637314-11222022
35 12/06/2022 09/23/2022 Saffron USA LLC Division of Human and Animal Food Operations East IV Unapproved New Drugs/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/saffron-usa-llc-629821-09232022
36 12/06/2022 10/24/2022 Cryos International USA LLC Division of Biological Products Operations I Deviations/CFR/Regulations for Human Cells, Tissues & Cellular Products (HCT/Ps) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/cryos-international-usa-llc-639696-10242022
37 12/06/2022 10/17/2022 Zuland Distributor Corp Division of Southwest Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/zuland-distributor-corp-638899-10172022
38 12/06/2022 11/07/2022 Manzela USA, LLC Division of Southwest Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/manzela-usa-llc-642268-11072022
39 12/06/2022 11/07/2022 Maliba African Market Corp. Division of Northeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/maliba-african-market-corp-642698-11072022
40 12/06/2022 11/30/2022 Kari Gran Inc. Division of Pharmaceutical Quality Operations IV CGMP/Finished Pharmaceuticals/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/kari-gran-inc-640035-11302022
41 12/01/2022 12/01/2022 Vapor Candy Inc d/b/a The Vape Stop Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/vapor-candy-inc-dba-vape-stop-645475-12012022
42 11/30/2022 11/30/2022 Jayde's Vapor Lounge Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/jaydes-vapor-lounge-645085-11302022
43 11/29/2022 11/10/2022 Vapor Plus OK LLC Center for Tobacco Products Family Smoking Prevention and Tobacco Control Act/Adulterated/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/vapor-plus-ok-llc-646225-11102022
44 11/29/2022 11/18/2022 "David M. Lubeck, M.D./Arbor Centers for EyeCare Center for Drug Evaluation and Research | CDER Clinical Investigator (Sponsor) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/david-m-lubeck-mdarbor-centers-eyecare-643531-11182022
45 11/29/2022 06/01/2022 Jam Jam Services, Inc. Division of Southeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/jam-jam-services-inc-630847-06012022
46 11/29/2022 09/19/2022 La Serranita Import and Export LLC Division of Northeast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/la-serranita-import-and-export-llc-633743-09192022
47 11/29/2022 11/09/2022 J R Imports LLC Division of Southwest Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/j-r-imports-llc-643214-11092022
48 11/29/2022 09/01/2022 Shuzy Rock Inc. Division of Pharmaceutical Quality Operations I CGMP/Finished Pharmaceuticals/Adulterated https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/shuzy-rock-inc-630110-09012022
49 11/22/2022 10/19/2022 Pepe’s Foods Inc. Division of West Coast Imports Foreign Supplier Verification Program (FSVP) https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/pepes-foods-inc-640716-10192022
50 11/22/2022 11/14/2022 yourtramadol.com Center for Drug Evaluation and Research | CDER Finished Pharmaceuticals/Unapproved New Drug/Misbranded https://www.fda.gov//inspections-compliance-enforcement-and-criminal-investigations/warning-letters/yourtramadolcom-639959-11142022
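The links in the output above contain a double slash after the host, because the `href` values already start with `/` and are appended to a base that ends with `/`. One possible way to avoid that, shown here as a small sketch rather than as part of the original answer, is to let `urllib.parse.urljoin` combine the two parts. The `href` below is copied from the first row of the output; `BASE_URL` is an illustrative constant name.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.fda.gov/"  # illustrative constant, not from the original answer

# href as it appears in the first row of the output above
href = ("/inspections-compliance-enforcement-and-criminal-investigations/"
        "warning-letters/land-view-inc-638704-11072022")

# urljoin resolves the absolute path against the host, so no '//' is produced.
link = urljoin(BASE_URL, href)
print(link)
```

In the dictionary version of `letters.append`, the string concatenation for `'Link'` could be swapped for the same `urljoin` call.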