I want the links for each block on the following page.
BeautifulSoup does not seem to work, as the page appears to render in JavaScript, but it should work using CSS selectors or Selenium?
How would I use either of those to extract the HTML links from the page(s)?
from bs4 import BeautifulSoup
import requests

lists = []
baseurl = 'https://meetinglibrary.asco.org/'

for x in range(1, 5):
    url = f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    productlist = soup.find_all('a', class_='ng-star-inserted')
    for item in productlist:
        print(item)
That's pretty easy: you access the site using Selenium and then pass the page source to bs4:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Firefox()

for x in range(1, 5):
    # The URL must be an f-string so that {x} is replaced with the page number
    driver.get(f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}')
    time.sleep(10)  # give the JavaScript time to render the results
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('a', class_='ng-star-inserted')
    for item in productlist:
        print(item)

driver.close()  # close the browser only after all pages have been scraped
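Printing each match gives you the whole <a> tag; if you only want the URLs themselves, pull the href attribute from each match instead. A minimal sketch with a static snippet standing in for driver.page_source (the ng-star-inserted class matches the page's Angular markup; the hrefs here are made up for illustration):

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; real hrefs will differ
html = '''
<div>
  <a class="ng-star-inserted" href="/record/1">First</a>
  <a class="ng-star-inserted" href="/record/2">Second</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
links = [a.get('href') for a in soup.find_all('a', class_='ng-star-inserted')]
print(links)  # ['/record/1', '/record/2']
```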
Be aware that you might need to change some details: you may need to insert the executable path when constructing the driver, i.e. webdriver.Firefox("insert path here").
Also make sure you have Selenium installed; you can do that using
pip install selenium
If you need to scroll the page to load the content, you can do that using:

for i in range(60):
    driver.execute_script("window.scrollBy(0, 500)")
    driver.implicitly_wait(2)

(Note that arguments[0] only refers to an element you pass into execute_script; to scroll the window itself, use window.scrollBy.) You can of course adjust the "60" depending on how long the page is. Reference: the Selenium docs; this page basically does what you want to do.
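One detail that is easy to miss in the paging loop: the URL string needs the f prefix for {x} to be replaced with the page number; without it, the literal text {x} is sent to the server. A quick illustration (URL shortened here for readability):

```python
x = 3

# Without the f prefix the braces are kept literally
plain = 'https://meetinglibrary.asco.org/results?page={x}'
print(plain)  # ...results?page={x}

# With the f prefix the value of x is interpolated
url = f'https://meetinglibrary.asco.org/results?page={x}'
print(url)  # ...results?page=3
```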