How to use CSS/Selenium to get links from webpage

Question

I want the links per block on the following page.

BeautifulSoup alone does not seem to work, because the page appears to be rendered with JavaScript. Would it work using CSS selectors or Selenium?

How would I use either of those to extract the HTML links from the page(s)?

from bs4 import BeautifulSoup
import requests
lists=[]
baseurl='https://meetinglibrary.asco.org/'
for x in range (1,5):
    url=f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}'
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'html.parser')
    productlist=soup.find_all('a',class_='ng-star-inserted')
    for item in productlist:
        print(item)

Answer 1

That's pretty easy: you load the site with Selenium and then pass the rendered page source to bs4:

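from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Firefox()
for x in range(1, 5):
    # f-string, so that {x} is replaced by the page number
    driver.get(f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}')
    time.sleep(10)  # crude wait for the JavaScript to render the results
    # parse the rendered HTML that the browser currently holds
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    productlist = soup.find_all('a', class_='ng-star-inserted')
    for item in productlist:
        print(item)
# shut the browser down only after all pages have been processed
driver.quit()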
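Printing item shows the whole tag, while the question asks for the links themselves. Below is a minimal sketch of pulling out the href values and turning relative ones into absolute URLs. It uses the standard library's HTMLParser instead of bs4 so that it runs without extra installs; the names LinkCollector and extract_links and the sample markup are made up for illustration, and the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that carry no href
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url="https://meetinglibrary.asco.org/"):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical markup standing in for driver.page_source
sample = '<a class="ng-star-inserted" href="/record/1/abstract">A</a><a>no href</a>'
print(extract_links(sample))
```

In the loop above you would call extract_links(driver.page_source) for each page. If you stay with bs4, the same idea is item.get('href') on each tag returned by find_all, combined with urljoin.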

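Be aware that you might need to change some details. If geckodriver is not on your PATH, you have to tell Selenium where the executable is: in Selenium 3 that is webdriver.Firefox(executable_path="insert path here"), while recent Selenium 4 releases take a Service object instead. Also make sure you have Selenium installed; you can do that using: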

pip install selenium

If you need to scroll the page to load more content, you can do that using:

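for i in range(60):
    # scroll the window down by 500 pixels
    driver.execute_script("window.scrollBy(0, 500)")
    # pause so new content can load (implicitly_wait() only sets a lookup timeout, it does not pause)
    time.sleep(2)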

Of course you can adjust the 60 depending on how long the page is. Reference: the Selenium docs. This page basically does what you want to do.
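Instead of guessing a count like 60, one option is to keep scrolling until the page height stops growing. This is only a sketch under assumptions: scroll_until_stable is a hypothetical helper name, it assumes the page grows document.body.scrollHeight as content loads, and the pause may need tuning for a slow site.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=60):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new appeared, so stop
        last_height = new_height
    return max_rounds
```

Call scroll_until_stable(driver) after driver.get(...) and before reading driver.page_source. The max_rounds cap keeps the loop from running forever on a page that never stops loading.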

solution1
2 ACCEPTED 2020-12-01 07:28:00
