
How to use CSS/Selenium to get links from a webpage

I want to get the link from each block on the following page.

BeautifulSoup does not seem to work, because the page appears to be rendered with JavaScript. Should it work using CSS selectors or Selenium instead?

How would I use either of those to extract the HTML links from the page(s)?

from bs4 import BeautifulSoup
import requests
lists=[]
baseurl='https://meetinglibrary.asco.org/'
for x in range (1,5):
    url=f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}'
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'html.parser')
    productlist=soup.find_all('a',class_='ng-star-inserted')
    for item in productlist:
        print(item)
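Both snippets in this thread build the results URL by hand. A minimal sketch of generating the same paginated URLs with the standard library's urllib.parse is shown below. The base URL and meeting name come from the question; the helper name results_url is hypothetical. Passing quote_via=quote makes spaces encode as %20 (the default, quote_plus, would produce +), so the output matches the hand-written URLs.

```python
from urllib.parse import urlencode, quote

# Base URL and meeting name taken from the question's loop.
RESULTS_URL = "https://meetinglibrary.asco.org/results"
MEETING = "2020 ASCO Virtual Scientific Program"


def results_url(page: int) -> str:
    """Build the results URL for one page of the meeting listing (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"meetingView": MEETING, "page": page}, quote_via=quote)
    return f"{RESULTS_URL}?{query}"


if __name__ == "__main__":
    for x in range(1, 5):
        print(results_url(x))
```

Because the page number goes in as a dictionary value, there is no f-string prefix to forget.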

That's pretty easy: load the site with Selenium, then pass the rendered page source to bs4:

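from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Firefox()
try:
    for x in range(1, 5):
        # f-string, so {x} is replaced by the page number
        driver.get(f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}')
        # Give the JavaScript time to render the results
        time.sleep(10)
        # Parse the rendered HTML that the browser sees
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        productlist = soup.find_all('a', class_='ng-star-inserted')
        for item in productlist:
            print(item.get('href'))
finally:
    # Quit once, after all pages have been visited
    driver.quit()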
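If you want absolute URLs rather than the raw tags or relative href values, here is a minimal sketch that takes the rendered HTML (driver.page_source) and resolves each link against the site root. It swaps bs4 for the standard library's html.parser, so this is an alternative technique rather than what the answer uses. The names LinkCollector and extract_links are hypothetical. The class ng-star-inserted comes from the question. It appears to be added by Angular to many dynamically inserted elements, so it may match more links than just the result blocks; check that against the actual page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://meetinglibrary.asco.org/"  # same as baseurl in the question


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class (hypothetical helper)."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated classes.
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if self.css_class in classes and href:
            self.hrefs.append(href)


def extract_links(page_source: str, css_class: str = "ng-star-inserted") -> list:
    """Return absolute URLs for matching <a> tags in rendered HTML."""
    collector = LinkCollector(css_class)
    collector.feed(page_source)
    collector.close()
    # Resolve relative links such as "/abstract/123" against the site root.
    return [urljoin(BASE_URL, href) for href in collector.hrefs]
```

Inside the Selenium loop you would call links = extract_links(driver.page_source) after the page has rendered.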

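Be aware that you might need to adjust some details. If geckodriver is not on your PATH, you need to tell Selenium where it is: in Selenium 4 this is done with webdriver.Firefox(service=Service("insert path here")), using Service from selenium.webdriver.firefox.service, while Selenium 3 used webdriver.Firefox(executable_path="insert path here"). Recent Selenium 4 releases can usually locate the driver on their own. Also make sure you have Selenium installed, which you can do with: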

pip install selenium

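If you need to scroll the page to load more content, you can do that with:

for i in range(60):
    # Scroll the window down by 500 px
    driver.execute_script("window.scrollBy(0, 500)")
    # Pause so new content can load; implicitly_wait() only sets a lookup timeout and does not pause
    time.sleep(2)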

Of course, you can adjust the 60 depending on how long the page is. References: the Selenium documentation, and this page, which does basically what you want to do.
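Instead of guessing a fixed number of iterations, a common alternative is to keep scrolling to the bottom until document.body.scrollHeight stops changing. Below is a minimal sketch of that approach. It is not taken from the Selenium docs, and scroll_until_stable is a hypothetical name. It assumes the page body grows as results load. If the results scroll inside an inner container instead of the window, you would need to scroll that element instead. The function only calls driver.execute_script, so any Selenium WebDriver works.

```python
import time


def scroll_until_stable(driver, pause: float = 2.0, max_rounds: int = 60) -> int:
    """Scroll to the bottom until the page height stops growing (hypothetical helper).

    `driver` is a Selenium WebDriver; only execute_script() is used.
    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded; assume we reached the end
        last_height = new_height
    return max_rounds  # safety cap, like the 60 in the loop above
```

The max_rounds cap keeps an endlessly loading page from scrolling forever. The pause plays the same role as the sleep in the fixed loop.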

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM