
Scraping infinite scrolling pages with load more button using Selenium

I am trying to get all project titles and creator names by web scraping. Most of it works, but I get a "TimeoutException: Message:" when scraping the infinite-scrolling page with its "Load more" button. Please let me know what is wrong and what I need to correct. Thanks.

Below is the code currently being used:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.get("https://www.kickstarter.com/discover/advanced?sort=newest&seed=2695789&page=1/")

button = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,'bttn keyboard-focusable bttn-medium bttn-primary theme--create fill-bttn-icon hover-fill-bttn-icon')))
button.click()

names=[]
creators=[] 
soup = BeautifulSoup(driver.page_source)
for a in soup.findAll('div',{'class':'js-react-proj-card grid-col-12 grid-col-6-sm grid-col-4-lg'}):
    name=a.find('div', attrs={'class':'clamp-5 navy-500 mb3 hover-target'})
    creator=a.find('div', attrs={'class':'type-13 flex'})
    names.append(name.h3.text) 
    creators.append(creator.text)

df = pd.DataFrame({'Name':names,'Creator':creators}) 
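
For context, the TimeoutException most likely comes from the CSS selector: the string passed to By.CSS_SELECTOR is a space-separated list of class names with no leading dots, so it matches no element and the wait gives up after 10 seconds. Below is a minimal sketch of a corrected wait plus a repeated "Load more" click loop, assuming the button keeps these class names (Kickstarter's class names can change at any time):

from selenium.common.exceptions import TimeoutException

# Prefixing each class with a dot turns the class list into a valid CSS selector.
load_more_selector = ('.bttn.keyboard-focusable.bttn-medium.bttn-primary'
                      '.theme--create.fill-bttn-icon.hover-fill-bttn-icon')

# Click "Load more" a few times; stop once the button can no longer be found.
for _ in range(5):
    try:
        button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, load_more_selector)))
        button.click()
    except TimeoutException:
        break

After these clicks, driver.page_source contains the extra project cards, so the BeautifulSoup loop above would pick them up.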

You really don't need BeautifulSoup or Selenium for this. Go for the requests library instead; Kickstarter serves the project data as JSON, so it is easy to grab it all hassle-free.

import requests

# Kickstarter's discover endpoint returns JSON when the Accept header asks
# for it, so the project data can be fetched page by page without a browser.
for page in range(1, 6):  # the discover URL's page parameter starts at 1
    req = requests.get('https://www.kickstarter.com/discover/advanced?google_chrome_workaround&woe_id=0&sort=newest&seed=2695910&page=' + str(page),
                       headers={'Accept': 'application/json',
                                'Content-Type': 'application/json'})
    if req.status_code == 200:
        data = req.json()
        projects = data.get("projects")
        for project in projects:
            print("Project Name - " + project['name'], end='          Created By - ')
            print(project['creator'].get('name'))

        print("----------------")

Output:

(screenshot of the printed output: each page's project names and creators, separated by a dashed line)

You can scroll down the page and count how many times the "Load more" button has to load new content; put that count into the for loop and you will get all of the projects.
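
If you would rather not count the clicks by hand, here is a hedged sketch (assuming the JSON keeps the same "projects", "name" and "creator" fields as above) that keeps requesting pages until the API returns no more projects, and collects the results into the DataFrame the question was after:

import requests
import pandas as pd

base_url = ('https://www.kickstarter.com/discover/advanced'
            '?google_chrome_workaround&woe_id=0&sort=newest&seed=2695910&page=')
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}

names, creators = [], []
page = 1
while True:
    resp = requests.get(base_url + str(page), headers=headers)
    if resp.status_code != 200:
        break
    projects = resp.json().get('projects') or []
    if not projects:  # no more pages to load
        break
    for project in projects:
        names.append(project['name'])
        creators.append(project['creator'].get('name'))
    page += 1

df = pd.DataFrame({'Name': names, 'Creator': creators})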
