How can I scrape from websites that have infinite scrolling?

Question

I have managed to create a web scraper that collects the item descriptions. However, the page loads more items as you scroll, and my scraper only sees the first batch.

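from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests
from bs4 import BeautifulSoup
from numpy import mean

namelist = []
driver = webdriver.Chrome()
driver.get("https://waxpeer.com/")
time.sleep(15)  # crude wait for the first batch of items to load

# find_elements_by_xpath is gone in newer Selenium 4 releases; use find_elements(By.XPATH, ...)
links = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/a")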

I also need each item description formatted on a single line, like this:

★ Karambit| Gamma Doppler (Factory new)

rather than:

★ Karambit

Gamma Doppler

Factory new

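from selenium.webdriver.common.by import By

# Each <p> holds one part of the description, so each part prints on its own line
desc = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/div[2]/p")
for item in desc:
    print(item.text)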

Answer 1

There's no need to use Selenium. The data is available by sending a GET request to the website's API in the following format:

https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0

where the offset increases by 50 for every page (0, 50, 100, and so on).

For example, to print the names:

import requests

URL = (
    "https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0"
)

offset = 0  # number of items to skip; grows by 50 per page

while True:
    try:
        response = requests.get(URL.format(offset=offset)).json()
        # A response without an "items" key raises KeyError, which ends the loop
        for data in response["items"]:
            print(data["name"])
        print("-" * 80)  # separator between pages
        offset += 50
    except KeyError:
        break

Output:

★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
...
...
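
If the names are needed in a list (like the namelist in the question) rather than printed, the same pagination can be sketched as a generator. This is only a sketch under a few assumptions: iter_names, fetch_with_requests, PAGE_SIZE and max_pages are illustrative names that are not part of the original answer, and treating an empty "items" list as the end of the data is a guess about how the API behaves.

```python
from typing import Callable, Iterator

PAGE_SIZE = 50  # offset step given in the answer above


def iter_names(fetch_page: Callable[[int], dict], max_pages: int = 1000) -> Iterator[str]:
    """Yield item names page by page until a page has no items.

    fetch_page is a hypothetical callable: it takes an offset and returns
    the decoded JSON for that page as a dict.
    """
    for page in range(max_pages):
        payload = fetch_page(page * PAGE_SIZE)
        items = payload.get("items")
        if not items:  # missing key or empty list: assume there are no more pages
            return
        for item in items:
            yield item["name"]


def fetch_with_requests(offset: int) -> dict:
    """Fetch one page from the endpoint quoted in the answer."""
    import requests  # imported here so iter_names can be used without requests installed

    url = (
        "https://waxpeer.com/api/data/index/"
        f"?skip={offset}&sort=best_deals&game=csgo&all=0"
    )
    return requests.get(url, timeout=10).json()
```

With this in place, namelist = list(iter_names(fetch_with_requests)) would gather every name, and max_pages acts as a safety cap in case the API never returns an empty page.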

Answer 2

This is what I currently have to scrape an infinite scroll page.

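from selenium.webdriver import ActionChains
from selenium.webdriver.support import expected_conditions as ec

def scroll(self):
    # Method of a page-object class; wait until the items are present
    items = self.w.until(ec.presence_of_all_elements_located(self.item_locator))
    # Moving to the last item scrolls it into view and triggers the next request
    ActionChains(self.driver).move_to_element(items[-1]).perform()
    # A visible loader means another batch is being fetched
    loader = self.driver.find_elements(*self.loader_locator)
    return bool(loader)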

The ActionChains call moves to the last item, which scrolls it into view and causes the page to request more items. This snippet comes from a test that only verifies that the infinite scrolling works, but if you want to do anything with the elements found, you could append them to a master list after each scroll.

self.w is a WebDriverWait instance, by the way.
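
The "append to a master list" idea could look roughly like the sketch below. It is not from the original answer: collect_while_scrolling, visible_items and max_scrolls are hypothetical names, and the two callables stand in for the Selenium calls so that the loop itself stays independent of the browser. Items are deduplicated by value, so each one should be a key that identifies a single listing (for example a link URL, which is an assumption about the page), since several listings can share the same description text.

```python
from typing import Callable, Hashable, Iterable, List


def collect_while_scrolling(
    visible_items: Callable[[], Iterable[Hashable]],
    scroll: Callable[[], bool],
    max_scrolls: int = 200,
) -> List[Hashable]:
    """Scroll repeatedly and gather each distinct item once, in first-seen order.

    visible_items returns the items currently rendered on the page.
    scroll triggers loading and returns True while more content is expected,
    like the scroll method above.
    """
    master: List[Hashable] = []
    seen = set()

    def harvest() -> None:
        # Add only items that have not been collected yet
        for item in visible_items():
            if item not in seen:
                seen.add(item)
                master.append(item)

    for _ in range(max_scrolls):  # cap the loop in case the page never runs out
        harvest()
        if not scroll():
            break
    harvest()  # pick up anything rendered by the final scroll
    return master
```

With Selenium, visible_items could be a small function that calls driver.find_elements with the item locator and returns one key per element, and scroll could be the method shown above.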
