
How can I scrape from websites that have infinite scrolling?

I have managed to create a web scraper that collects the item descriptions; however, the page loads more items as you scroll.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

namelist = []
driver = webdriver.Chrome()
driver.get("https://waxpeer.com/")
time.sleep(15)  # give the first batch of items time to load

# Selenium 4 removed find_elements_by_xpath; use find_elements(By.XPATH, ...)
links = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/a")

I also need the item description to format as:

★ Karambit| Gamma Doppler (Factory new)

rather than:

★ Karambit

Gamma Doppler

Factory new

from selenium.webdriver.common.by import By

desc = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/div[2]/p")
for item in desc:
    print(item.text)

There's no need to use Selenium. The data is available by sending a GET request to the website's API, in the following format:

https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0

where the offset increases by 50 for each page.


For example, to print the names:

import requests

URL = (
    "https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0"
)

offset = 0

while True:
    response = requests.get(URL.format(offset=offset), timeout=30).json()
    # Stop when the response has no "items" key or the list is empty.
    items = response.get("items")
    if not items:
        break
    for data in items:
        print(data["name"])
    print("-" * 80)
    offset += 50  # move on to the next page

Output:

★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
...
...
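
If you want the names in a list rather than printed, the same loop can be wrapped in a function. This is only a sketch: `collect_names`, `fetch_page`, `PAGE_SIZE` and `max_pages` are names I made up for illustration, and it assumes the API keeps behaving as described above (50 items per page, no usable "items" once you run past the end).

```python
API_URL = "https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0"
PAGE_SIZE = 50  # offset step taken from the answer above


def fetch_page(offset):
    """Return the decoded JSON for one page of the API."""
    # Imported here so the paging logic below can be tried without requests installed.
    import requests

    return requests.get(API_URL.format(offset=offset), timeout=30).json()


def collect_names(fetch=fetch_page, max_pages=None):
    """Gather item names page by page until a page has no items.

    fetch is injectable so the loop can be exercised without network access.
    max_pages is a hypothetical safety cap, not something the API requires.
    """
    names = []
    page = 0
    while max_pages is None or page < max_pages:
        items = fetch(page * PAGE_SIZE).get("items")
        if not items:  # missing key or empty list: no more data
            break
        names.extend(item["name"] for item in items)
        page += 1
    return names
```

Calling `collect_names(max_pages=3)` would request at most the first three pages; leaving `max_pages` out keeps going until the API stops returning items.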

This is what I currently use to scrape an infinite-scroll page.

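def scroll(self):
  # Wait for the items, then move to the last one to trigger the next load.
  items = self.w.until(ec.presence_of_all_elements_located(self.item_locator))
  ActionChains(self.driver).move_to_element(items[-1]).perform()
  # A loader element in the DOM means more items are being fetched.
  loader = self.driver.find_elements(*self.loader_locator)
  return bool(loader)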

The ActionChains call moves to the last item, which scrolls it into view and causes the page to request more items. This is a subsection of a test that only verifies that infinite scrolling works, but if you want to do anything with the elements found, you could append them to a master list.

self.w is a WebDriverWait instance, by the way.
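
A rough sketch of the master-list idea, kept independent of Selenium so it is easy to try out. `collect_items`, `format_description`, `read_items` and `max_rounds` are hypothetical names. It assumes the page only appends new items at the end and keeps the earlier ones in the DOM, and that each description arrives as three separate text parts, as in the question.

```python
def format_description(parts):
    """Join three text parts into 'Name | Skin (Wear)'."""
    name, skin, wear = (part.strip() for part in parts)
    return f"{name} | {skin} ({wear})"


def collect_items(read_items, scroll, max_rounds=100):
    """Scroll repeatedly and keep every item seen in a master list.

    read_items: callable returning the items currently on the page, in order
    scroll:     callable returning True while more items are loading,
                like the scroll() method above
    max_rounds: safety cap so the loop cannot run forever
    """
    master = []
    for _ in range(max_rounds):
        more = scroll()
        current = read_items()
        # Only the tail is new, assuming the page appends and never removes.
        master.extend(current[len(master):])
        if not more:
            break
    return master
```

With Selenium, `read_items` could be something like `lambda: [el.text for el in driver.find_elements(*item_locator)]` and `scroll` the method shown earlier. Slicing by length rather than de-duplicating by text matters here, because several listings can legitimately share the same name.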
