I have managed to create a web scraper that collects the item descriptions; however, the page loads more items as you scroll.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

namelist = []
driver = webdriver.Chrome()
driver.get("https://waxpeer.com/")
time.sleep(15)  # fixed wait for the JavaScript-rendered listing to load
# find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
links = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/a")
I also need the item description to be formatted as:
★ Karambit | Gamma Doppler (Factory new)
rather than:
★ Karambit
Gamma Doppler
Factory new
from selenium.webdriver.common.by import By

desc = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/div[2]/p")
for item in desc:
    print(item.text)
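If each item's name, finish and wear come back as separate strings, they can be joined into one line in Python. This is a minimal sketch: format_description is a hypothetical helper that assumes exactly three non-empty values per item, in that order. It splits on line breaks first, so it should work whether the values arrive as separate <p> texts or as one text with line breaks. The per-card XPath in the usage comment is an assumption based on the selectors above.

```python
def format_description(parts):
    """Join an item's name, finish and wear into one line.

    Assumes exactly three non-empty values in that order, e.g.
    ["★ Karambit", "Gamma Doppler", "Factory new"].
    """
    # Flatten any multi-line strings and drop blank lines
    lines = [line.strip() for p in parts for line in p.splitlines() if line.strip()]
    name, finish, wear = lines  # raises ValueError if the layout differs
    return f"{name} | {finish} ({wear})"


print(format_description(["★ Karambit", "Gamma Doppler", "Factory new"]))
# ★ Karambit | Gamma Doppler (Factory new)

# Hypothetical usage with the driver and By import from the question:
# group the <p> elements per item card instead of reading one flat list.
# for card in driver.find_elements(By.XPATH, "//div[@class='lpd_div']"):
#     parts = [p.text for p in card.find_elements(By.XPATH, "./div[2]/p")]
#     print(format_description(parts))
```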
There's no need to use Selenium. The data is available by sending a GET request to the website's API in the following format:
https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0
with the offset increasing by 50 for every page.
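The paging scheme can be made explicit by building the query string from a dictionary. In this minimal sketch, page_url is a hypothetical helper and the parameter values are copied from the URL above; only the skip value changes from page to page.

```python
from urllib.parse import urlencode

BASE_URL = "https://waxpeer.com/api/data/index/"
PAGE_SIZE = 50  # the offset advances by 50 per page


def page_url(page):
    """Return the API URL for a zero-based page number."""
    params = {"skip": page * PAGE_SIZE, "sort": "best_deals", "game": "csgo", "all": 0}
    return f"{BASE_URL}?{urlencode(params)}"


for page in range(3):
    print(page_url(page))
# https://waxpeer.com/api/data/index/?skip=0&sort=best_deals&game=csgo&all=0
# https://waxpeer.com/api/data/index/?skip=50&sort=best_deals&game=csgo&all=0
# https://waxpeer.com/api/data/index/?skip=100&sort=best_deals&game=csgo&all=0
```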
For example, to print the names:
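import requests

URL = (
    "https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0"
)

offset = 0
while True:
    try:
        response = requests.get(URL.format(offset=offset), timeout=10).json()
        items = response["items"]
    except KeyError:
        # No "items" key in the response: we have gone past the last page
        break
    if not items:
        # An empty page also means there is nothing left to fetch
        break
    for data in items:
        print(data["name"])
    print("-" * 80)
    offset += 50  # next page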
Output:
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
...
...
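To fill a list such as namelist from the question instead of printing, the paging can be wrapped in a generator. In this sketch, iter_items is hypothetical, and it assumes the API signals the end with an empty or missing "items" list. The page fetcher is passed in as a function, so the paging logic can be tried with canned responses and no network access.

```python
def iter_items(fetch_page, page_size=50):
    """Yield every item, page by page, until a page comes back empty.

    fetch_page(offset) must return the decoded JSON dict for that offset.
    Assumption: the end is signalled by an empty or missing "items" list.
    """
    offset = 0
    while True:
        items = fetch_page(offset).get("items") or []
        if not items:
            return
        yield from items
        offset += page_size


# Usage against the real endpoint (needs the third-party requests package):
# import requests
# URL = "https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0"
# namelist = [item["name"] for item in iter_items(
#     lambda offset: requests.get(URL.format(offset=offset), timeout=10).json())]
```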
This is what I currently have to scrape an infinite scroll page.
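from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions as ec

def scroll(self):
    # Wait until the currently loaded items are present
    items = self.w.until(ec.presence_of_all_elements_located(self.item_locator))
    # Move to the last item so the page requests the next batch
    ActionChains(self.driver).move_to_element(items[-1]).perform()
    # A visible loading indicator means more items are on the way
    loader = self.driver.find_elements(*self.loader_locator)
    return bool(loader)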
The ActionChains call moves to the last item, scrolling it into view, which makes the page request more items. This snippet is part of a test that only verifies that infinite scrolling works, but if you want to do anything with the elements found, you could append them to a master list.
By the way, self.w is a WebDriverWait instance, and self.item_locator and self.loader_locator are (By, selector) locator tuples.
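If the page has no loader element to check, a common alternative is to scroll with JavaScript until the document height stops growing, and only then read the elements. This sketch assumes the site keeps already-loaded items in the DOM and that a short pause is enough for each batch to arrive; scroll_to_end and its parameters are hypothetical, while driver.execute_script is Selenium's standard WebDriver method.

```python
import time

# JavaScript snippets run in the browser through driver.execute_script
GET_HEIGHT = "return document.body.scrollHeight"
SCROLL_DOWN = "window.scrollTo(0, document.body.scrollHeight);"


def scroll_to_end(driver, pause=2.0, max_rounds=100):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed (hypothetical helper).
    """
    last_height = driver.execute_script(GET_HEIGHT)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script(SCROLL_DOWN)
        rounds += 1
        time.sleep(pause)  # give the next batch time to load
        new_height = driver.execute_script(GET_HEIGHT)
        if new_height == last_height:
            break  # nothing new was appended; assume we reached the end
        last_height = new_height
    return rounds
```

After scroll_to_end(driver) returns, a single driver.find_elements(By.XPATH, ...) call should see every loaded item. max_rounds caps the loop in case the page never stops growing.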
The technical post webpages of this site follow the CC BY-SA 4.0 license.