
Not all HTML elements returned by BeautifulSoup's find_all method

I'm trying to use Beautiful Soup to pull data from a website. However, when I use the `find_all` method I get only a subset of the target elements (`li`): instead of 24 `li` items, only 12 are returned.

**Sample code**

```python
from bs4 import BeautifulSoup
import requests

url = 'https://www.tomford.com/beauty/lips/'
headers = {'User-Agent': '<using my useragent>'}  # placeholder for a real User-Agent string
# headers must be passed by keyword: the second positional argument of requests.get() is params
reqs = requests.get(url, headers=headers)
soup = BeautifulSoup(reqs.text, 'lxml')

# every product tile present in the downloaded HTML
ul_search_results = soup.find_all("li", {"class": "grid-tile"})

for li in ul_search_results:
    print(li.get('id'))
```

I have also tried first fetching the parent element of all the `li`s with `soup.find_all("ul", {"id": "search-result-items"})` and iterating over it for `li` tags. That didn't return the complete results either!

Appreciate any help here!

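This is happening because the site only shows 12 items to begin with. In the browser, when you scroll down, the page makes a second request and loads another 12. `requests` only downloads the initial HTML and does not run the page's JavaScript, so BeautifulSoup never sees the items that are loaded on scroll.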

The second request it makes is to this URL: https://www.tomford.com/beauty/lips/?start=12&sz=12&format=page-element&rendertype=macro
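That lazy-load request is an ordinary GET with query parameters, so one way to reproduce it is to build the URL from an offset and a page size. This is a minimal sketch, assuming the parameter names visible in the URL above (`start`, `sz`, `format`, `rendertype`) stay stable; `build_page_url` is a hypothetical helper name, not something the site provides:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tomford.com/beauty/lips/"


def build_page_url(start: int, size: int) -> str:
    """Return the URL the page requests when it lazy-loads more products.

    Assumes the query parameters observed in the browser's network tab;
    the site may rename or drop them at any time.
    """
    params = {
        "start": start,  # offset of the first product to return
        "sz": size,      # number of products per request (assumed meaning)
        "format": "page-element",
        "rendertype": "macro",
    }
    return BASE_URL + "?" + urlencode(params)


# The second request the browser makes while scrolling (offset 12, 12 items).
print(build_page_url(12, 12))
```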

You can change this URL to suit your needs: set `start` to 0 and `sz` to 1000, and you should get a single page with all available items.

https://www.tomford.com/beauty/lips/?start=0&sz=1000&format=page-element&rendertype=macro
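Putting it together, the question's code can simply point at that URL instead of the category page. A sketch, assuming the fragment returned by the `page-element` request contains the same `li.grid-tile` elements as the full page (not verified here) and that the server honours a large `sz`; `fetch_html`, `tile_ids` and `scrape_all` are hypothetical helper names:

```python
from bs4 import BeautifulSoup

ALL_ITEMS_URL = (
    "https://www.tomford.com/beauty/lips/"
    "?start=0&sz=1000&format=page-element&rendertype=macro"
)


def tile_ids(html: str) -> list:
    """Return the id attribute of every <li> whose classes include grid-tile."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser; "lxml" also works if installed
    return [li.get("id") for li in soup.find_all("li", class_="grid-tile")]


def fetch_html(url: str, user_agent: str) -> str:
    """Download a page, sending the User-Agent as a real HTTP header."""
    # Imported here so tile_ids() can be used without the requests package.
    import requests

    # headers must be passed by keyword; a second positional argument
    # would be sent as query-string params instead.
    resp = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    resp.raise_for_status()
    return resp.text


def scrape_all(user_agent: str, url: str = ALL_ITEMS_URL) -> list:
    """Fetch the single large page and return all product tile ids."""
    return tile_ids(fetch_html(url, user_agent))
```

Calling `scrape_all("<your user agent>")` makes one live request. If the site turns out to cap `sz` (something the answer above does not establish), requesting successive offsets (0, 12, 24, ...) until a page comes back with no tiles would be the fallback.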
