Can't scrape category titles from a webpage

I've written a scraper in Python to get the category names from a webpage, but it fails to fetch anything from that page. I can't figure out where I'm going wrong. Any help would be much appreciated.

Here is the link to the webpage: URL

Here is what I've tried so far:

from bs4 import BeautifulSoup
import requests

res = requests.get("replace_with_above_url",headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(res.text,"lxml")
for items in soup.select('.slide_container .h3.standardTitle'):
    print(items.text)

Here is the markup for one of the elements containing a category name I'm after:

<div class="slide_container">
    <a href="/offers/furniture/" tabindex="0">
        <picture style="float: left; width: 100%;"><img style="width:100%" src="/_m4/9/8/1513184943_4413.jpg" data-w="270"></picture>
        <div class="floated-details inverted" style="height: 69px;">
            <div class="h3 margin-top-sm margin-bottom-sm standardTitle">
                Furniture Offers                         #This is the name I'm after
            </div>
            <p class="carouselDesc">
            </p>
        </div>
    </a>
</div>

from bs4 import BeautifulSoup
import requests

# Send the full set of headers a browser would send, not just a User-Agent.
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    # Note: decoding 'br' responses needs the brotli package installed; drop 'br' if it is not.
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'referer': 'https://www.therange.co.uk/',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
}
res = requests.get("https://www.therange.co.uk/", headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
# Print the text of every category title inside a slide container.
for items in soup.select('.slide_container .h3.standardTitle'):
    print(items.text)

Try this.

A User-Agent alone is not enough here: the other request headers matter a great deal when scraping. If the headers a browser normally sends are missing, the server may treat the request as coming from a bot.
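
One way to see why a bare request can look automated is to inspect the headers requests would send, without touching the network. The sketch below is only an illustration: the URL `https://example.com/` and the helper name `outgoing_headers` are placeholders and do not come from the original post.

```python
import requests


def outgoing_headers(url, extra_headers=None):
    """Return the headers requests would send for a GET to url (no network I/O)."""
    session = requests.Session()
    request = requests.Request("GET", url, headers=extra_headers or {})
    # prepare_request merges the session's default headers with the per-request ones.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


# Placeholder URL (an assumption); nothing is actually fetched here.
default = outgoing_headers("https://example.com/")
custom = outgoing_headers(
    "https://example.com/",
    {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.9"},
)

print(default["User-Agent"])  # identifies itself as python-requests/<version>
print(custom["User-Agent"])   # the browser-like value passed in
```

Comparing the two dictionaries shows which browser headers are absent by default, which is useful when deciding what to add to the `headers` argument.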

Use "html.parser" instead of "lxml":

soup = BeautifulSoup(res.text,"html.parser")
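
To check the selector independently of the network and the headers, it can be run against the markup quoted in the question. This is a minimal sketch, assuming a trimmed copy of that snippet; `SAMPLE_HTML` and `category_titles` are names made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, used instead of a live request.
SAMPLE_HTML = """
<div class="slide_container">
    <a href="/offers/furniture/" tabindex="0">
        <div class="floated-details inverted">
            <div class="h3 margin-top-sm margin-bottom-sm standardTitle">
                Furniture Offers
            </div>
        </div>
    </a>
</div>
"""


def category_titles(html):
    """Return the stripped text of every category title in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    nodes = soup.select(".slide_container .h3.standardTitle")
    # get_text(strip=True) drops the surrounding whitespace and newlines.
    return [node.get_text(strip=True) for node in nodes]


titles = category_titles(SAMPLE_HTML)
print(titles)  # ['Furniture Offers']
```

If this prints the title but the live page yields nothing, the problem is more likely in the response itself (missing headers, blocking, or content rendered by JavaScript) than in the selector.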
