
AttributeError: 'list' object has no attribute 'absolute_links'

from requests_html import HTMLSession
url = 'https://www.walmart.com/search?q=70+inch+tv&page=2&affinityOverride=default'

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1,timeout=20)
product = r.html.find('div.mb1.ph1.pa0-xl.bb.b--near-white.w-25')



productinfo = []
for item in product.absolute_links:
    # ra = s.get(item)
    # name = ra.html.find('h1',first=True).text
    products = {
        'link' :item,

    }

    productinfo.append(products)

print(productinfo)
print(len(productinfo))

Output

for item in product.absolute_links:
AttributeError: 'list' object has no attribute 'absolute_links'

I want to get the link of every product and then scrape some data from this website with the requests-html library, but I'm getting an AttributeError. Please help me check the website's HTML.

But can I solve the captcha and log in via the requests-html library? I'm not super familiar with it.

Neither am I, but you can paste the request from your browser into https://curlconverter.com/ (they also have instructions on how to copy the request) and it will convert it to Python code for a request with headers and cookies that you can then paste into your code. The last line of their code will be response = requests.get(...), but you can replace it with r = s.get(...) so that your code can still use requests_html methods like .html.render and .absolute_links (plain requests doesn't parse the HTML).

Just keep in mind that the cookies will expire, likely within a few hours, and that you'll have to copy them from your browser again by then if you want to keep scraping this way.
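As a rough sketch of that swap (all header and cookie values below are placeholders, not ones copied from a real browser request):

```python
# Placeholder headers/cookies -- replace with the block that
# https://curlconverter.com/ generates from your copied browser request.
headers = {
    'User-Agent': 'Mozilla/5.0',          # placeholder value
    'Accept': 'text/html,application/xhtml+xml',
}
cookies = {
    'session-id': 'PASTE-FROM-BROWSER',   # placeholder value
}

# curlconverter's last line will be something like:
#     response = requests.get(url, headers=headers, cookies=cookies)
# Replace it with the HTMLSession call so the response keeps the
# requests_html parsing methods (.html.render, .absolute_links, ...):
#     from requests_html import HTMLSession
#     s = HTMLSession()
#     r = s.get(url, headers=headers, cookies=cookies)
```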


 for item in product.absolute_links: AttributeError: 'list' object has no attribute 'absolute_links'

You can only apply .absolute_links to an element, and .find returns a list of elements (unless you specify first=True). Also, .absolute_links returns a set of links (even when that set contains only one link), so you need to either loop through them or convert the set to a list and access it by index to get at the link(s).
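You can reproduce both the error and the fix offline with a small mock (MockElement below is an invented stand-in for illustration, not a real requests_html class):

```python
# MockElement is an invented stand-in for a requests_html Element.
class MockElement:
    def __init__(self, links):
        self.absolute_links = set(links)  # the real attribute is also a set

# .find() returns a plain Python list of elements:
product = [MockElement({'https://example.com/item/1'}),
           MockElement({'https://example.com/item/2'})]

try:
    product.absolute_links  # fails: lists have no such attribute
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'absolute_links'

# The fix: access .absolute_links on each element, not on the list.
urls = [list(p.absolute_links)[0] for p in product]
print(urls)
```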

product = r.html.find('div.mb1.ph1.pa0-xl.bb.b--near-white.w-25')

productinfo = []
for prod in product:
    item = prod.absolute_links # get product link/s
    # ra = s.get(list(item)[0]) # go to first product link
    # name = ra.html.find('h1',first=True).text
    products = {'link': item}
    productinfo.append(products)

Or, to ensure that you're looping through unique URLs:

product = r.html.find('div.mb1.ph1.pa0-xl.bb.b--near-white.w-25')
prodUrls = set().union(*[d.absolute_links for d in product]) # combine all sets of product links

productinfo = []
for item in prodUrls:
    # ra = s.get(item) 
    # name = ra.html.find('h1',first=True).text
    products = {'link': item}
    productinfo.append(products)
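The set().union(*...) combine can be seen with plain sets (the URLs below are made-up example data):

```python
# Made-up example data: one set of links per product card, as
# .absolute_links would return for each element.
link_sets = [
    {'https://example.com/item/1', 'https://example.com/item/2'},
    {'https://example.com/item/2'},   # duplicate across cards
    {'https://example.com/item/3'},
]

# Union all the per-card sets; duplicates collapse automatically.
prodUrls = set().union(*link_sets)
print(sorted(prodUrls))  # each unique URL appears exactly once
```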

By the way, if it doesn't find any products, then of course you won't get any links even if the error goes away, so add a line to print the request status (in case something went wrong there) as well as how many products and links were extracted.

print(r.status_code, r.reason, f' - {len(product)} products and {len(prodUrls)} product links from', r.url)
