Requests-HTML scrape <a> tag image url (Requests-HTML, python)

I am using Requests-HTML to try to extract the CPU image from the following webpage. I have identified that the image URL is in a tag with the class name item (screenshot: Chrome inspect tool).

Here is my code:

from requests_html import HTMLSession
session = HTMLSession()

r = session.get('https://au.pcpartpicker.com/product/jLF48d')

about = r.html.find('.item')

print(about)

This prints:

[<Element 'a' class=('item',) onclick='show_gallery(0, carousel_images);return false;'>]

However when I change the print statement to:

print(about.absolute_links)

I get the following error:

AttributeError: 'list' object has no attribute 'absolute_links'

Any idea why this is happening and how I can fix it?

If you require any more information please let me know.

Thanks

r.html.find('.item') returns a list, and a list has no attribute absolute_links. Because more than one node may match .item, find() returns a list of all matches.

It is convenient to get a single node with

about = r.html.find('.item')[0]

However, about.absolute_links won't give you the image link, because the element found here is an <a>, not an <img>. Look up the <img> inside it and read its src attribute:

about = r.html.find('.item')[0]
img = about.xpath('//img')[0]
img.attrs['src'] # => '//cdn.pcpartpicker.com/static/forever/images/product/55aea2dd64e2e3a3e3b1d678048d8d76.256p.jpg'
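
Note that the src value above is protocol-relative (it starts with //), so it has no scheme of its own. One way to turn it into a full URL is the standard library's urllib.parse.urljoin. This is a minimal sketch; the variable names page_url, src and image_url are illustrative and not part of the original answer:

```python
from urllib.parse import urljoin

# The page that was scraped and the protocol-relative src taken from the <img>
page_url = 'https://au.pcpartpicker.com/product/jLF48d'
src = '//cdn.pcpartpicker.com/static/forever/images/product/55aea2dd64e2e3a3e3b1d678048d8d76.256p.jpg'

# urljoin keeps the host from src and borrows the scheme (https) from page_url
image_url = urljoin(page_url, src)
print(image_url)
```

The same call also handles ordinary relative paths, so it can be applied to every src without checking which form it takes.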

You can also use BeautifulSoup to scrape the page.

The steps to scrape a web page are as follows:

  1. Use the requests library to load the HTML of the page into Python
  2. Set up BeautifulSoup to process the HTML
  3. Find out which HTML tags contain the data you want (here, the img tags)
  4. Use BeautifulSoup to extract that data (here, the src of each image) from the HTML
  5. Format them nicely

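Below is the code:

import requests
from bs4 import BeautifulSoup

base_url = 'https://au.pcpartpicker.com/product/jLF48d'
r = requests.get(base_url)
# Name the parser explicitly; omitting it makes BeautifulSoup emit a warning
soup = BeautifulSoup(r.text, 'html.parser')
# Print the src of every <img> on the page, skipping tags that have none
for img in soup.find_all('img', src=True):
    print(img['src'])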
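
That loop prints every image on the page. To get only the image inside the <a class="item"> tag from the question, a CSS selector can narrow the search. The sketch below runs against an inline HTML snippet, which is an assumed stand-in for the product page (the real markup and the cdn.example.com URLs are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the product page markup; the real page may differ.
SAMPLE_HTML = """
<div>
  <img src="//cdn.example.com/logo.png">
  <a class="item" onclick="show_gallery(0, carousel_images);return false;">
    <img src="//cdn.example.com/product.256p.jpg">
  </a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one returns the first <img> nested in an <a class="item">, or None
img = soup.select_one("a.item img")
product_src = img.get("src") if img is not None else None
print(product_src)
```

Checking for None before reading src avoids an exception if the selector matches nothing, for example when the site changes its layout.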
