Scrape text under <h4> using Requests-HTML (Requests-HTML, Python)

I am trying to extract the socket type of a CPU from its PCPartPicker product page. I have identified that the socket type sits under the <h4> Socket heading.

So far I have been able to select .specs.block and find all the <h4> elements nested inside it. However, I can't get the text under each heading.

Here is my code

from requests_html import HTMLSession
session = HTMLSession()

r = session.get('https://au.pcpartpicker.com/product/jLF48d')
about = r.html.find('.specs.block')[0]
about = about.find('h4')

print(about)

This prints

 [ <Element 'h4' >, <Element 'h4' >, <Element 'h4' >, <Element 'h4' >,
 <Element 'h4' >, <Element 'h4' >, <Element 'h4' >, <Element 'h4' >,
 <Element 'h4' >, <Element 'h4' >, <Element 'h4' >]

However, when I change the print statement to:

print(about.text)

I get the following error:

AttributeError: 'list' object has no attribute 'text'

Update:

print(about[0].text)

This code prints:

Manufacturer AMD

That is the first heading and its text; however, I need the fourth.

Any idea what code I can use to reach the desired result?

If you require any more information please let me know.

Replacing:

print(about[0].text)

with:

print(about[3].text)

in the code from my question above solved the problem for me. find('h4') returns a list, and list indexes start at 0, so the fourth heading is at index 3.
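
If the order of the headings ever changes, a fixed index such as 3 may end up pointing at a different spec. Below is a minimal sketch of looking the heading up by name instead. It assumes each element's .text begins with the heading followed by its value, as in the "Manufacturer AMD" output from the question. spec_value is a hypothetical helper, and the SimpleNamespace objects (including the "Socket AM4" value) are made-up stand-ins for the real elements so the snippet runs without a network request.

```python
from types import SimpleNamespace


def spec_value(elements, heading):
    """Return the text following `heading` in the first matching element, or None."""
    for element in elements:
        text = element.text.strip()
        if text.startswith(heading):
            # Drop the heading itself and keep only the value after it.
            return text[len(heading):].strip()
    return None


# Stand-ins for the elements returned by find('h4'); the values are illustrative only.
sample = [
    SimpleNamespace(text="Manufacturer AMD"),
    SimpleNamespace(text="Socket AM4"),
]

print(spec_value(sample, "Socket"))
```

With the real page, this would presumably be called as spec_value(about, 'Socket'), where about is the list returned by find('h4').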
