
BS4 | Python | Scraping specific data in div with multiple values

I'm pretty new to Python, but I've had some success experimenting with web scraping using BS4. I'm now on a new personal project where I'm indexing AutoTrader listings from a saved HTML file.

So far I'm able to scrape all the values I need except one. I've searched and can't find a solution.

I need to extract the province "BC" from data-payment-province="BC" in the markup below:

<div class="asLowAs payment-tag-disclaimer" data-payment-tag-adid="66736200" data-payment-province="BC" data-payment-tag-isnew="False" style="display: none" data-toggle="popover">

I've used location = soup.find_all('div', class_='data-payment-province')

but it returns []

Idk, I'm probably being dumb and missing something obvious but I'm honestly so stumped.

Also, I should probably ask this as a separate question, but does anyone know how to get only the text values as output instead of the HTML plus the values?

Example:

Current:

itemOffered = soup.find_all("span", itemprop="itemOffered")

OUTPUT:

</span>, <span itemprop="itemOffered">
2019 Hyundai Elantra GT | Bluetooth | Backup Camera | Heated Seats | Blind

Desired OUTPUT:

2019 Hyundai Elantra GT

Give this a shot for your first problem:

import requests
from bs4 import BeautifulSoup
import re

.....

province_re = re.compile(r'[A-Z]{2}')

location = soup.find_all('div', {'data-payment-province': province_re})

for loc in location:
    print(loc.attrs['data-payment-province'])
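For anyone who wants to try this without the original page, here is a minimal self-contained sketch. It assumes the div from the question (a closing </div> is added so the snippet parses cleanly) and uses the built-in "html.parser". It also shows why the original attempt returned []: class_ only matches against the class attribute, while data-payment-province is a separate attribute. Passing True as the attribute value is an alternative to the regex that matches any value.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the closing </div> is an addition.
html = (
    '<div class="asLowAs payment-tag-disclaimer" '
    'data-payment-tag-adid="66736200" data-payment-province="BC" '
    'data-payment-tag-isnew="False" style="display: none" '
    'data-toggle="popover"></div>'
)

soup = BeautifulSoup(html, "html.parser")

# class_ compares against the values in class="...", so this finds nothing.
by_class = soup.find_all("div", class_="data-payment-province")

# True as the value matches every div that carries the attribute at all.
tagged = soup.find_all("div", attrs={"data-payment-province": True})
provinces = [div["data-payment-province"] for div in tagged]

print(by_class)   # []
print(provinces)  # ['BC']
```

The regex version above is stricter (it only accepts values containing two capital letters), so which one fits better depends on how clean the data is.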

A much cleaner approach would be this:

divs = soup.find_all('div')

for div in divs:
    # has_attr (singular) is the Tag method that checks for an attribute
    if div.has_attr('data-payment-province'):
        print(div['data-payment-province'])
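Another option, not mentioned in the answers above, is a CSS attribute selector via select(), which skips the manual loop over every div. This is only a sketch: the two-listing HTML is made up for illustration (the second div deliberately lacks the attribute), and it assumes a BeautifulSoup version with select() support.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one div with the attribute, one without.
html = """
<div class="asLowAs payment-tag-disclaimer" data-payment-province="BC"></div>
<div class="asLowAs payment-tag-disclaimer"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# div[data-payment-province] selects only divs that have the attribute.
matches = soup.select("div[data-payment-province]")
selected = [div["data-payment-province"] for div in matches]

print(selected)  # ['BC']
```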

And to get the text of elements you can use this:

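elements = soup.find_all(['span', 'element1', 'element2'])

for element in elements:
    # All text inside the element, including text from nested tags
    fulltextofelement = element.get_text()
    # First text node that is a direct child of the element
    onlyparenttext = element.find(string=True, recursive=False)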
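To get from the span to the desired output in the question ("2019 Hyundai Elantra GT"), one possible approach is to take the element's text and keep only the part before the first "|". The sketch below assumes the title always comes first and that "|" separates the features, which matches the sample output in the question but may not hold for every listing. The HTML is a shortened, made-up version of that output.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the output shown in the question.
html = (
    '<span itemprop="itemOffered">\n'
    "2019 Hyundai Elantra GT | Bluetooth | Backup Camera\n"
    "</span>"
)

soup = BeautifulSoup(html, "html.parser")

titles = []
for span in soup.find_all("span", itemprop="itemOffered"):
    # get_text(strip=True) drops the tags and the surrounding whitespace.
    text = span.get_text(strip=True)
    # Keep only what precedes the first "|" separator.
    titles.append(text.split("|")[0].strip())

print(titles)  # ['2019 Hyundai Elantra GT']
```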
