Parsing HTML using Beautiful Soup

I am new to Python and have a simple question about parsing HTML. I am using Beautiful Soup and have gotten as far as the element below. I want to extract the taxes and maintenance values from it, but I am not sure how to do this.

<div class="estimated_payment clickable overlay_trigger hidden-xs"
id="overlay_trigger_1255749" se:behavior="monthly_payment" se:monthly_payment:attributes='{"id":1255749,"taxes":3682.0,"price":5500000,"maintenance":1875.0,"mortgage_rate":3.5,"mortgage_term":30,"down_payment_amount":1100000.0,"down_payment_rate":20.0,"min_down_payment_rate":20.0,"min_down_payment_amount":1100000.0}'> Est. Payment:

You need to do it in two steps:

  • locate the element and extract the se:monthly_payment:attributes attribute value
  • load it via json.loads() to a Python dictionary and get the desired amounts by keys

Implementation:

import json

from bs4 import BeautifulSoup


data = """
<div class="estimated_payment clickable overlay_trigger hidden-xs"
     id="overlay_trigger_1255749"
     se:behavior="monthly_payment"
     se:monthly_payment:attributes='{"id":1255749,"taxes":3682.0,"price":5500000,"maintenance":1875.0,"mortgage_rate":3.5,"mortgage_term":30,"down_payment_amount":1100000.0,"down_payment_rate":20.0,"min_down_payment_rate":20.0,"min_down_payment_amount":1100000.0}'>
     Est. Payment: $0
</div>
"""
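# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(data, "html.parser")

# Step 1: locate the element by class and read the raw attribute value (a JSON string).
attr_value = soup.select_one(".estimated_payment")["se:monthly_payment:attributes"]
# Step 2: decode the JSON string into a Python dictionary.
payment_data = json.loads(attr_value)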

print(payment_data["taxes"])
print(payment_data["maintenance"])

Prints:

3682.0
1875.0
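
If the page contains several such elements, or some of them lack the attribute, the same two steps can be wrapped in a loop. The following is only a sketch under assumptions: the helper name extract_payments and the shortened sample markup are made up for illustration and are not part of the original page. It matches on the presence of the attribute rather than on the class, and uses .get() so that a missing key yields None instead of raising KeyError.

```python
import json

from bs4 import BeautifulSoup

# Attribute name taken from the question's markup.
ATTR = "se:monthly_payment:attributes"


def extract_payments(html):
    """Return one dict per element that carries the payment attribute.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # attrs={ATTR: True} matches any tag that has the attribute at all.
    for tag in soup.find_all(attrs={ATTR: True}):
        try:
            payment = json.loads(tag[ATTR])
        except json.JSONDecodeError:
            continue  # skip elements whose attribute is not valid JSON
        if not isinstance(payment, dict):
            continue  # skip payloads that are not JSON objects
        results.append(
            {
                "id": payment.get("id"),
                "taxes": payment.get("taxes"),
                "maintenance": payment.get("maintenance"),
            }
        )
    return results


# Shortened, made-up sample: the second element has no "maintenance" key,
# the third has no payment attribute at all.
sample = """
<div class="estimated_payment" se:monthly_payment:attributes='{"id":1,"taxes":3682.0,"maintenance":1875.0}'>Est. Payment</div>
<div class="estimated_payment" se:monthly_payment:attributes='{"id":2,"taxes":100.0}'>Est. Payment</div>
<div class="estimated_payment">no data</div>
"""

for row in extract_payments(sample):
    print(row)
```

Filtering on the attribute itself avoids a KeyError when an element has the class but not the data, which select_one(...)[...] would raise.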

