BeautifulSoup can't get the text between Div tags

Question

I'm working on a new website scraper and am having problems getting the text inside a div. I've tried .text and .strip() but still can't get the text. Any suggestions?

import requests
from bs4 import BeautifulSoup

URL = 'https://preview.mcassessor.maricopa.gov/mcs/?q=504-39-014'
header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
# pass the dict as headers=; the second positional argument of requests.get is params
page = requests.get(URL, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
value = soup.find("div", {"id": "Valuations_0_LimitedPropertyValue"})
print(value.text.strip())

Answer 1

The data is loaded from an external source via JavaScript, so it is not part of the HTML that requests downloads. To load it, use this example:

import re
import json
import requests
from bs4 import BeautifulSoup


url = "https://preview.mcassessor.maricopa.gov/mcs/?q=504-39-014"
api_url = "https://preview.mcassessor.maricopa.gov/parcel/{}/valuations/"

id_ = "".join(re.findall(r"\d+", url))

with requests.session() as s:
    soup = BeautifulSoup(s.get(url).content, "html.parser")
    data = s.get(
        api_url.format(id_),
        headers={"Authorization": soup.select_one("#Token")["value"]},
    ).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# print some data:
for d in data:
    print(
        "{:<10} {:<10} {}".format(
            d["TaxYear"], d["FullCashValue"], d["LegalClassification"]
        )
    )

Prints:

2022       800        AG / VACANT LAND / NON-PROFIT R/P
2021       800        AG / VACANT LAND / NON-PROFIT R/P
2020       800        AG / VACANT LAND / NON-PROFIT R/P
2019       800        AG / VACANT LAND / NON-PROFIT R/P
2018       1000       AG / VACANT LAND / NON-PROFIT R/P
2017       1200       AG / VACANT LAND / NON-PROFIT R/P
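
The example above builds the parcel ID by joining every run of digits found anywhere in the page URL. As a minimal sketch of a slightly stricter variant, the ID can be taken from the `q` query parameter only, so digits elsewhere in the URL cannot leak into it. The helper names `parcel_id_from_url` and `valuations_url` are hypothetical, and the assumption that the endpoint expects the digits-only form comes from the answer's code, not from any documentation of the site:

```python
import re
from urllib.parse import urlparse, parse_qs

# Endpoint pattern copied from the answer above.
API_URL = "https://preview.mcassessor.maricopa.gov/parcel/{}/valuations/"


def parcel_id_from_url(url: str) -> str:
    """Return only the digits of the `q` query parameter (hypothetical helper)."""
    values = parse_qs(urlparse(url).query).get("q")
    if not values:
        raise ValueError("URL has no 'q' query parameter")
    # Drop dashes and any other non-digit characters.
    return re.sub(r"\D", "", values[0])


def valuations_url(url: str) -> str:
    """Fill the parcel ID into the valuations endpoint (hypothetical helper)."""
    return API_URL.format(parcel_id_from_url(url))


if __name__ == "__main__":
    print(valuations_url("https://preview.mcassessor.maricopa.gov/mcs/?q=504-39-014"))
```

The resulting URL can be passed to `s.get(...)` in place of `api_url.format(id_)`; the `Authorization` header taken from the `#Token` element is still needed, as in the answer.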
