
Python Beautiful Soup parsing error on nested span

I am trying to parse some HTML using Python and Beautiful Soup.

The relevant part of the HTML is shown below:

<div class="zsg-lg-1-3 zsg-md-1-1 zsg-sm-1-1 value-info-block" id="yui_3_18_1_1_1587802421734_2498">
    <div id="overview" class="scroll-track" data-label="Market Overview"></div>
    <h2 id="yui_3_18_1_1_1587802421734_2509">San Francisco Market Overview</h2>


    <h6 class="zsg-fineprint hdr-date">Data through Mar 31, 2020</h6>
    <ul class="value-info-list" id="yui_3_18_1_1_1587802421734_2497">


        <li id="yui_3_18_1_1_1587802421734_2496">
            <span class="value" id="yui_3_18_1_1_1587802421734_2495">
                $1,310,500
            </span>

            <span class="info zsg-fineprint" id="yui_3_18_1_1_1587802421734_2524"> Median listing price

                <span class="info zsg-fineprint">(Jan&nbsp;31,&nbsp;2020)</span>

            </span>

        </li>

    </ul>

</div>

The Python code I use to extract the data is below:

def process_market_overview(self):
        parent = self.page_soup.find("div", {"data-label": "Market Overview"}).parent
        for li in parent.findAll("li"):
            value = li.find("span", {"class": "value"}, recursive=False).text.strip()
            key = li.find("span", {"class": "info zsg-fineprint"}, recursive=False).text
            key = key[0].strip()
            print(" key :{} , value:{}".format(key, value))

But the output I am getting is wrong: every key comes back empty. How do I parse the HTML in this kind of scenario? The output is:

key : , value:$1,447,191
 key : , value:-2.3%
 key : , value:$1,310,500
 key : , value:$1,364,300

What I want is to extract the value $1,310,500 and the key Median listing price from the HTML.

URL: https://www.zillow.com/sanfrancisco-ca/home-values/

Let me know if there is a better way of parsing it.

For the complete code, you can visit the link: https://github.com/srth12/Eclipse-Workspace-/blob/master/ariya_python_scrapping/zillow_scrapper.py

Call find with text=True and recursive=False on the parent span. This returns only the span's own first text node and skips the text of the nested span:

key = li.find("span", {"class": "info zsg-fineprint"}).find(text=True, recursive=False).strip()
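For context, the original code prints an empty key because key[0] takes only the first character of the span's text, which is a space, and strip() then reduces it to an empty string. Below is a minimal, self-contained sketch of the fix applied to a trimmed copy of the markup from the question. The function name extract_pairs and the HTML constant are hypothetical, introduced only for illustration. It also assumes Beautiful Soup 4.4 or later, where string=True is the newer spelling of text=True.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (ids omitted for brevity).
HTML = """
<ul class="value-info-list">
  <li>
    <span class="value">
      $1,310,500
    </span>
    <span class="info zsg-fineprint"> Median listing price
      <span class="info zsg-fineprint">(Jan&nbsp;31,&nbsp;2020)</span>
    </span>
  </li>
</ul>
"""


def extract_pairs(html):
    """Return a list of (key, value) tuples, one per <li> that has both spans."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for li in soup.find_all("li"):
        # Look only at direct children of the <li>, as in the question.
        value_span = li.find("span", class_="value", recursive=False)
        info_span = li.find("span", class_="info", recursive=False)
        if value_span is None or info_span is None:
            continue  # skip list items that do not follow this layout
        # First text node that is a direct child of the outer span;
        # the nested span's "(Jan 31, 2020)" is not considered.
        label = info_span.find(string=True, recursive=False)
        key = label.strip() if label else ""
        pairs.append((key, value_span.get_text(strip=True)))
    return pairs


if __name__ == "__main__":
    print(extract_pairs(HTML))  # [('Median listing price', '$1,310,500')]
```

The None checks are a defensive choice rather than something the question requires. They let the loop continue if the page contains list items with a different layout.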

Not a direct answer, but Zillow provides a free API and downloadable CSVs with the data that you are trying to scrape:

https://www.zillow.com/research/data/
