
Beautiful Soup only returns the first 10 listings using soup.select(). What could be the issue here?

import requests
import lxml
from bs4 import BeautifulSoup

LISTINGS_URL = 'https://shorturl.at/ceoAB'
headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/95.0.4638.69 Safari/537.36 ",
        "Accept-Language": "en-US,en;q=0.9"
}

response = requests.get(LISTINGS_URL, headers=headers)
listings = response.text


class DataScraper:
    def __init__(self):
        self.soup = BeautifulSoup(listings, "html.parser")
    def get_links(self):
        # Print every anchor tag found inside a listing card.
        for a in self.soup.select(".list-card-top a"):
            print(a)
        # listing_text = [link.getText() for link in links]

    def get_address(self):
        pass

    def get_prices(self):
        pass

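I used the correct CSS selector, and I even tried finding the elements with attrs in find_all(). What I want to achieve is to parse all the anchor tags and then get the href link of each listing, but it only returns the first 10.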

You can make a GET request to this endpoint and get the data you need.

https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState={"pagination":{"currentPage":1},"mapBounds":{"west":-123.33522421253342,"east":-121.44008261097092,"south":37.041584214606814,"north":38.39290664366326},"isMapVisible":false,"filterState":{"price":{"max":872627},"beds":{"min":1},"isForSaleForeclosure":{"value":false},"monthlyPayment":{"max":3000},"isAuction":{"value":false},"isNewConstruction":{"value":false},"isForRent":{"value":true},"isForSaleByOwner":{"value":false},"isComingSoon":{"value":false},"isForSaleByAgent":{"value":false}},"isListVisible":true,"mapZoom":9}&wants={"cat1":["listResults"]}

Change the value of the "currentPage" parameter in the URL above to fetch data from different pages.

Since the response is JSON, you can easily parse it with the json module and extract the information.
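A minimal sketch of how the paginated URL could be built and fetched. The query state is copied from the URL above; the names `build_search_url` and `fetch_page` are made up for illustration, and the sketch uses the standard library's `urllib` so it has no dependencies (`requests.get(url, headers=headers).json()` should work just as well). Whether the endpoint still answers in this form is an assumption.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://www.zillow.com/search/GetSearchPageState.htm"

# Search state copied from the URL in the answer above.
SEARCH_QUERY_STATE = {
    "pagination": {"currentPage": 1},
    "mapBounds": {
        "west": -123.33522421253342,
        "east": -121.44008261097092,
        "south": 37.041584214606814,
        "north": 38.39290664366326,
    },
    "isMapVisible": False,
    "filterState": {
        "price": {"max": 872627},
        "beds": {"min": 1},
        "isForSaleForeclosure": {"value": False},
        "monthlyPayment": {"max": 3000},
        "isAuction": {"value": False},
        "isNewConstruction": {"value": False},
        "isForRent": {"value": True},
        "isForSaleByOwner": {"value": False},
        "isComingSoon": {"value": False},
        "isForSaleByAgent": {"value": False},
    },
    "isListVisible": True,
    "mapZoom": 9,
}


def build_search_url(page):
    """Return the endpoint URL with "currentPage" set to `page`."""
    # Copy the state so the module-level template is left untouched.
    state = dict(SEARCH_QUERY_STATE, pagination={"currentPage": page})
    params = {
        "searchQueryState": json.dumps(state, separators=(",", ":")),
        "wants": json.dumps({"cat1": ["listResults"]}, separators=(",", ":")),
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_page(page, headers):
    """GET one page of results and return the decoded JSON (hypothetical helper)."""
    request = urllib.request.Request(build_search_url(page), headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_page(2, headers)` with the `headers` dict from the question would request the second page; the site may still reject automated requests, so treat this as a starting point.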

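The website is probably using lazy loading, so you can either use something like selenium/puppeteer or use the site's API (which would be the simpler approach). For the API, make a GET request to the URL that starts with https://www.zillow.com/search/GetSearchPageState.htm (you can find it in your browser's dev tools), parse the JSON response, and read your href links from cat1.searchResults.listResults[index in array].detailUrl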
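As a rough sketch, pulling the links out of the parsed response could look like this. The key path cat1.searchResults.listResults[...].detailUrl comes from the answer above; the function name `extract_detail_urls`, the sample payload, and the assumption that some links may be relative to https://www.zillow.com are made up for illustration.

```python
from urllib.parse import urljoin


def extract_detail_urls(payload, base="https://www.zillow.com"):
    """Collect the detailUrl of every listing in a parsed JSON response."""
    results = (
        payload.get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    # urljoin leaves absolute URLs alone and resolves relative ones against base.
    return [
        urljoin(base, item["detailUrl"])
        for item in results
        if item.get("detailUrl")
    ]


# Made-up payload with the same shape as the path described above.
sample = {
    "cat1": {
        "searchResults": {
            "listResults": [
                {"detailUrl": "/b/example-1/"},
                {"detailUrl": "https://www.zillow.com/homedetails/example-2/"},
                {"price": "$1,000"},  # no detailUrl, so it is skipped
            ]
        }
    }
}

for link in extract_detail_urls(sample):
    print(link)
```

Using `.get()` with defaults means a response missing any of those keys yields an empty list rather than a KeyError, which is handy when a page past the last one is requested.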

