Beautiful Soup only returns the first 10 listings with soup.select(). What could be the issue?

Question

import requests
import lxml
from bs4 import BeautifulSoup

LISTINGS_URL = 'https://shorturl.at/ceoAB'
headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/95.0.4638.69 Safari/537.36 ",
        "Accept-Language": "en-US,en;q=0.9"
}

response = requests.get(LISTINGS_URL, headers=headers)
listings = response.text


class DataScraper:
    def __init__(self):
        self.soup = BeautifulSoup(listings, "html.parser")

    def get_links(self):
        for a in self.soup.select(".list-card-top a"):
            print(a)
        # listing_text = [link.getText() for link in links]

    def get_address(self):
        pass

    def get_prices(self):
        pass

I have used the correct CSS selectors, and I even tried finding the elements using attrs in find_all(). What I am trying to achieve is to parse all the anchor tags and then fetch the href links for the specific listings. However, it only returns the first 10.

Answer 1

You can make a GET request to this endpoint and fetch the data you need.

https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState={"pagination":{"currentPage":1},"mapBounds":{"west":-123.33522421253342,"east":-121.44008261097092,"south":37.041584214606814,"north":38.39290664366326},"isMapVisible":false,"filterState":{"price":{"max":872627},"beds":{"min":1},"isForSaleForeclosure":{"value":false},"monthlyPayment":{"max":3000},"isAuction":{"value":false},"isNewConstruction":{"value":false},"isForRent":{"value":true},"isForSaleByOwner":{"value":false},"isComingSoon":{"value":false},"isForSaleByAgent":{"value":false}},"isListVisible":true,"mapZoom":9}&wants={"cat1":["listResults"]}

Change the "currentPage" value in the URL above to fetch data from other pages.

Since the response is JSON, you can easily parse it and extract the information using the json module.
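
As a rough sketch of the paging idea, the URL for each page can be built from the same search state with only "currentPage" changed. The query values below are copied from the URL above; the names build_search_url, SEARCH_QUERY_STATE and WANTS are made up for illustration, and the endpoint itself may have changed or may reject automated requests.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.zillow.com/search/GetSearchPageState.htm"

# Search state copied from the URL in this answer; only "pagination" changes per page.
SEARCH_QUERY_STATE = {
    "pagination": {"currentPage": 1},
    "mapBounds": {
        "west": -123.33522421253342,
        "east": -121.44008261097092,
        "south": 37.041584214606814,
        "north": 38.39290664366326,
    },
    "isMapVisible": False,
    "filterState": {
        "price": {"max": 872627},
        "beds": {"min": 1},
        "isForSaleForeclosure": {"value": False},
        "monthlyPayment": {"max": 3000},
        "isAuction": {"value": False},
        "isNewConstruction": {"value": False},
        "isForRent": {"value": True},
        "isForSaleByOwner": {"value": False},
        "isComingSoon": {"value": False},
        "isForSaleByAgent": {"value": False},
    },
    "isListVisible": True,
    "mapZoom": 9,
}
WANTS = {"cat1": ["listResults"]}


def build_search_url(page):
    """Return the search URL for the given 1-based page number."""
    # Copy the top-level dict and swap in the requested page number.
    state = dict(SEARCH_QUERY_STATE, pagination={"currentPage": page})
    query = urlencode({
        "searchQueryState": json.dumps(state, separators=(",", ":")),
        "wants": json.dumps(WANTS, separators=(",", ":")),
    })
    return f"{BASE_URL}?{query}"


# URLs for the first three pages.
urls = [build_search_url(page) for page in range(1, 4)]
```

Each URL can then be passed to requests.get(url, headers=headers) with the headers from the question, and response.json() gives the parsed payload.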

Answer 2

The website is probably using lazy loading, so you can either use something like Selenium/Puppeteer or use the website's own API (the easier way). To do this, make a GET request to a URL that starts with https://www.zillow.com/search/GetSearchPageState.htm (you can see it in your browser's dev tools), parse the JSON response, and you will find the href link under cat1.searchResults.listResults[index in array].detailUrl.
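
A minimal sketch of walking that path once the JSON is parsed into a dict. The sample payload is invented purely to show the shape described above, extract_detail_urls is a hypothetical helper name, and joining against the site root is an assumption in case a detailUrl comes back as a relative path.

```python
from urllib.parse import urljoin

SITE_ROOT = "https://www.zillow.com"


def extract_detail_urls(payload):
    """Collect detailUrl values from cat1.searchResults.listResults."""
    results = (
        payload.get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return [
        urljoin(SITE_ROOT, item["detailUrl"])
        for item in results
        if item.get("detailUrl")
    ]


# Made-up payload with the same nesting as the path above (not real data).
sample_payload = {
    "cat1": {
        "searchResults": {
            "listResults": [
                {"detailUrl": "/b/example-1/"},
                {"detailUrl": "https://www.zillow.com/homedetails/example-2/"},
                {"address": "entry without a detailUrl"},
            ]
        }
    }
}

for url in extract_detail_urls(sample_payload):
    print(url)
```

With a real response you would pass response.json() instead of sample_payload.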

2 answers

solution1
1 2021-11-05 07:56:39

solution2
0 2021-11-05 06:20:35
