Not getting all information when web scraping a website

I am scraping property details from a website, but I get information for only 20 properties even though the page lists 100. The request does not time out.

INPUT:

import requests
import pandas
from bs4 import BeautifulSoup

r = requests.get('https://www.century21.com/real-estate/new-york-ny/LCNYNEWYORK/')
soup = BeautifulSoup(r.content, 'html.parser')

# One div per listing card; named "cards" to avoid shadowing the built-in all()
cards = soup.find_all('div', {'class': 'property-card-primary-info'})

rows = []
for item in cards:
    d = {}
    d['price'] = item.find('a', {'class': 'listing-price'}).text.replace('\n', '').replace(' ', '')
    add = item.find('div', {'class': 'property-address-info'})
    # find() returns None when an element is missing, so .text or .find
    # raises AttributeError; fall back to the string 'None' in that case
    try:
        d['address'] = add.text.replace('\n', ' ').replace('  ', '')
    except AttributeError:
        d['address'] = 'None'
    try:
        d['beds'] = item.find('div', {'class': 'property-beds'}).find('strong').text.replace('\n', '')
    except AttributeError:
        d['beds'] = 'None'
    try:
        d['baths'] = item.find('div', {'class': 'property-baths'}).find('strong').text.replace('\n', '')
    except AttributeError:
        d['baths'] = 'None'
    try:
        d['area'] = item.find('div', {'class': 'property-sqft'}).find('strong').text
    except AttributeError:
        d['area'] = 'None'
    rows.append(d)

df = pandas.DataFrame(rows)
print(df)

OUTPUT:

          price                                            address beds baths   area
0    $1,680,000        161 West 61st Street 3-F New York NY 10023     2     2   None
1    $1,225,000        350 East 82nd Street 2-J New York NY 10028     2     2   None
2    $2,550,000   845 United Nations Plaza 39-E New York NY 10017     2     2   None
3    $1,850,000            57 Reade Street 17-C New York NY 10007     1     1   None
4      $828,000              80 Park Avenue 4-E New York NY 10016     1     1   None
5      $850,000        635 West 42nd Street 19L New York NY 10036     1     1   None
6    $1,749,000        635 West 42nd Street 45D New York NY 10036     2     2   None
7    $1,175,000       340 East 64th Street 11-P New York NY 10065     2     1   None
8    $5,450,000      450 East 83rd Street 24-BC New York NY 10028     5     5   None
9    $4,500,000     524 East 72nd Street 32-CDE New York NY 10021     3     3   None
10   $1,700,000        635 West 42nd Street 42E New York NY 10036     1     1   None
11     $850,000       635 West 42nd Street 15JJ New York NY 10036     1     1   None
12     $800,000       635 West 42nd Street 16JJ New York NY 10036     1     1   None
13  $22,500,000        635 West 42nd Street 28K New York NY 10036     6     6  6,000
14   $1,125,000        635 West 42nd Street 15G New York NY 10036     1     1   None
15   $1,085,000        635 West 42nd Street 14N New York NY 10036     1     1    800
16     $900,000        635 West 42nd Street 18E New York NY 10036     1     1   None
17   $1,600,000        635 West 42nd Street 23K New York NY 10036     2     2  1,070
18   $1,250,000        635 West 42nd Street 24H New York NY 10036     2     1    800
19     $995,000         635 West 42nd Street 4F New York NY 10036     1     1    800

The website lists 100 properties, so why am I getting only 20? Is there a way to get all of the property details?

That page initially loads only the first 20 items. Each time you scroll down, the next 20 items are loaded. requests fetches only the initial HTML and never triggers those scroll-driven loads, which is why you are getting only the first 20 items.

You could instead scrape from this URL:

https://www.century21.com/propsearch-async?lid=CNYNEWYORK&t=0&s=0&r=20&searchKey=244edb9b-0c67-41cc-aa75-1125928b7c87&p=1&o=comingsoon-asc

Change the s value in multiples of 20 to step through the listings.

For example:

s=0: The first 20 items will be fetched
s=20: The next 20 items will be fetched
s=40: .....
and so on...
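
The stepping described above could be sketched roughly as follows. This is only an illustration under several assumptions: the helper names page_urls and fetch_cards are made up here, r=20 is taken to be the batch size, the total of 100 comes from the question, the searchKey from the URL above may be tied to a session and stop working, and the endpoint is assumed (not verified) to return HTML with the same property-card-primary-info markup as the main page.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.century21.com/propsearch-async"


def page_urls(total=100, page_size=20,
              search_key="244edb9b-0c67-41cc-aa75-1125928b7c87"):
    """Yield one URL per batch, stepping s in multiples of page_size.

    total=100 is the listing count mentioned in the question and
    page_size=20 mirrors r=20 in the answer's URL; both are assumptions.
    """
    for start in range(0, total, page_size):
        params = {
            "lid": "CNYNEWYORK",
            "t": 0,
            "s": start,           # offset of the first item in this batch
            "r": page_size,       # assumed to be the number of items returned
            "searchKey": search_key,
            "p": 1,
            "o": "comingsoon-asc",
        }
        yield f"{BASE_URL}?{urlencode(params)}"


def fetch_cards(url):
    """Download one batch and return its listing-card divs.

    Assumes the response is HTML using the same card markup as the
    main page; adjust the parsing if the endpoint returns something else.
    """
    # Imported here so page_urls() works without third-party packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.find_all("div", {"class": "property-card-primary-info"})
```

To use it, loop over page_urls(), call fetch_cards(url) on each one, and feed the returned cards into the same per-item extraction loop from the question. Stopping as soon as a batch comes back empty avoids relying on the assumed total.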
