
Python/BeautifulSoup with JavaScript source

First of all, I am new to Python and BeautifulSoup, so forgive me if I use the wrong terminology.

I am running into an issue: when I inspect the element in the browser I can find it, but when I go to 'view source' it is not there. It seems the data is pulled in via JavaScript, so it may be dynamic.

My question is thus: how do I get at the data (source/elements/tags) that is loaded by JavaScript?

So far, I have the code below. I wasn't able to get the URL for each 'search'.

import urllib.request
from bs4 import BeautifulSoup
import csv

rootURL="http://www.homestead.ca"

def HomeStead2(URL):
    thePage = urllib.request.urlopen(URL)
    soup = BeautifulSoup(thePage, "html.parser")
    return soup

soup = HomeStead2(rootURL)

for dropdownlist in soup.find("ul", {"class":"nav navbar-nav primary"}).find('ul').findAll('a'):

    # NOTHING IS WORKING FROM HERE ONWARDS WHEN I TRY TO GET THE HREF
    citySoup = HomeStead2(rootURL + dropdownlist.get('href'))
    for btnPreview in citySoup.find("div", {"class":"search extended-search"}).findAll('li'):
        try:
            for ApartmentLink in btnPreview.findAll("div", {"class":"property-container"}):
                print(ApartmentLink)
        except Exception:
            print('skip')


You can do it all without Selenium. Once you visit each apartment URL, the data is retrieved through an AJAX call to an API, so all we need is the city-id:

import requests
from bs4 import BeautifulSoup

root = "http://www.homestead.ca"

data = {'keyword': 'false', 'max_bed': '100', 'geocode': '',
        'min_rate': '0', 'offset': '0', 'max_rate': '4000',
        'show_custom_fields': 'true', 'limit': '50',
        'pet_friendly': '', 'city_id': '', 'amenities': '',
        'client_id': '6', 'max_bath': '10',
        'auth_token': 'sswpREkUtyeYjeoahA2i',
        'count': 'false', 'min_bath': '0',
        'order': 'max_rate ASC, min_rate ASC, min_bed ASC, max_bath ASC',
        'city_ids': '', 'region': '',
        'property_types': 'low-rise-apartment,mid-rise-apartment,high-rise-apartment,luxury-apartment,townhouse,house,multi-unit-house,single-family-home,duplex,tripex,semi',
        'min_bed': '-1',
        'show_promotions': 'true'}

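# Search endpoint that the page's AJAX call goes to.
get = "http://api.theliftsystem.com/v2/search"
with requests.Session() as s:
    r = s.get(root)
    soup = BeautifulSoup(r.content, "lxml")
    # Each city entry in the dropdown menu carries a data-city-id attribute.
    lis = soup.select("ul.child-pages.dropdown-menu li")
    for li in lis:
        city_id = li["data-city-id"]
        data["city_id"] = city_id
        # Query the API for this city and print the decoded JSON.
        p = s.get(get, params=data)
        print(p.json())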

You can modify the data to match whatever query you want.
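As a minimal sketch of what "modifying the data" could look like, the helper below copies a base query dict and overrides selected keys. The name build_params is hypothetical, the base dict is a trimmed stand-in for the full data dict above, and the meaning of keys such as max_rate is inferred from their names rather than from any API documentation.

```python
def build_params(base, **overrides):
    """Return a copy of `base` with the given keys replaced.

    Values are converted to strings, matching the style of the original dict.
    Keys that are not already in `base` are rejected to catch typos.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown query keys: {sorted(unknown)}")
    params = dict(base)  # shallow copy, so `base` is left untouched
    params.update({key: str(value) for key, value in overrides.items()})
    return params


# Trimmed stand-in for the full `data` dict (assumption: same key names).
base = {"city_id": "", "min_rate": "0", "max_rate": "4000",
        "limit": "50", "offset": "0"}

query = build_params(base, city_id=332, max_rate=1200)
print(query)
```

The resulting dict can be passed as params= to s.get in the same way as data. Copying first means one city's overrides never leak into the next request.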

The output will be in JSON format, like:

[{'building_header': '', 'office_hours': '', 'name': 'North Park Tower', 'matched_suite_names': ['Bachelor', 'One Bedroom', 'Two Bedroom'], 'matched_beds': ['0', '1', '2'], 'id': 309, 'statistics': {'suites': {'rates': {'average': 950.0, 'max': 1275.0, 'min': 625.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '1.0', 'max': 2, 'min': 0}, 'bathrooms': {'average': 1.0, 'max': 1.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2605725', 'latitude': '43.1703624', 'distance': None}, 'photo': '1443018148_2.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '325 North Park Street', 'postal_code': 'N3R 2X4', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/325-north-park-street-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443018148_2.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': "Located on North Park Street and Memorial Avenue,this quiet building is within walking distance of the following: - Zehrs Plaza, North Park Plaza, Shoppers Drug Mart, Zehrs Grocery Store, Zellers, Pet Store, Party Supply Store, furniture store, variety store, Black's Photography, paint shop and veterinary clinic\xa0  - Restaurants and coffee shops\xa0  - Wayne Gretzky Recreational Arena\xa0  - Medical Clinic,Shoppers Home Health Care Clinic and Pharmacy\xa0  - Catholic Elementary School\xa0  - On bus route "}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 752-6855', 'alt_phone': '', 'name': '', 'phone': '519-752-3596', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 
'description': ''}, 'availability_count': 6, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443018148_2.jpg'}, {'building_header': '', 'office_hours': '', 'name': 'Westgate Apartments', 'matched_suite_names': ['Bachelor', 'One Bedroom', 'Two Bedroom'], 'matched_beds': ['0', '1', '2'], 'id': 310, 'statistics': {'suites': {'rates': {'average': 975.0, 'max': 1300.0, 'min': 650.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '1.0', 'max': 2, 'min': 0}, 'bathrooms': {'average': 1.0, 'max': 1.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2482991', 'latitude': '43.1733242', 'distance': None}, 'photo': '1443017488_1.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '661 West Street', 'postal_code': 'N3R 6W9', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/661-west-street-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017488_1.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': 'Located in the North end of Brantford, Westgate Tower is in an area that resembles a city within a city. 
There are a variety of banks, grocery stores, drug stores, malls, a wide selection of fast food, fine dining restaurants and an after hours medical centre, within waking distance.'}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 751-0379', 'alt_phone': '', 'name': '', 'phone': '519-751-3867', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 'description': ''}, 'availability_count': 6, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017488_1.jpg'}, {'building_header': '', 'office_hours': '', 'name': 'Dornia Manor', 'matched_suite_names': ['One Bedroom', 'Two Bedroom', 'Three Bedroom'], 'matched_beds': ['1', '2', '3'], 'id': 308, 'statistics': {'suites': {'rates': {'average': 1124.5, 'max': 1350.0, 'min': 899.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '2.25', 'max': 3, 'min': 1}, 'bathrooms': {'average': 1.375, 'max': 2.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2584034', 'latitude': '43.1706331', 'distance': None}, 'photo': '1443017947_1.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '321 Fairview Drive', 'postal_code': 'N3R 2X6', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/321-fairview-drive-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017947_1.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': 'Dornia Manor is a quiet, 
ninety-two unit apartment building located in the North end of Brantford. We offer one, two and three bedroom units and one penthouse suite. The building is located in close proximity to many major services such as banking, shopping, health services, recreational facilities, beauty shops, dry cleaners, schools and churches. There is a bus stop at the front door and highway 403 is within minutes.'}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 752-6855', 'alt_phone': '', 'name': '', 'phone': '519-752-3596', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 'description': ''}, 'availability_count': 8, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017947_1.jpg'}]
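Since the question was after the URL of each apartment, here is a rough sketch of pulling a few fields out of that result and writing them as CSV. The function name summarize and the choice of columns are assumptions for illustration; the sample records are trimmed copies of the output above, and the key names are taken from that output only.

```python
import csv
import io


def summarize(buildings):
    """Pick name, city, rate range and permalink from each building record."""
    rows = []
    for building in buildings:
        rates = building.get("statistics", {}).get("suites", {}).get("rates", {})
        rows.append({
            "name": building.get("name", ""),
            "city": building.get("address", {}).get("city", ""),
            "min_rate": rates.get("min"),
            "max_rate": rates.get("max"),
            "permalink": building.get("permalink", ""),
        })
    return rows


# Trimmed copies of two records from the output above.
sample = [
    {"name": "North Park Tower",
     "statistics": {"suites": {"rates": {"average": 950.0, "max": 1275.0, "min": 625.0}}},
     "address": {"city": "Brantford"},
     "permalink": "http://www.homestead.ca/apartments/325-north-park-street-brantford"},
    {"name": "Westgate Apartments",
     "statistics": {"suites": {"rates": {"average": 975.0, "max": 1300.0, "min": 650.0}}},
     "address": {"city": "Brantford"},
     "permalink": "http://www.homestead.ca/apartments/661-west-street-brantford"},
]

rows = summarize(sample)

# Write to an in-memory buffer here; a real file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

In the loop from the answer, summarize(p.json()) would take the place of sample, assuming the API keeps returning a list of records shaped like the ones shown.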
