Scraping data from a website whose URL does not change when a specific onclick button is clicked

Question

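Website: http://www.busonlineticket.com/booking/singapore-to-shah-alam-bus-tickets

Basically, I'm trying to scrape bus trip data from this website, but the data I need to scrape depends on the date the user selects in my web app.

Does anyone know how I can write a program that scrapes data from the <li class='liDaysNew'> tags?

My current code looks like this: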

import requests
from bs4 import BeautifulSoup

# bus_URL (the page above) and headers are assumed to be defined earlier
page = requests.get(bus_URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
buses = soup.find_all('tr', class_='bustr1')

bus_companies = []
depart_times = []
departure_locations = []
arrival_locations = []
prices = []

for bus in buses:
    bus_company = bus.find('span', class_='buscompanyname').text
    depart_time = bus.find('span', class_='bustime').text

    departure_location_div = bus.find('div', class_='mbuspickup1')
    departure_location = departure_location_div.find('span').text

    arrival_location_div = bus.find('div', class_='mbusdropoff1')
    arrival_location = arrival_location_div.find('span').text

    price = bus.find('price', class_='mbusprice1').text

    # Collect the values for this row
    bus_companies.append(bus_company)
    depart_times.append(depart_time)
    departure_locations.append(departure_location)
    arrival_locations.append(arrival_location)
    prices.append(price)

I know how to scrape an ordinary website; it's just the <li class='liDaysNew'> tags with their onclick logic that are throwing me off.

Answer 1

from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'}

# Form fields sent with the POST request
data = {'deptdate': '2022-07-16',  # your departure date
        'rtndate': '2022-07-18',   # your return date
        'pax': '1',
        'way': '1',
        'type': 'bus',
        'sbf': 'undefined'}

r = requests.post("https://www.busonlineticket.com/booking/singapore-to-shah-alam-bus-tickets", headers=headers, data=data)

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')
table_div = soup.select_one('#subtab1')

# Recent pandas versions expect a file-like object rather than a raw HTML string
tables = pd.read_html(StringIO(str(table_div)))
df = tables[0]
print(df)

This returns a pandas DataFrame.
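
The dates in `data` are hard-coded, while the question needs them to follow whatever date the user picks. A minimal sketch of building the same form fields from `date` objects follows; the field names and fixed values are copied from the payload above, while the helper `build_search_payload` is a hypothetical convenience and not something the site provides.

```python
from datetime import date


def build_search_payload(depart, return_date, pax=1):
    """Build the POST form fields from date objects.

    Field names and fixed values mirror the answer's payload; this helper
    is hypothetical and only formats the dates as YYYY-MM-DD strings.
    """
    if return_date < depart:
        raise ValueError("return date must not be before the departure date")
    return {
        'deptdate': depart.isoformat(),       # formatted like '2022-07-16'
        'rtndate': return_date.isoformat(),
        'pax': str(pax),
        'way': '1',
        'type': 'bus',
        'sbf': 'undefined',
    }


payload = build_search_payload(date(2022, 7, 16), date(2022, 7, 18))
print(payload['deptdate'], payload['rtndate'])
```

The resulting dictionary could then be passed as `data=payload` to `requests.post` in place of the literal one.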

Of course, you can also use BeautifulSoup to extract the data in some form other than a DataFrame, for example as individual table elements.
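
As a rough illustration of that alternative, the sketch below pulls fields out row by row with BeautifulSoup. The class names `bustr1`, `buscompanyname` and `bustime` are taken from the question's code; the sample HTML is made up for the example, the function `extract_trips` is hypothetical, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<table>
  <tr class="bustr1">
    <td><span class="buscompanyname">Example Express</span></td>
    <td><span class="bustime">08:30 AM</span></td>
  </tr>
  <tr class="bustr1">
    <td><span class="buscompanyname">Sample Coach</span></td>
  </tr>
</table>
"""


def extract_trips(html):
    """Return one dict per bus row, using None for any missing field."""
    soup = BeautifulSoup(html, 'html.parser')
    trips = []
    for row in soup.find_all('tr', class_='bustr1'):
        company = row.find('span', class_='buscompanyname')
        depart_time = row.find('span', class_='bustime')
        trips.append({
            'company': company.get_text(strip=True) if company else None,
            'depart_time': depart_time.get_text(strip=True) if depart_time else None,
        })
    return trips


trips = extract_trips(SAMPLE_HTML)
print(trips)
```

With a live response, the same function could be called as `extract_trips(r.text)`. Checking each `find` result before reading its text avoids an `AttributeError` when a row lacks one of the expected spans.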
