簡體   English   中英

Web 搜索后使用 Python、Selenium、ZC2ED0329D2D3CF54C78317B20D 進行搜索

[英]Web scrape after search with Python, Selenium, BeautifulSoup

在輸入所有必要的信息后,我想在 web 刮一張高中匯總表。 但是,我不知道該怎么做,因為 url 在進入學校頁面后並沒有改變。 我沒有找到任何與我正在嘗試做的事情相關的事情。 知道如何在完成搜索過程后刮一張桌子嗎? 謝謝你。

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome("drivers/chromedriver")

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")

state_drop = driver.find_element_by_id("state")
state = Select(state_drop)
state.select_by_visible_text(input("New Jersey"))

driver.find_element_by_id("city").send_keys(input("Galloway"))
driver.find_element_by_id("name").send_keys(input("Absegami High School"))
driver.find_element_by_class_name("forms_input_button").send_keys(Keys.RETURN)
driver.find_element_by_id("hsSelectRadio_1").click()

url = driver.current_url
print(url)
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
school_info = soup.find('table', class_="border=")
print(school_info)

嘗試這個:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")

state_drop = driver.find_element_by_id("state")
state = Select(state_drop)
state.select_by_visible_text("New Jersey")

driver.find_element_by_id("city").send_keys("Galloway")
driver.find_element_by_id("name").send_keys("Absegami High School")
driver.find_element_by_class_name("forms_input_button").send_keys(Keys.RETURN)
driver.find_element_by_id("hsSelectRadio_1").click()

#scraping the caption of the tables
all_sub_head = driver.find_elements_by_class_name("tableSubHeaderForWsrDetail") 

#scraping all the headers of the tables
all_headers = driver.find_elements_by_class_name("tableHeaderForWsrDetail")

#filtering the desired headers
required_headers = all_headers[5:]

#scraoing all the table data
all_contents = driver.find_elements_by_class_name("tdTinyFontForWsrDetail")

#filtering the desired tabla data
required_contents = all_contents[45:]
    
print("                ",all_sub_head[1].text,"                ")
for i in range(15):
    print(required_headers[i].text, "              >     ", required_contents[i].text )
    
print("execution completed")

OUTPUT

                 High School Summary                 
NCAA High School Code               >      310759
CEEB Code               >      310759
High School Name               >      ABSEGAMI HIGH SCHOOL
Address               >      201 S WRANGLEBORO RD
GALLOWAY
NJ - 08205
Primary Contact Name               >      BONNIE WADE
Primary Contact Phone               >      609-652-1485
Primary Contact Fax               >      609-404-9683
Primary Contact Email               >      bwade@gehrhsd.net
Secondary Contact Name               >      MR. DANIEL KERN
Secondary Contact Phone               >      6096521372
Secondary Contact Fax               >      6094049683
Secondary Contact Email               >      dkern@gehrhsd.net
School Website               >      http://www.gehrhsd.net/
Link to Online Course Catalog/Program of Studies               >      Not Available
Last Update of List of NCAA Courses               >      12-Feb-20
execution completed

output 截圖:點我!!!

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM