Web scraping returns 'None'
I'm new to Python and I'm trying to build a web scraping script.
I'm trying to scrape the 'href' URLs:
My code:
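import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

URL = 'https://www.rotowire.com/basketball/team.php?team=UTA'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
service = Service(ChromeDriverManager().install())
for link in soup.find_all({"aria-colindex" : "3"}):
    print(link.get('href'))
driver = webdriver.Chrome(service = service)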
But this returns nothing. I also tried {'style' : "width: 96px; left: 190px; top: 0px;"} instead of {"aria-colindex" : "3"}, but that also returns 'None'. I don't know what I'm doing wrong, so any help would be much appreciated :)
The data is loaded dynamically from an API, so it is easier to retrieve the links directly from that API. Here is a pandas implementation:
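import pandas as pd
from bs4 import BeautifulSoup

# Load the schedule JSON from the API endpoint
df = pd.read_json('https://www.rotowire.com/basketball/tables/team-schedule.php?team=UTA')
# Each 'score' cell holds an HTML fragment; take the href of its <a> tag
df['url'] = df['score'].apply(lambda x: BeautifulSoup(x, 'html.parser').find('a')['href'])
df.to_csv('output.csv')  # export to CSV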
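One thing to watch for with this approach: `find('a')` returns `None` when a cell contains no link, and indexing `None` with `['href']` raises a `TypeError`. The sketch below shows one way to extract the link defensively. It uses only the standard library so it can run offline, and the sample rows are hypothetical; the real API response may be shaped differently. `FirstLinkParser` and `first_href` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstLinkParser(HTMLParser):
    """Records the first href found on an <a> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; stop updating once an href is found.
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def first_href(fragment: str) -> Optional[str]:
    """Return the first link target in the fragment, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(fragment)
    parser.close()
    return parser.href


# Hypothetical sample rows (assumed shape: a 'score' field holding an HTML fragment).
sample_rows = [
    {"score": '<a href="/basketball/box-score.php?gid=2347768">W 110-104</a>'},
    {"score": "7:00 PM"},  # no link in this cell
]

hrefs = [first_href(row["score"]) for row in sample_rows]
print(hrefs)
```

With pandas, the same helper could be passed to `df['score'].apply(first_href)`, leaving `None` in rows that have no link instead of raising an error.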
Based on your question, here is a working solution.
Code:
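import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
url = "https://www.rotowire.com/basketball/team.php?team=UTA"
driver.get(url)
time.sleep(8)  # give the JavaScript-rendered table time to load
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
links = soup.select('div.webix_column.align-c div a')
for link in links:
    print('href_url:' + link['href'])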
Output:
href_url:/basketball/box-score.php?gid=2347768
href_url:/basketball/box-score.php?gid=2347767
href_url:/basketball/box-score.php?gid=2347765
href_url:/basketball/box-score.php?gid=2347764
href_url:/basketball/box-score.php?gid=2347762
href_url:/basketball/box-score.php?gid=2347760
href_url:/basketball/box-score.php?gid=2346563
href_url:/basketball/box-score.php?gid=2346562
href_url:/basketball/box-score.php?gid=2346561
href_url:/basketball/box-score.php?gid=2346420
href_url:/basketball/box-score.php?gid=2346295
href_url:/basketball/box-score.php?gid=2314246
href_url:/basketball/box-score.php?gid=2314315
href_url:/basketball/box-score.php?gid=2314159
href_url:/basketball/box-score.php?gid=2314155
href_url:/basketball/box-score.php?gid=2314153
href_url:/basketball/box-score.php?gid=2314144
href_url:/basketball/box-score.php?gid=2314220
href_url:/basketball/box-score.php?gid=2314333
href_url:/basketball/box-score.php?gid=2314142
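The hrefs printed above are relative to the site root, so they need to be combined with the page URL before they can be requested. A minimal sketch using the standard library's `urljoin`, with two of the values from the output as sample input (the names `BASE_URL`, `relative_hrefs`, and `absolute_urls` are illustrative):

```python
from urllib.parse import urljoin

# The page the links were scraped from; urljoin resolves each href against it.
BASE_URL = "https://www.rotowire.com/basketball/team.php?team=UTA"

relative_hrefs = [
    "/basketball/box-score.php?gid=2347768",
    "/basketball/box-score.php?gid=2347767",
]

# An href starting with "/" replaces the whole path of the base URL.
absolute_urls = [urljoin(BASE_URL, href) for href in relative_hrefs]
for absolute_url in absolute_urls:
    print(absolute_url)
```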