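Python WebScraping With Selenium & Chrome
I am trying to scrape a web page, but finding elements by class name is not working. I can see the element's class name in Chrome's Elements panel, yet when I pass that name in as shown below, I get an empty result.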
from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://streamelements.com/logna/leaderboard")
usernames = driver.find_elements_by_class_name("md-cell leaderboard-row")
usernames
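I am trying to scrape at least the usernames and their points from this leaderboard page. A further plan is to record each user's position as well and put everything into an Excel spreadsheet, but that is for later, not where I am right now.
The output I see from running "usernames" is "[]", which I know means the list is empty, but I don't understand why, given that I can see the element and the class name matches exactly. I must be missing something, or there is something I don't know about.
Edit: go to the bottom for a better way to get the data, which in this case doesn't require pulling it from the HTML at all.
Got it working: just wait 10 seconds and search for a single class name: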
import time
from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe" # or wherever you have your chrome webdriver installed
driver = webdriver.Chrome(chrome_path)
driver.get("https://streamelements.com/logna/leaderboard")
# let the page load
time.sleep(10)
# list comprehension to return text of each element with class leaderboard-row
usernames = [element.text for element in
             driver.find_elements_by_class_name("leaderboard-row")
             if element.text != '']
print(usernames)
Output:
['underholderen', '42051', 'jimbyj', '39220', 'delynne', '35411', 'rawrnerunya', '30350', 'simmer5k', '25470', 'bloomspeed', '23885', 'jaidav2000', '22386', 'moobot', '18910', 'virgoproz', '18120', 'ottermandela', '18108', 'v_and_k', '17945', 'kalibxi', '17610', 'commanderroot', '17585', 'jujusan', '17575', 'mellowj', '15390', 'itsvodoo', '15080', 'lord_hal', '14945', 'darkk0ala', '14757', 'sirenmatty', '13230', 'myles_27', '12725', 'upsetpoptart', '12204', 'salsichasensuaal', '11535', 'artalartistic', '11519', 'shannonmcbe', '10895', 'winsock', '10850']
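The flat list above alternates between a username and that user's points. A minimal sketch of pairing them up, assuming the alternation holds for every row (it would break if a username or a points cell were empty, since the scraping code drops empty text). The helper name pair_up is only for illustration, and the sample values are copied from the output above:

```python
def pair_up(flat):
    """Turn [name, points, name, points, ...] into [(name, int(points)), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items")
    # even indexes hold usernames, odd indexes hold point totals
    return [(name, int(points)) for name, points in zip(flat[0::2], flat[1::2])]


# first three users from the scraped output
sample = ['underholderen', '42051', 'jimbyj', '39220', 'delynne', '35411']
pairs = pair_up(sample)
for name, points in pairs:
    print(name, points)
```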
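If you want data from the other columns in the table, that is possible too.
Edit:
Better yet, I was able to find the XHR web request that returns the list of top viewers (this is where the table's data comes from, and it is in JSON format): https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top
You can query it directly and get the data much faster, without scraping. Let me know and I can show how.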
Edit:
OK, super easy, and WAAAAAAY better:
First install the requirement:
pip install requests
Then:
import json
import requests
url = 'https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top'
# get a dictionary of the request's json response
usernames = requests.get(url).json()
print(usernames)
Output:
{'_total': 19350, 'users': [{'username': 'underholderen', 'points': 42051}, {'username': 'jimbyj', 'points': 39220}, {'username': 'delynne', 'points': 35411}, {'username': 'rawrnerunya', 'points': 30350}, {'username': 'simmer5k', 'points': 25470}, {'username': 'bloomspeed', 'points': 23885}, {'username': 'jaidav2000', 'points': 22386}, {'username': 'moobot', 'points': 18910}, {'username': 'virgoproz', 'points': 18120}, {'username': 'ottermandela', 'points': 18108}, {'username': 'v_and_k', 'points': 17945}, {'username': 'kalibxi', 'points': 17610}, {'username': 'commanderroot', 'points': 17585}, {'username': 'jujusan', 'points': 17575}, {'username': 'mellowj', 'points': 15390}, {'username': 'itsvodoo', 'points': 15080}, {'username': 'lord_hal', 'points': 14945}, {'username': 'darkk0ala', 'points': 14757}, {'username': 'sirenmatty', 'points': 13230}, {'username': 'myles_27', 'points': 12725}, {'username': 'upsetpoptart', 'points': 12204}, {'username': 'salsichasensuaal', 'points': 11535}, {'username': 'artalartistic', 'points': 11519}, {'username': 'shannonmcbe', 'points': 10895}, {'username': 'winsock', 'points': 10850}, {'username': 'macklelotsmore', 'points': 10688}, {'username': 'kikyobooty', 'points': 10650}, {'username': 'jovikingdomkey', 'points': 10385}, {'username': 'dancerhands', 'points': 10186}, {'username': 'mapplerug45', 'points': 10185}, {'username': 'lurxx', 'points': 10175}, {'username': 'jellycat101', 'points': 9965}, {'username': 'dean_', 'points': 9880}, {'username': 'tagou_', 'points': 9550}, {'username': 'arthiphix', 'points': 9505}, {'username': 'beingred', 'points': 9307}, {'username': 'theemrmark', 'points': 9135}, {'username': 'tiptactoe', 'points': 8710}, {'username': 'aten', 'points': 8660}, {'username': 'sweegol', 'points': 8630}, {'username': 'taramichellee', 'points': 8625}, {'username': 'sindar44', 'points': 8590}, {'username': 'nitestalkrr', 'points': 8570}, {'username': 'swoapy', 'points': 8546}, {'username': 'logviewer', 'points': 
8380}, {'username': 'umental', 'points': 8235}, {'username': 'chesterfield250', 'points': 8171}, {'username': 'theedgecution', 'points': 8152}, {'username': 'dreameater_gd', 'points': 8110}, {'username': 'camirios29', 'points': 7960}, {'username': 'dirty_soul', 'points': 7895}, {'username': 'princesschango', 'points': 7780}, {'username': 'tylerhunsicker', 'points': 7729}, {'username': 'toonybit', 'points': 7655}, {'username': 'angeloflight', 'points': 7515}, {'username': 'fentondy', 'points': 7325}, {'username': 'owgrandma', 'points': 7165}, {'username': 'ohitspb', 'points': 7150}, {'username': 'jayy557', 'points': 7140}, {'username': 'nightbot', 'points': 7125}, {'username': 'therealjt', 'points': 7110}, {'username': 'hawqks', 'points': 6970}, {'username': 'oxsaucy', 'points': 6930}, {'username': 'somoonm', 'points': 6910}, {'username': 'skiesti', 'points': 6890}, {'username': 'adeeduhs', 'points': 6695}, {'username': 'elmolovesdorothy', 'points': 6660}, {'username': 'liquigels', 'points': 6640}, {'username': 'shadowed21', 'points': 6630}, {'username': 'fakerwtd', 'points': 6450}, {'username': 'fragglefusion', 'points': 6440}, {'username': 'kickypip', 'points': 6230}, {'username': 'cerem5', 'points': 6230}, {'username': 'nikkigsus', 'points': 6225}, {'username': 'bigj808', 'points': 6135}, {'username': 'anotherttvviewer', 'points': 6070}, {'username': 'taratv', 'points': 6040}, {'username': 'l0nnix', 'points': 5970}, {'username': 'sainttt', 'points': 5965}, {'username': 'princejay__', 'points': 5905}, {'username': 'oniisammma', 'points': 5886}, {'username': 'marshallpawpatrol', 'points': 5839}, {'username': 'rosayallday', 'points': 5720}, {'username': 'garvsehgal98', 'points': 5700}, {'username': 'beethoven6', 'points': 5695}, {'username': 'nynxii', 'points': 5680}, {'username': 'tilly', 'points': 5672}, {'username': 'godgundam1019', 'points': 5615}, {'username': 'monoclekitteh', 'points': 5605}, {'username': 'steviewondaaa', 'points': 5580}, {'username': 
'ianonymoose', 'points': 5545}, {'username': 'aris1535', 'points': 5477}, {'username': 'rimastino', 'points': 5445}, {'username': 'kodexow', 'points': 5395}, {'username': 'ssondara', 'points': 5360}, {'username': 'cyroku', 'points': 5325}, {'username': 'ankoubzh', 'points': 5250}, {'username': 'sajan_ow', 'points': 5205}, {'username': 'plucik7', 'points': 5125}, {'username': 'sutetchi_', 'points': 5108}]}
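Because the response is a plain dictionary, each user's rank can be derived from their position in the 'users' list. A small sketch, assuming the list arrives already sorted by points, as it is in the output above. The function name ranked_rows and the shortened sample dictionary are hypothetical, used only so the example runs without a network call:

```python
def ranked_rows(response, limit=None):
    """Return (rank, username, points) tuples for the first `limit` users."""
    users = response['users'][:limit]  # a limit of None keeps every user
    return [(rank, user['username'], user['points'])
            for rank, user in enumerate(users, start=1)]


# shortened copy of the response shown above
sample = {'_total': 19350, 'users': [
    {'username': 'underholderen', 'points': 42051},
    {'username': 'jimbyj', 'points': 39220},
    {'username': 'delynne', 'points': 35411},
]}
rows = ranked_rows(sample, limit=2)
for row in rows:
    print(row)
```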
Edit (again):
Here is how to get it into Excel (the code is changed slightly from the above):
First install openpyxl:
pip install openpyxl
Then run the script:
import requests
import openpyxl as xl
url = 'https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top'
# get a dictionary of the request's json response
response = requests.get(url).json()
# get just the user list
users = response['users']
# add a 1-based rank to each user (enumerate counts from 1 here)
for rank, user in enumerate(users, start=1):
    user['rank'] = rank
# create the workbook
wb = xl.Workbook()
# go to the active sheet
ws = wb.active
# write the header row
ws.append(list(users[0].keys()))
# write the values for each row
for user in users:
    ws.append(list(user.values()))
# save the workbook
wb.save('./streamelements-kappa.xlsx')
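Perhaps the class name is not "md-cell leaderboard-row" but just "md-cell", and what follows the space is a selector or something similar. Honestly I'm not sure, because I know almost nothing about CSS.
Either way, this code should work: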
from selenium import webdriver
chrome_path = r"D:/PythonLessons/imageTest/chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://streamelements.com/logna/leaderboard")
usernames = driver.find_elements_by_class_name("md-cell")
# print the text of every cell
for item in usernames:
    print(item.text)
driver.close()
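This code gets every md-cell, so you will see a list of all the cells. You can also use "md-row" instead of "md-cell" to get the rows; then each element of the list is one row containing the number, the name, and the points. Give it a try.
P.S.: once you have the list, you can check whether any of the elements are empty.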
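This happens because the element you are looking for belongs to multiple classes, namely md-cell and leaderboard-row, while find_elements_by_class_name accepts only a single class name. To fix it, use XPath to find elements that have both the md-cell class and the leaderboard-row class: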
usernames = driver.find_elements_by_xpath("//*[contains(@class, 'md-cell') and contains(@class, 'leaderboard-row')]")
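The same "has both classes" match can also be written as a CSS selector, where each class name gets a leading dot and the names are joined without spaces. A minimal sketch of that conversion; the helper name classes_to_css is made up for illustration, and the driver calls appear only as comments because they need a live browser session:

```python
def classes_to_css(class_attr):
    """Convert a class attribute value such as 'a b' into the CSS selector '.a.b'."""
    names = class_attr.split()
    if not names:
        raise ValueError("no class names given")
    return "".join("." + name for name in names)


selector = classes_to_css("md-cell leaderboard-row")
print(selector)

# With a live driver, the selector could then be used like this:
#   driver.find_elements_by_css_selector(selector)    # older Selenium, as used in this thread
#   driver.find_elements(By.CSS_SELECTOR, selector)   # Selenium 4, By from selenium.webdriver.common.by
```

One difference from the XPath above: contains(@class, 'md-cell') is a substring test, so it would also match a class such as md-cell-wide, whereas the CSS selector matches whole class names only.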
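If the find_elements_by_xpath call runs before the page has fully loaded, it will still return an empty list, so make sure to add a sleep before it.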
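A fixed sleep either wastes time or turns out too short. Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for waiting on a condition instead. The sketch below shows the same polling idea in plain Python so that it runs without a browser. The names wait_until and fake_find_rows are hypothetical, and the fake function merely stands in for a page whose rows appear on the third check:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# stand-in for a page that only yields rows on the third check
attempts = []


def fake_find_rows():
    attempts.append(1)
    return ['underholderen', '42051'] if len(attempts) >= 3 else []


rows = wait_until(fake_find_rows, timeout=5, poll=0.01)
print(rows)

# With a live driver, the condition could be something like:
#   lambda: driver.find_elements_by_class_name("leaderboard-row")
```

An empty list is falsy, so the helper keeps polling until at least one element is found and returns as soon as that happens.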