Selenium output differs from the browser; can't find the table
Trying to find the "query results" table on this page: https://dune.com/queries/604672/1128586
using
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from random import randint
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36"
headers = {'User-Agent': USER_AGENT}
url="https://dune.com/queries/604672/1128634"
chromedriver_path = "E:\\py\\testsoup\\win\\chromedriver.exe"
service = Service(chromedriver_path)
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome(service=service, options=options)  # "chrome_options" is deprecated in Selenium 4; use "options"
driver.get(url)
sleep(randint(3, 5))
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
f = open(".\\url.html", "w", encoding="utf-8", errors="ignore")
f.write(soup.prettify())
f.close()
print(soup.prettify())
table = soup.find("table",{"class":"table_table__fuS_N"})
#df = pd.read_html(str(table))[0]
print(table)
However, I've noticed that in an actual browser (Brave) the results sit under a div whose aria-labelledby begins with "tabs--2-", which is different from what Selenium fetches: Selenium doesn't fetch the table tab, and only returns the div whose aria-labelledby begins with "tabs--1-".
Any ideas how to get the table?
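Since the tab panels are distinguished by their aria-labelledby attribute, one way to check which panel Selenium actually received is to select on that attribute with a prefix match instead of the class name. A self-contained sketch on static markup that mimics the structure described above (the ids and values here are illustrative, not taken from the real page):

```python
from bs4 import BeautifulSoup

# Illustrative markup mimicking the tab-panel structure described in the
# question; the real page's ids and classes may differ.
html = """
<div aria-labelledby="tabs--2--tab-0">
  <table class="table_table__fuS_N">
    <tr><th>first_time</th><th>new_users</th></tr>
    <tr><td>2022-06-24 00:00</td><td>13617</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# ^= does a prefix match, so the truncated/variable suffix of the tab id
# does not matter.
panel = soup.select_one('div[aria-labelledby^="tabs--2-"]')
table = panel.find("table") if panel else None
print(table is not None)  # True when the panel and its table are present
```

If this selector returns None on the real page_source, the panel genuinely was not rendered in that session, which points at a rendering difference (e.g. headless mode or timing) rather than a wrong selector.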
In my case, the following code works without any issues:
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.chrome.options import Options
webdriver_service = Service("./chromedriver") #Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://dune.com/queries/604672/1128586'
driver.get(url)
driver.maximize_window()
time.sleep(5)
soup=BeautifulSoup(driver.page_source, 'lxml')
table = soup.select_one('table[class="table_table__fuS_N"]')
df = pd.read_html(str(table))[0]
print(df)
Output:
first_time new_users cumu_users
0 2022-06-24 00:00 13617 13617
1 2022-06-25 00:00 11836 25453
2 2022-06-26 00:00 9381 34834
3 2022-06-27 00:00 7553 42387
4 2022-06-28 00:00 7576 49963
5 2022-06-29 00:00 6988 56951
6 2022-06-30 00:00 6027 62978
7 2022-07-01 00:00 5069 68047
8 2022-07-02 00:00 6908 74955
9 2022-07-03 00:00 5055 80010
10 2022-07-04 00:00 4055 84065
11 2022-07-05 00:00 3914 87979
12 2022-07-06 00:00 3456 91435
13 2022-07-07 00:00 3322 94757
14 2022-07-08 00:00 2589 97346
15 2022-07-09 00:00 2906 100252
16 2022-07-10 00:00 2751 103003
17 2022-07-11 00:00 2890 105893
18 2022-07-12 00:00 2916 108809
19 2022-07-13 00:00 3868 112677
20 2022-07-14 00:00 2974 115651
21 2022-07-15 00:00 1955 117606
22 2022-07-16 00:00 2474 120080
23 2022-07-17 00:00 2539 122619
24 2022-07-18 00:00 3260 125879
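The conversion step at the end (pd.read_html on the scraped <table> markup) can be verified offline on a static snippet, without a browser. A minimal sketch, with illustrative data values; wrapping the string in StringIO avoids the literal-HTML deprecation warning in newer pandas:

```python
from io import StringIO

import pandas as pd

# Minimal static table mimicking the scraped markup, to exercise the
# read_html step in isolation (data values are illustrative).
html = """
<table class="table_table__fuS_N">
  <thead><tr><th>first_time</th><th>new_users</th><th>cumu_users</th></tr></thead>
  <tbody>
    <tr><td>2022-06-24 00:00</td><td>13617</td><td>13617</td></tr>
    <tr><td>2022-06-25 00:00</td><td>11836</td><td>25453</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table>; take the first.
df = pd.read_html(StringIO(html))[0]
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['first_time', 'new_users', 'cumu_users']
```

This is the same pattern as pd.read_html(str(table))[0] in the answer above, just applied to a fixed string so the parsing step can be tested independently of Selenium.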