
Selenium output different from browser: can't find table

Trying to find the "query results" table on this page: https://dune.com/queries/604672/1128586

using

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from random import randint

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36"
headers = {'User-Agent': USER_AGENT}
url="https://dune.com/queries/604672/1128634"
chromedriver_path = "E:\\py\\testsoup\\win\\chromedriver.exe"

service = Service(chromedriver_path)
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome(service=service, options=options)  # Selenium 4 renamed "chrome_options" to "options"
driver.get(url)
sleep(randint(3, 5))


soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()


f = open(".\\url.html", "w", errors="ignore")
f.write(soup.prettify())
f.close()

print(soup.prettify())

table = soup.find("table",{"class":"table_table__fuS_N"})
#df = pd.read_html(str(table))[0]

print(table)

However, I've identified that in an actual browser (Brave) the table sits under div aria-labelledby="tabs--2-

[screenshot: the table tab as rendered in the browser]

which is different from what Selenium fetches. Selenium doesn't fetch the table tab, only div aria-labelledby="tabs--1-

[screenshot: the page source fetched by Selenium]

Any ideas how to get the table?
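A side note on the selector: the class name "table_table__fuS_N" looks like a hash generated by the site's CSS build, so the suffix can change between deployments. A substring attribute selector that matches only the stable prefix is more robust. A minimal sketch with BeautifulSoup, using a made-up HTML snippet (the real page nests the table deeper inside the tab panels):

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the fetched page source; class hash invented for illustration.
html = """
<div aria-labelledby="tabs--2-">
  <table class="table_table__AbC12"><tr><td>row</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [class*='table_table'] matches any class containing that substring,
# so a changed hash suffix no longer breaks the lookup.
table = soup.select_one("table[class*='table_table']")
print(table is not None)  # True when the table is present in the fetched HTML
```

This only helps once the table is actually in the page source, of course; the headless rendering difference still has to be solved first.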

In my case, the following code is working without any issues:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.chrome.options import Options

webdriver_service = Service("./chromedriver") #Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://dune.com/queries/604672/1128586'
driver.get(url)
driver.maximize_window()
time.sleep(5)

soup=BeautifulSoup(driver.page_source, 'lxml')
table = soup.select_one('table[class="table_table__fuS_N"]')
df = pd.read_html(str(table))[0]
print(df)

Output:

          first_time  new_users  cumu_users
0   2022-06-24 00:00      13617       13617
1   2022-06-25 00:00      11836       25453
2   2022-06-26 00:00       9381       34834
3   2022-06-27 00:00       7553       42387
4   2022-06-28 00:00       7576       49963
5   2022-06-29 00:00       6988       56951
6   2022-06-30 00:00       6027       62978
7   2022-07-01 00:00       5069       68047
8   2022-07-02 00:00       6908       74955
9   2022-07-03 00:00       5055       80010
10  2022-07-04 00:00       4055       84065
11  2022-07-05 00:00       3914       87979
12  2022-07-06 00:00       3456       91435
13  2022-07-07 00:00       3322       94757
14  2022-07-08 00:00       2589       97346
15  2022-07-09 00:00       2906      100252
16  2022-07-10 00:00       2751      103003
17  2022-07-11 00:00       2890      105893
18  2022-07-12 00:00       2916      108809
19  2022-07-13 00:00       3868      112677
20  2022-07-14 00:00       2974      115651
21  2022-07-15 00:00       1955      117606
22  2022-07-16 00:00       2474      120080
23  2022-07-17 00:00       2539      122619
24  2022-07-18 00:00       3260      125879
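One detail worth noting about the pd.read_html(str(table)) step: newer pandas versions emit a FutureWarning when passed a literal HTML string and expect a file-like object instead, so wrapping the string in StringIO is the forward-compatible form. A sketch with a made-up two-row table (values invented for illustration):

```python
from io import StringIO

import pandas as pd

# Tiny stand-in for the scraped <table> markup.
table_html = """
<table>
  <thead><tr><th>first_time</th><th>new_users</th><th>cumu_users</th></tr></thead>
  <tbody>
    <tr><td>2022-06-24 00:00</td><td>13617</td><td>13617</td></tr>
    <tr><td>2022-06-25 00:00</td><td>11836</td><td>25453</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table> it finds;
# [0] picks the first (and here only) table.
df = pd.read_html(StringIO(table_html))[0]
print(df.shape)  # (2, 3)
```

read_html needs an HTML parser backend installed (lxml, or html5lib plus BeautifulSoup).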
