[英]How to scrape content from a dynamic table using python?
我正在嘗試提取此頁面上“振盪器”選項卡下的 RSI 指標。
URL: https://in.tradingview.com/markets/stocks-india/market-movers-active/
我知道我必須先使用 Selenium 之類的東西才能訪問選項卡,但是如何訪問“振盪器”div。
我需要使用 selenium,然后我可以使用 beautiful-soup 找到正確的標簽和數據,對吧?
編輯 -
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from time import sleep
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
# create object for chrome options
chrome_options = Options()
base_url = 'https://in.tradingview.com/markets/stocks-india/market-movers-active/'
# To disable the message, "Chrome is being controlled by automated test software"
chrome_options.add_argument("disable-infobars")
# Pass the argument 1 to allow and 2 to block
chrome_options.add_experimental_option("prefs", {
"profile.default_content_setting_values.notifications": 2
})
# invoke the webdriver
browser = webdriver.Chrome(executable_path = r'/Users/judhjitganguli/Downloads/chromedriver',
options = chrome_options)
browser.get('chrome://settings/')
browser.execute_script('chrome.settingsPrivate.setDefaultZoom(0.5);')
browser.get(base_url)
delay = 5 #seconds
while True:
try:
# find tab/button
osiButton = browser.find_element_by_css_selector('.tv-screener-toolbar__favorites div div div:nth-child(8)')
print('button text: ' + osiButton.text)
osiButton.click()
WebDriverWait(browser, 9).until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, 'th:nth-child(2) .js-head-title'), "OSCILLATORS RATING"))
# table updated, get the data
for row in browser.find_elements_by_css_selector(".tv-data-table__tbody tr"):
print(row.text)
#for cell in browser.find_elements_by_css_selector('td'):
# print(cell.text)
except Exception as ex:
print(ex)
# close the automated browser
browser.close()
在 output 中,我得到了所需的數據,但它是一個無限循環。 如何將其放入 pandas df?
單擊 Oscillators 后,等待並監視元素th:nth-child(2).js-head-title
的變化,使用 WebDriverWait 從Last
到Oscillators Rating
# if running headless make sure to add this argument
# or the oscillators tab will not visible or can't be clicked
#chrome_options.add_argument("window-size=1980,960");
try:
# find tab/button
osiButton = driver.find_element_by_css_selector('.tv-screener-toolbar__favorites div div div:nth-child(8)')
print('button text: ' + osiButton.text)
osiButton.click()
WebDriverWait(driver, 9).until(
EC.text_to_be_present_in_element((By.CSS_SELECTOR, 'th:nth-child(2) .js-head-title'), "OSCILLATORS RATING"))
# table updated, get the data
for row in driver.find_elements_by_css_selector('.tv-data-table__tbody tr'):
print(row.text)
#for cell in driver.find_elements_by_css_selector('td'):
#print(cell.text)
except Exception as ex:
print(ex)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.