[英]Scraping text; I'm not sure the Google Chrome Inspect element is giving me the correct XPath. Where can I get the correct path?
[英]I can't locate an element despite using correct CSS selector/XPATH and there is no iframe in html that I'm scraping. How do I get the element?
以下是我的完整代码供参考。 除了倒数第二行之外,一切正常,这就是我的问题。 这里是。
from selenium import webdriver
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import xlsxwriter
from datetime import datetime
import time
trade_date_lim = "4/10/2021"
chrome_driver = os.path.abspath('C:/Users/ross/Desktop/chromedriver.exe')
browser = webdriver.Chrome(chrome_driver)
browser.get('https://finra-markets.morningstar.com/BondCenter/Default.jsp')
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#TabContainer > div > div.rtq-tab-wrap > div.rtq-tab-menus-wrap > ul > li:nth-child(3) > a > span'))).click()
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#firscreener-cusip'))).send_keys("STWD")
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-finra-advanced-search-form > div.ms-finra-advanced-search-btn > input:nth-child(2)"))).click()
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-agreement > input"))).click()
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-grid-hd > div > div:nth-child(7) > div"))).click()
time.sleep(2)
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-grid-hd > div > div:nth-child(7) > div"))).click()
time.sleep(2)
whole_chart = WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll"))).text
parent = browser.find_element_by_xpath('//*[@id="ms-finra-search-results"]/div/div[3]/div[1]/div[1]/div[2]/div[2]/div')
count_divs = len(parent.find_elements_by_xpath("./div"))
for row_num in range(1):
#gets values that I'm looking for
symbol = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(3)"))).text
maturity = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(7)"))).text
moody_rating = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(8)"))).text
sandp_rating = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(9)"))).text
bond_yield = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(11)"))).text
#looks to see if all values are non-empty and if moody rating and sandp rating are not equal to 'WR' and 'NR'
if symbol.strip() and maturity.strip() and moody_rating.strip() and sandp_rating.strip() and bond_yield.strip() and moody_rating != "WR" and sandp_rating != "NR":
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(2) > div > a"))).click()
WebDriverWait(browser, 5).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "ms-bond-detail-iframe")))
WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#tradeHistory_link"))).click()
browser.switch_to.default_content()
time.sleep(10)
#bond information has everything we need. Now we check to see the last time this bond was actually traded
last_trade_date = WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#ms-glossary > div > table > tbody > tr:nth-child(1) > td:nth-child(1) > div')))
print(last_trade_date)
引发的错误是超时异常。
我尝试通过 CSS 选择器以及 XPATH 进行查找。 我相信,我对每条路径都使用了正确的格式。 我在 html 中找不到 Iframe,所以我不必担心。 我包含了隐式 wait time.sleep(10)
只是为了确保 web 页面已通过搜索完全加载。 为了更好地衡量,我包括了visibility_of_element_located
的显式等待。 我还尝试使用presence_of_element_located
和element_to_be_clickable
。 我快疯了,有人能帮忙吗?
罗斯
有2个问题...
首先,改变这个:
browser.switch_to.default_content()
对此:
browser.switch_to.window(browser.window_handles[-1])
切换到default_content
仅在 iFrame 中工作时使用,这里不是这种情况。 browser.switch_to.window(browser.window_handles[-1])
切换到最后打开的选项卡
其次,你的最后一行应该是:
print(last_trade_date.text)
代替:
print(last_trade_date)
印刷:
1/15/2021
顺便说一句,我不认为time.sleep(10)
是必要的,我把它完全拿出来,它运行良好
如果块,我最后对你工作过。 问题是您单击时打开了一个新选项卡
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#tradeHistory_link"))).click()
所以您需要将 web 驱动程序的焦点更改为该新选项卡:
driver.switch_to.window(new_window)
代码:
#looks to see if all values are non-empty and if moody rating and sandp rating are not equal to 'WR' and 'NR'
if symbol.strip() and maturity.strip() and moody_rating.strip() and sandp_rating.strip() and bond_yield.strip() and moody_rating != "WR" and sandp_rating != "NR":
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ms-finra-search-results > div > div.qs-resultData > div.qs-resultData-body > div.rtq-grid.rtq-grid-auto-h > div.rtq-scrollpanel > div.rtq-grid-scroll > div > div:nth-child(" + str(row_num + 1) + ") > div:nth-child(2) > div > a"))).click()
WebDriverWait(driver, 5).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "ms-bond-detail-iframe")))
windows_before = driver.current_window_handle
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#tradeHistory_link"))).click()
WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
windows_after = driver.window_handles
new_window = [x for x in windows_after if x != windows_before][0]
driver.switch_to.window(new_window)
#bond information has everything we need. Now we check to see the last time this bond was actually traded
#new_window = [x for x in window_after if x != window_before][0]
#driver.switch_to.window(new_window)
sleep(5)
last_trade_date = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#ms-glossary > div > table > tbody > tr:nth-child(1) > td:nth-child(1) > div")))
print(last_trade_date.text)
输出/输出:
1/15/2021
Process finished with exit code 0
我还建议不要为每个操作创建 WebDriverWait object。 相反,您可以执行以下操作:
wait = WebDriverWait(driver, 30)
现在在任何地方使用等待,如下所示:
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#ms-glossary > div > table > tbody > tr:nth-child(1) > td:nth-child(1) > div")))
通过这种方式,您可以优化您的代码。 您将拥有更少的空间复杂度。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.