Python Selenium web scraping returns no data using XPATH

I am trying to scrape data from a webpage. After logging in to the site, I can search for the XPath in the developer tools and find matches. But the Python code does not return any data.

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
browser.get(loginURL)

nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))

The output is:

d:\My Personal\2022\Suresh\Learning\Python\zerodha.py:27: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")

DevTools listening on ws://127.0.0.1:57153/devtools/browser/74a90941-a12f-4be4-b12a-01b256292a5f
[15120:6400:0309/123030.129:ERROR:device_event_log_impl.cc(214)] [12:30:30.129] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[15120:6400:0309/123030.137:ERROR:device_event_log_impl.cc(214)] [12:30:30.136] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Len of nifty_bank_values_xpath: 0

Similarly, when I tried find_element:

nifty_bank_values_xpath = browser.find_element(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")

I get the following error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//span[contains(@class, 'pane-legend-item-value__main')]"}
  (Session info: chrome=99.0.4844.51)
Stacktrace:
Backtrace:

I can find the data in Dev Tools -> Elements, where the same XPath returns 6 matches.

[Image: developer tools showing the matched rows]

HTML captured from the dev console:

<body class="app-wrapper">
   <noscript><strong>We're sorry but kite doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript>
   <div id="app" class="app mobile page-tvchart">
      <div class="header">
         <div class="wrapper">
            <!----> 
            <div class="header-right">
               <!----> <!----> 
               <div class="app-nav mobile"><a href="/marketwatch" class=""><span class="icon icon-bookmark"></span></a> <a href="/dashboard" class=""><span class="icon icon-compass"></span></a> <a href="/orders" class=""><span class="icon icon-book"></span></a> <a href="/holdings" class=""><span class="icon icon-briefcase"></span></a> <a href="/positions" class=""><span class="icon icon-file-text"></span></a> <a href="/funds" class="margins"><span class="icon icon-credit-card"></span></a></div>
               <div class="right-nav">
                  <div class="user-nav perspective">
                     <a href="" class="dropdown-label">
                        <div id="avatar-43">
                           <div class="avatar" style="width: 25px; height: 25px; border-radius: 50%; text-align: center; vertical-align: middle; background-color: rgba(156, 39, 176, 0.1); font-size: 9px; font-weight: 300; color: rgb(156, 39, 176); line-height: 26px;"><span>SS</span></div>
                           <!---->
                        </div>
                        <span class="user-id">ZX8487</span>
                     </a>
                     <!---->
                  </div>
               </div>
            </div>
         </div>
      </div>
      <div class="container wrapper">
         <!----> 
         <div class="container-right">
            <!----> <!----> 
            <div class="page-content tvchart">
               <!----> 
               <div>
                  <div class="chart-frame">
                     <div id="tv_chart_container" class="tv-chart-container" style="height: 547px;"><iframe id="tradingview_e1a6c" name="tradingview_e1a6c" src="/static/tv-chart/static/en-tv-chart.aaac22e21df68f2f7bad.html#symbol=NIFTY%20BANK%3AINDICES%3A260105&amp;interval=1D&amp;widgetbar=%7B%22details%22%3Afalse%2C%22watchlist%22%3Afalse%2C%22watchlist_settings%22%3A%7B%22default_symbols%22%3A%5B%5D%7D%7D&amp;timeFrames=%5B%7B%22text%22%3A%225y%22%2C%22resolution%22%3A%22W%22%7D%2C%7B%22text%22%3A%221y%22%2C%22resolution%22%3A%22W%22%7D%2C%7B%22text%22%3A%226m%22%2C%22resolution%22%3A%22120%22%7D%2C%7B%22text%22%3A%223m%22%2C%22resolution%22%3A%2260%22%7D%2C%7B%22text%22%3A%221m%22%2C%22resolution%22%3A%2230%22%7D%2C%7B%22text%22%3A%225d%22%2C%22resolution%22%3A%225%22%7D%2C%7B%22text%22%3A%221d%22%2C%22resolution%22%3A%221%22%7D%5D&amp;locale=en&amp;uid=tradingview_e1a6c&amp;clientId=tradingview.com&amp;userId=ZX8487&amp;chartsStorageUrl=%2Fapi%2Fchart%2Fpreferences&amp;chartsStorageVer=1.1&amp;customCSS=%2Fstatic%2Ftv-chart%2Fstatic%2Fcustom_style.css&amp;debug=false&amp;timezone=Asia%2FKolkata&amp;theme=Light" frameborder="0" allowtransparency="true" scrolling="no" allowfullscreen="" style="display: block; width: 100%; height: 100%;"></iframe></div>
                  </div>
                  <div class="instrument-market-data">
                     <div class="row">
                        <div class="three columns">
                           <div class="label">Open</div>
                           <div class="value">33278.9</div>
                        </div>
                        <div class="three columns">
                           <div class="label">High</div>
                           <div class="value">33890.9</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Low</div>
                           <div class="value">32948.9</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Close</div>
                           <div class="value">33158.1</div>
                        </div>
                     </div>
                     <div class="row">
                        <div class="three columns">
                           <div class="label">Volume</div>
                           <div class="value">—</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Avg. trade price</div>
                           <div class="value">—</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Total buy quantity</div>
                           <div class="value">—</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Total sell quantity</div>
                           <div class="value">—</div>
                        </div>
                     </div>
                  </div>
                  <!---->
               </div>
            </div>
         </div>
      </div>
      <!----> <!----> 
      <div class="baskets">
         <!----> <!----> <!----> <!----> <!----> <!---->
      </div>
      <!----> 
      <div>
         <!----> <!---->
      </div>
      <!----> <!----> <!----> <!----> 
      <div class="orders-basket">
         <!---->
      </div>
      <!----> <!---->
   </div>
   <script async="">try {
      var theme = JSON.parse(localStorage.__storejs_kite_theme);
      if (theme) {
        document.documentElement.setAttribute("data-theme", theme);
      }
      } catch (_) {
      }
   </script><script type="module" src="/static/js/chunk-vendors.ea6114a1.js"></script><script type="module" src="/static/js/app.ae4bb317.js"></script><script>!function(){var e=document,t=e.createElement("script");if(!("noModule"in t)&&"onbeforeload"in t){var n=!1;e.addEventListener("beforeload",function(e){if(e.target===t)n=!0;else if(!e.target.hasAttribute("nomodule")||!n)return;e.preventDefault()},!0),t.type="module",t.src=".",e.head.appendChild(t),t.remove()}}();</script><script src="/static/js/chunk-vendors-legacy.ea6114a1.js" nomodule=""></script><script src="/static/js/app-legacy.cffeb71c.js" nomodule=""></script>
   <div class="su-toast-groups">
      <div class="su-toast-group su-toast-top-left">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-top-center">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-top-right">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-bottom-left">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-bottom-center">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-bottom-right">
         <div></div>
      </div>
   </div>
   <!---->
</body>

You are missing a wait here.
You should wait for the elements to be completely loaded before accessing them with find_elements.
The best approach here is to use an explicit wait with Expected Conditions, as follows:

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
wait = WebDriverWait(browser, 20)
browser.get(loginURL)
wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")))
time.sleep(0.3) #short pause added to make sure that all the relevant elements are loaded, not only the first one
nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))
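If the fixed 0.3 second pause turns out to be too short or too long, one alternative is to poll until the expected number of elements is present. The helper below is only a sketch: the name wait_for_min_elements and its parameters are my own and are not part of Selenium. It relies on find_elements accepting the plain string "xpath", which is the value of By.XPATH.

```python
import time

# XPath taken from the question
XPATH = "//span[contains(@class, 'pane-legend-item-value__main')]"


def wait_for_min_elements(driver, xpath, minimum, timeout=20.0, poll=0.5):
    """Poll driver.find_elements until at least `minimum` elements match.

    Hypothetical helper, not a Selenium API. Returns the matched elements,
    or raises TimeoutError if the count is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "xpath" is the string value of selenium's By.XPATH
        elements = driver.find_elements("xpath", xpath)
        if len(elements) >= minimum:
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(elements)} of {minimum} elements found for {xpath!r}"
            )
        time.sleep(poll)
```

With the browser object from the code above it could be called as values = wait_for_min_elements(browser, XPATH, 6), where 6 is the match count that Dev Tools reported in the question.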

UPD
Since that element is inside an iframe, you have to switch to that iframe before accessing elements inside it, as follows:

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
wait = WebDriverWait(browser, 20)
browser.get(loginURL)
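# The captured HTML shows the iframe id as "tradingview_e1a6c", so match on the "tradingview" prefix
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'tradingview')]")))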

wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")))
time.sleep(0.3) #short pause added to make sure that all the relevant elements are loaded, not only the first one
nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))

When you have finished working with the elements inside the iframe, switch back to the default content with:

browser.switch_to.default_content()
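To avoid forgetting the switch back, the two calls can be paired in a context manager. This is a minimal sketch and the name inside_frame is hypothetical; it only uses switch_to.frame and switch_to.default_content, and it always returns to the top-level document, so it does not handle nested iframes.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always return to the top-level document.

    Hypothetical helper, not a Selenium API. `frame` is passed straight to
    driver.switch_to.frame, so it can be a WebElement, a name/id, or an index.
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups start from the main page
        driver.switch_to.default_content()
```

Inside a with inside_frame(browser, iframe_element): block the find_elements calls see the iframe's document, and after the block they see the main page again.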

Thank you @Prophet for guiding me to look into the iframe.

browser.get(niftybankchartURL)
time.sleep(10)

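# jump into the iframe (find_element_by_tag_name was removed in Selenium 4.3;
# this form needs `from selenium.webdriver.common.by import By`)
browser.switch_to.frame(browser.find_element(By.TAG_NAME, "iframe"))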

Once I switched into the frame, the XPath worked fine.
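For anyone hitting the same symptom (Dev Tools finds matches, Selenium finds none), a quick check is to count the matches in the top-level document and in each iframe. The function below is a rough diagnostic sketch; the name count_matches_per_frame is made up for illustration, and it only inspects iframes directly in the top-level document, not nested ones. The strings "xpath" and "tag name" are the values of By.XPATH and By.TAG_NAME.

```python
def count_matches_per_frame(driver, xpath):
    """Return {location: match count} for the main page and each top-level iframe.

    Hypothetical diagnostic helper, not a Selenium API.
    """
    counts = {}
    driver.switch_to.default_content()
    counts["top-level document"] = len(driver.find_elements("xpath", xpath))

    frames = driver.find_elements("tag name", "iframe")
    for index, frame in enumerate(frames):
        # Read the id while still in the top-level document, where the frame element lives
        label = frame.get_attribute("id") or f"iframe[{index}]"
        driver.switch_to.frame(frame)
        try:
            counts[label] = len(driver.find_elements("xpath", xpath))
        finally:
            driver.switch_to.default_content()
    return counts
```

A result where the top-level count is 0 and one iframe has a non-zero count points to the situation described in this thread.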
