
When I try to scrape web table text values using Selenium XPath, it gives empty text values

I'm trying to get text values from the Annual and Quarterly tables at the URL below, but I only get an empty value. What might be the problem here? Can anyone give me some help? These are the values I'm trying to extract:

[Screenshot of the values to extract]

This is the code. In it, I'm trying to get the 2018 text, but I need all of the text inside the box.

import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait  # for explicit waits

url = 'https://www.morningstar.ca/ca/report/stocks/financials.aspx?t=0P000000GY&lang=en-CA'

# chromedriver binary in .\chromedriver_win32 under the working directory
PATH = os.path.join(os.getcwd(), "chromedriver_win32", "chromedriver")
service = Service(PATH)

option = webdriver.ChromeOptions()
# option.add_argument('headless')

# Selenium 4: hand the Service to webdriver.Chrome instead of starting it manually
driver = webdriver.Chrome(service=service, options=option)

driver.get(url)
time.sleep(2)  # fixed pause while the page's scripts build the tables

# find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
Y1 = driver.find_elements(By.XPATH, '/html/body/div[3]/div[1]/div/div[1]/div[1]/div[4]/sal-components/section/div/div/div[2]/div/div[2]/div/div/div[2]/div[2]/div/div[2]/div/div[2]/div[2]/div/div[2]/div[2]/div/div/table[2]/tbody/tr[1]/th[2]/span')[0].text
print(Y1)

driver.quit()  # quit() ends the session and stops chromedriver; close() only closes the window
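Since the question reports an empty value rather than an IndexError, the XPath presumably did match an element; what came back empty is `.text`. Selenium's `.text` returns only the element's rendered, visible text, so an element that is hidden or not yet rendered yields an empty string even though it is in the DOM. A minimal sketch of a fallback, assuming a hypothetical helper named `element_text` (not a Selenium function): try `.text` first, then read the DOM `textContent` through `get_attribute`. It only touches the `.text` property and `get_attribute()` method that a Selenium `WebElement` has, so it can be exercised with any object exposing those two members.

```python
def element_text(element) -> str:
    """Return an element's visible text, falling back to its DOM textContent.

    `element` is duck-typed: anything with a `.text` string and a
    `get_attribute(name)` method, such as a Selenium WebElement.
    """
    visible = element.text.strip()
    if visible:
        return visible
    # .text was empty (e.g. hidden or not yet rendered): read textContent,
    # which ignores visibility; get_attribute may return None
    raw = element.get_attribute("textContent") or ""
    # collapse the whitespace runs left over from the HTML source
    return " ".join(raw.split())
```

In the question's code this would be called as `element_text(driver.find_elements(By.XPATH, ...)[0])` with the original XPath in place of `...`. If the fallback returns text while `.text` does not, the element exists but is not displayed, which points at the page state rather than the locator.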

I am not quite sure what you're trying to do, but getting the table's text only takes the following.

Assuming that the rest of your code works and the problem is only with Y1, you can do this:

import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait  # for explicit waits

url = 'https://www.morningstar.ca/ca/report/stocks/financials.aspx?t=0P000000GY&lang=en-CA'

PATH = os.path.join(os.getcwd(), "chromedriver_win32", "chromedriver")
service = Service(PATH)

option = webdriver.ChromeOptions()
# option.add_argument('headless')

driver = webdriver.Chrome(service=service, options=option)

driver.get(url)
time.sleep(2)

# relative XPath keyed on the table's class attribute instead of its position in the page;
# wait up to 30 s for it to become visible
Y1 = WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='report-table ng-isolate-scope']")))

print(Y1.text)  # all visible text of the table, row by row

driver.quit()

Is it necessary to use WebDriverWait? No. I used it only because you had imported it and it was not being used.

The above code prints the table's text:

Fiscal 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 TTM 5-Yr Index
103.40
Return on Assets % 27.07 28.54 19.34 18.01 20.45 14.93 13.87 16.07 15.69 17.33 23.21 16.19 7.11
Return on Equity % 41.67 42.84 30.64 33.61 46.25 36.90 36.87 49.36 55.92 73.69 103.40 51.97 22.62
Return on Invested Capital % 41.04 42.01 26.08 26.20 31.32 21.95 19.86 24.41 25.75 30.11 40.07 24.98 12.64
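If the goal is every value in the box rather than a single cell, the `Y1.text` string printed above can be turned into a lookup table. A rough sketch, assuming the layout shown in that output (a header line starting with `Fiscal`, then rows made of a label followed by one number per column); `parse_report_text` is a hypothetical helper, and a table with a different layout or non-numeric cells would need different parsing.

```python
def _is_number(token: str) -> bool:
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_report_text(text: str) -> dict[str, dict[str, float]]:
    """Turn the report table's plain text into {row label: {column: value}}.

    Assumes the first non-empty line is the header ("Fiscal" followed by the
    column names) and each data row is a label followed by one number per
    column. Lines that do not fit that shape are skipped.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    columns = header[1:]  # drop the "Fiscal" caption
    table: dict[str, dict[str, float]] = {}
    for tokens in rows:
        # count the numeric tokens at the end of the line
        n = 0
        while n < len(tokens) and _is_number(tokens[len(tokens) - 1 - n]):
            n += 1
        label = " ".join(tokens[: len(tokens) - n])
        values = tokens[len(tokens) - n:]
        if not label or len(values) != len(columns):
            continue  # e.g. the lone "103.40" line in the output above
        table[label] = dict(zip(columns, map(float, values)))
    return table
```

With the answer's code, `parse_report_text(Y1.text)["Return on Assets %"]["2018"]` would then pick out a single cell, and iterating over the result gives every row.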

You are using the wrong locator.
You also have to use expected conditions to wait until the element is loaded; on this page, loading sometimes takes a long time.

import os

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait  # for explicit waits

url = 'https://www.morningstar.ca/ca/report/stocks/financials.aspx?t=0P000000GY&lang=en-CA'

PATH = os.path.join(os.getcwd(), "chromedriver_win32", "chromedriver")
service = Service(PATH)

option = webdriver.ChromeOptions()
# option.add_argument('headless')

driver = webdriver.Chrome(service=service, options=option)

driver.get(url)

# first div whose class attribute is exactly 'sal-components-scrollable ng-isolate-scope';
# wait up to 20 s for it to become visible instead of sleeping a fixed time
Y1 = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "(//div[@class='sal-components-scrollable ng-isolate-scope'])[1]")))

print(Y1.text)

driver.quit()
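Both answers rely on an explicit wait rather than a fixed `time.sleep(2)`. `WebDriverWait(driver, timeout).until(condition)` calls the condition repeatedly (every 0.5 s by default), returns its first truthy result, and raises `TimeoutException` once the timeout runs out; by default it also ignores `NoSuchElementException` while polling. The stdlib-only sketch below imitates that idea so it can run without a browser; `wait_until` is a hypothetical name, and this is not Selenium's source.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0, poll: float = 0.5) -> T:
    """Call `condition` every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Unlike a fixed sleep, this returns as soon as the condition holds and
    keeps waiting longer on a slow page.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In the answers, `EC.visibility_of_element_located(...)` plays the role of `condition`: it returns the element once it is displayed and `False` before that, so `until()` hands back the element itself, which is why `Y1.text` works directly.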
