
How to extract the text 121.6 from the text node within the span tag using Selenium and Python

Given the following elements on a web page:

<span title="点赞数14332" class="like"><!----><!----><!----><!----><!----><i class="van-icon-videodetails_like" style="color:;"></i>1.4万
    </span>
<span title="number" class="like">
   <!---->
   <!---->
   <!---->
   <!---->
   <!---->
   <i class="van-icon-videodetails_like" style="color:;"></i>
"121.6"
</span>

I want to get the number "121.6". I tried:

likes = driver.find_elements_by_xpath('//span[@class="like"]')[0].text

It returns "--" and nothing else.

I also tried copying the XPath from the browser's element inspector:

likes = driver.find_elements_by_xpath('//*[@id="arc_toolbar_report"]/div[1]/span[1]/text()')

But Selenium raises:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: The result of the xpath expression "//*[@id="arc_toolbar_report"]/div[1]/span[1]/text()" is: [object Text]. It should be an element.
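The exception is expected: find_element(s) can only return element nodes, while an XPath ending in text() selects a text node. A minimal offline sketch using Python's standard html.parser (no browser involved; the class name ChildNodeLister is hypothetical) lists the direct child nodes of the first span from the question. It shows, roughly as the browser's DOM would, that the number is a bare text node after the `<i>` icon rather than an element of its own:

```python
from html.parser import HTMLParser

SNIPPET = ('<span title="点赞数14332" class="like"><!----><!----><!----><!----><!---->'
           '<i class="van-icon-videodetails_like" style="color:;"></i>1.4万\n    </span>')


class ChildNodeLister(HTMLParser):
    """Hypothetical helper: records the direct child nodes of the outermost tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # equals 1 while we are directly inside the outer span
        self.children = []  # (node_type, detail) tuples in document order

    def handle_starttag(self, tag, attrs):
        if self.depth == 1:
            self.children.append(("element", tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_comment(self, data):
        if self.depth == 1:
            self.children.append(("comment", data))

    def handle_data(self, data):
        # Skip whitespace-only text so the listing stays readable
        if self.depth == 1 and data.strip():
            self.children.append(("text", data.strip()))


lister = ChildNodeLister()
lister.feed(SNIPPET)
lister.close()
for node in lister.children:
    print(node)
# Expected: five ('comment', '') entries, then ('element', 'i'), then ('text', '1.4万')
```

The only element child is the `<i>` icon, so no locator passed to find_element can point at the "1.4万" node itself; it has to be read from its parent span.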

What should I do to get the number "121.6"?

To extract the number of likes on the video, i.e. the current text 1.5万 (about 15,000), keep in mind that the text is within a text node, not an element of its own. You need to induce WebDriverWait for visibility_of_element_located(), then read the text node with the execute_script() method, using either of the following locator strategies:

  • Using XPATH:

     print(driver.execute_script('return arguments[0].lastChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='like' and starts-with(@title, '点赞数')][not(contains(.,'--'))]")))).strip())
  • Using CSS_SELECTOR:

     print(driver.execute_script('return arguments[0].lastChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.like[title^='点赞数']")))).strip())
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
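What the execute_script() call reads can be illustrated without a browser: arguments[0] is the matched span, and its lastChild is the text node that follows the `<i>` icon. A minimal sketch, assuming the markup parses as XML with the standard xml.etree.ElementTree (real pages often do not, so this is an illustration only); last_child_text is a hypothetical helper. ElementTree stores text that follows a child element as that element's tail, and drops comments by default:

```python
import xml.etree.ElementTree as ET


def last_child_text(elem):
    """Hypothetical helper approximating DOM `elem.lastChild.textContent.strip()`."""
    children = list(elem)  # element children only; comments were dropped by the parser
    if not children:
        # No child elements: the only remaining child node is elem.text (if any)
        return (elem.text or "").strip()
    last = children[-1]
    if last.tail is not None:
        # Text after the last element is the last child node, as in the DOM
        return last.tail.strip()
    # Otherwise the last child node is the element itself: use all of its text
    return "".join(last.itertext()).strip()


SPAN_1 = ('<span title="点赞数14332" class="like"><!----><!----><!----><!----><!---->'
          '<i class="van-icon-videodetails_like" style="color:;"></i>1.4万\n    </span>')
SPAN_2 = """<span title="number" class="like">
   <!---->
   <!---->
   <!---->
   <!---->
   <!---->
   <i class="van-icon-videodetails_like" style="color:;"></i>
121.6
</span>"""

for markup in (SPAN_1, SPAN_2):
    print(last_child_text(ET.fromstring(markup)))  # 1.4万, then 121.6
```

The quotes shown around "121.6" in the question are left out of SPAN_2, on the assumption that they are only how the browser's inspector renders a text node.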


I've tested your XPath against your bilibili.com link and was able to get the number of likes for that video. As DebanjanB's answer already mentions, you have to use WebDriverWait.

Your code returned '--' because the number of likes was still being loaded dynamically in the background. Below is a code snippet that waits for the '--' to be replaced by an actual number:

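likes_xpath = "/html/body/div[3]/div/div[1]/div[3]/div[1]/span[1]"

# Wait up to 120 s until the span's text no longer contains the "--" placeholder
WebDriverWait(driver, 120).until(
    EC.visibility_of_element_located(
        (By.XPATH, likes_xpath + '[not(contains(text(),"--"))]')
    )
)

# find_element(By.XPATH, ...) replaces find_element_by_xpath, removed in Selenium 4.3
likes = driver.find_element(By.XPATH, likes_xpath).text
print(likes)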

which prints 1.4万.

The predicate [not(contains(text(),"--"))] tells the driver to wait until the selected node's text no longer contains the '--' string.
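Waiting for the '--' to disappear is plain polling: re-check a condition at a fixed interval until it returns a truthy value or a timeout expires, which is roughly what WebDriverWait.until() does (its default poll frequency is 0.5 seconds). A minimal stdlib sketch of that pattern, not Selenium's implementation; wait_until, fake_like_counter and the simulated readings are hypothetical, and the sketch raises the built-in TimeoutError where Selenium raises its own TimeoutException:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


def fake_like_counter(readings):
    """Simulate the span's text: each call returns the next reading, then repeats the last."""
    calls = 0

    def read():
        nonlocal calls
        value = readings[min(calls, len(readings) - 1)]
        calls += 1
        return value

    return read


read_likes = fake_like_counter(["--", "--", "1.4万"])


def loaded_likes():
    # Same test as the XPath predicate [not(contains(text(),"--"))]
    text = read_likes()
    return text if "--" not in text else None


likes = wait_until(loaded_likes, timeout=5, poll=0.01)
print(likes)  # 1.4万
```

The first two readings still contain the placeholder, so the condition returns None and the loop sleeps; the third reading satisfies it and is returned.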

EDIT:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

These are the imports you will need.
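Both answers return the count exactly as the page displays it, such as 1.4万, where 万 is a magnitude suffix. If a number is needed afterwards, a small post-processing step can expand the suffix. A minimal sketch; parse_count is a hypothetical helper, and only the 万 (10,000) and 亿 (100,000,000) suffixes are assumed to occur:

```python
from decimal import Decimal

# Chinese magnitude suffixes assumed to appear in the displayed counts
SUFFIXES = {"万": 10_000, "亿": 100_000_000}


def parse_count(text):
    """Hypothetical helper: turn a displayed count like '1.4万' or '121.6' into a number."""
    text = text.strip()
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    # Decimal keeps the parsed digits exact instead of going through binary floating point
    value = Decimal(text) * multiplier
    return int(value) if value == value.to_integral_value() else float(value)


print(parse_count("1.4万"))   # 14000
print(parse_count("121.6"))   # 121.6
# The span's title attribute ("点赞数14332" in the question) carries the exact count
print(parse_count("点赞数14332".removeprefix("点赞数")))  # 14332
```

The displayed value is rounded (1.4万 for 14332 likes), so when the exact figure matters, the title attribute of the same span is the better source.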
