
How to extract the texts from the span tag as per the html using selenium and Python

I'm looking to pull the following information between the span/div tags from these three tags.

<span class="engagementInfo-valueNumber js-countValue">496.26K</span>

<div class="websiteRanks-valueContainer js-websiteRanksValue">
        <span class="websiteRanks-valueChange websiteRanks-valueChange--isSingleMode websiteRanks-valueChange--up"></span>
    180
</div>

<span class="websitePage-relativeChangeNumber">16.35%</span>

When I copy the xpath it turns out like:

/html/body/div[1]/main/div/div/div[2]/div[2]/div[1]/div[3]/div/div/div/div[2]/div/span[2]/span[2]/span

and copying the selector yields:

body > div.wrapper-body.wrapperBody--websiteAnalysis.js-wrapperBody > main > div > div > div.analysisPage-section.analysisPage-section--withFeedback.websitePage-overview.js-section.js-showInCompare.is-active.js-triggered > div.analysisPage-sectionContent.analysisPage-sectionVisits.js-sectionContent.js-print-pageFooter.is-triggered > div.u-clearfix.analysisPage-sectionOverview > div.websitePage-mobileFramed.websitePage-mobileFramed--overview > div > div > div > div:nth-child(2) > div > span.engagementInfo-value.engagementInfo-value--large.u-text-ellipsis > span.engagementInfo-valueRelative.websitePage-relativeChange.websitePage-relativeChange--delay.websitePage-relativeChange--up.js-showOnCount.is-shown > span

Ultimately I would like to end up with three values, 496.26K, 180 and 16.35%, either as separate variables or in a list.

I've tried the following without success, though it's worked for me on other websites in the past:

url = 'https://www.similarweb.com/website/' + domain
driver.get(url) #get response
driver.implicitly_wait(2) #wait to load content
total_vists = driver.find_element_by_xpath(xpath='/html/body/div[1]/main/div/div/section[2]/div/ul/li[1]/div[2]').text

You can try a CSS selector for each element.

For extracting 496.26K:

first_span = driver.find_element_by_css_selector("span.engagementInfo-valueNumber.js-countValue").text

print(first_span)

For extracting 180, target the parent div, because the inner span is empty and the number is the div's own text:

second_div = driver.find_element_by_css_selector("div.websiteRanks-valueContainer.js-websiteRanksValue")

print(second_div.text.strip())

For extracting 16.35%:

third_span = driver.find_element_by_css_selector("span.websitePage-relativeChangeNumber")

print(third_span.text)
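The 180 value is the awkward one: in the HTML from the question it is a text node that follows the empty span, so it belongs to the div and not to the span. This can be checked offline on the snippet itself. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for the browser DOM (an assumption made for illustration, it is not how Selenium reads the page), with `SNIPPET` copied from the question.

```python
import xml.etree.ElementTree as ET

# The div from the question, copied verbatim
SNIPPET = """
<div class="websiteRanks-valueContainer js-websiteRanksValue">
        <span class="websiteRanks-valueChange websiteRanks-valueChange--isSingleMode websiteRanks-valueChange--up"></span>
    180
</div>
""".strip()

div = ET.fromstring(SNIPPET)
span = div.find("span")

# Text inside the span itself: there is none
span_text = (span.text or "").strip()

# All text under the div, including the text node after the span
rank_text = "".join(div.itertext()).strip()

print(repr(span_text), repr(rank_text))
```

The span yields an empty string while the div yields the rank, which is why a selector aimed at the span returns nothing useful.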

As per the HTML you have shared, the elements are rendered by JavaScript, so you need to induce WebDriverWait for the elements to become visible. You can use the following solutions:

  • 496.26K :

     engagementInfo = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='engagementInfo-valueNumber js-countValue']"))).get_attribute("innerHTML")
  • 180 :

     websiteRanks = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='websiteRanks-valueContainer js-websiteRanksValue']")))
     websiteRanksText = driver.execute_script('return arguments[0].lastChild.textContent;', websiteRanks).strip()
  • 16.35% :

     websitePageChangeNumber = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='websitePage-relativeChangeNumber']"))).get_attribute("innerHTML")

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
