I am trying to scrape data from Similarweb. The data is shown as a chart, and I want to scrape each month and the value associated with it. Here's the code:
import time
import numpy as np
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
websites = ['https://www.similarweb.com/website/zalando.de/#overview', 'https://www.similarweb.com/website/asos.com/#overview',
            'https://www.similarweb.com/website/aboutyou.de/#overview', 'https://www.similarweb.com/website/boohoo.com/#overview',
            'https://www.similarweb.com/website/deliveryhero.com/#overview', 'https://www.similarweb.com/website/justeattakeaway.com/#overview',
            'https://www.similarweb.com/website/hellofresh.com/#overview', 'https://www.similarweb.com/website/blueapron.com/#overview',
            'https://www.similarweb.com/website/shop.adidas.co.in/#overview', 'https://www.similarweb.com/website/nike.com/#overview',
            'https://www.similarweb.com/website/in.puma.com/#overview', 'https://www.similarweb.com/website/hugoboss.com/#overview']
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
browser = webdriver.Chrome(ChromeDriverManager().install(), options=options)
delays = [7, 4, 6, 2, 10, 19]
delay = np.random.choice(delays)
for crawler in websites:
    browser.get(crawler)
    time.sleep(2)
    time.sleep(delay)
    website_names = browser.find_element_by_xpath('/html/body/div[1]/main/div/div/section[1]/div[1]/div/div[1]/a').get_attribute("href")
    total_visits = browser.find_element_by_xpath('/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[2]/div/span[2]/span[1]').text
    avg_visit_duration = browser.find_element_by_xpath('/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[3]/div/span[2]/span').text
    pages_per_visit = browser.find_element_by_xpath('/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[4]/div/span[2]/span').text
    bounce_rate = browser.find_element_by_xpath('/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[5]/div/span[2]/span').text
    months = browser.find_elements(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][6]/*/*")
    for date in months:
        print(date.text)
    tooltip = browser.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][8]/*[local-name()='text']")
    ActionChains(browser).move_to_element(tooltip).perform()
    month_value = browser.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g' and @class='highcharts-tooltip']/*[local-name()='text']")
    print(month_value.text)
    # printing all scraped data
    print('Website Names:', website_names)
    print('Total visits:', total_visits)
    print('Average visit duration:', avg_visit_duration)
    print('Pages per visit:', pages_per_visit)
    print('Bounce rate:', bounce_rate)
In spite of giving what I believe are the correct XPaths, I get an error like this:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[4]/div[1]/div[2]/div[2]/div[1]/div/svg/g[8]/text/tspan[1]"}
  (Session info: chrome=90.0.4430.93)
When I open the website normally, the graph displays the months as November 2020, December 2020, January 2021, ... March 2021. But when I inspect the page, the labels appear as 7th Nov, 23rd Nov, 15th Dec, 30th Dec, ...
Is this mismatch the reason for the NoSuchElementException? Please help!
EDIT: I tried the method below and got empty lists ([]):
element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "highcharts-series")))
test = browser.find_elements_by_xpath("//*[name()='svg']//*[name()='g' and @class='highcharts-series']/*[name()='path']")
res = []
for el in test:
    hover = ActionChains(browser).move_to_element(el)
    hover.perform()
    date = browser.find_elements_by_xpath('//*[@id="highcharts-0"]/svg/g[8]/text/tspan[1]').text
    price = browser.find_elements_by_xpath('//*[@id="highcharts-0"]/svg/g[8]/text/tspan[3]').text
    print('dd',date)
    print('pr', price)
The chart elements are inside an svg tag, so you have to change your locator to match them by local name:
//*[local-name() = 'svg']/*[local-name()='g'][6]/*/*
This matches the month labels along with their values. You can store all of them in a list and print them.
Something like this:
months = driver.find_elements(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][6]/*/*")
for date in months:
    print(date.text)
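One way to see why a plain `svg/g[8]/text` path like the one in the error message matches nothing: SVG nodes live in the SVG namespace, so an unqualified tag name does not select them. The sketch below reproduces the idea offline with Python's standard-library `xml.etree.ElementTree` on a made-up SVG fragment (the markup and month labels are illustrative, not Similarweb's actual DOM). ElementTree has no `local-name()` function, so its `{*}` namespace wildcard (Python 3.8+) stands in for it here.

```python
import xml.etree.ElementTree as ET

# Made-up SVG fragment standing in for a chart's axis-label group.
SVG = """
<svg xmlns="http://www.w3.org/2000/svg">
  <g class="highcharts-axis-labels">
    <text>Nov 2020</text>
    <text>Dec 2020</text>
    <text>Jan 2021</text>
  </g>
</svg>
"""

root = ET.fromstring(SVG)

# Unqualified name: matches nothing, because every node is in the SVG namespace.
plain = root.findall(".//text")

# Namespace wildcard: matches on the local name only, similar in spirit
# to *[local-name()='text'] in XPath 1.0.
by_local_name = root.findall(".//{*}text")

labels = [node.text for node in by_local_name]
print(len(plain), labels)
```

The same reasoning applies to the browser's XPath engine, which is why the locators in this answer use `*[local-name()='...']` instead of bare `svg`, `g`, or `text` steps.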
Yes. Once you hover over a particular month, the XPath below matches the tooltip text, and the code after it prints the respective value:
//*[local-name() = 'svg']/*[local-name()='g' and @class='highcharts-tooltip']/*[local-name()='text']
month_value = driver.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g' and @class='highcharts-tooltip']/*[local-name()='text']")
print(month_value.text)
To hover over a particular month in the first place, you can use the code below:
tooltip = driver.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][8]/*[local-name()='text']")
ActionChains(driver).move_to_element(tooltip).perform()
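Putting the pieces together, a minimal sketch (not tested against the live site) could wait for the chart, hover over each month label, and read the tooltip after each hover. The `g[6]` index and the `highcharts-tooltip` class come from the XPaths above. The helper names `svg_xpath` and `read_chart_points`, the 10-second timeout, and the assumption that hovering a label is enough to show the tooltip are illustrative and may need adjusting to the real page.

```python
def svg_xpath(*steps):
    """Build an XPath whose steps match SVG nodes by local name.

    Each step is a tag name, optionally followed by a predicate,
    e.g. "g[6]" or "g[@class='highcharts-tooltip']".
    """
    parts = []
    for step in steps:
        tag, bracket, predicate = step.partition("[")
        suffix = bracket + predicate if bracket else ""
        parts.append(f"*[local-name()='{tag}']{suffix}")
    return "//" + "/".join(parts)


# Locators taken from the answer above, rebuilt with the hypothetical helper.
MONTH_LABELS = svg_xpath("svg", "g[6]") + "/*/*"
TOOLTIP_TEXT = svg_xpath("svg", "g[@class='highcharts-tooltip']", "text")


def read_chart_points(driver, timeout=10):
    """Hover over each month label and collect (label text, tooltip text) pairs."""
    # Imported lazily so svg_xpath() stays usable without Selenium installed.
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Wait for the labels to exist instead of sleeping for a fixed time.
    labels = wait.until(
        EC.presence_of_all_elements_located((By.XPATH, MONTH_LABELS))
    )

    points = []
    for label in labels:
        ActionChains(driver).move_to_element(label).perform()
        # Assumption: hovering a label makes the tooltip visible.
        tooltip = wait.until(
            EC.visibility_of_element_located((By.XPATH, TOOLTIP_TEXT))
        )
        points.append((label.text, tooltip.text))
    return points
```

It would be called as `read_chart_points(driver)` right after `driver.get(url)`. If the tooltip only appears when hovering over the plotted points and not the labels, the hover target would have to be swapped for the series elements.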