
How to scrape data from Highcharts charts using Selenium and Python?

I am trying to scrape data from Similarweb. The data is shown as a chart, and I want to scrape each month together with the value associated with it. Here's the code:

import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

websites = ['https://www.similarweb.com/website/zalando.de/#overview', 'https://www.similarweb.com/website/asos.com/#overview',
            'https://www.similarweb.com/website/aboutyou.de/#overview', 'https://www.similarweb.com/website/boohoo.com/#overview',
            'https://www.similarweb.com/website/deliveryhero.com/#overview', 'https://www.similarweb.com/website/justeattakeaway.com/#overview',
            'https://www.similarweb.com/website/hellofresh.com/#overview', 'https://www.similarweb.com/website/blueapron.com/#overview',
            'https://www.similarweb.com/website/shop.adidas.co.in/#overview', 'https://www.similarweb.com/website/nike.com/#overview',
            'https://www.similarweb.com/website/in.puma.com/#overview', 'https://www.similarweb.com/website/hugoboss.com/#overview']

options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
# Hide the "controlled by automated test software" infobar and the automation extension
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

# Selenium 4 style: the driver path goes through a Service object
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
delays = [7, 4, 6, 2, 10, 19]
delay = random.choice(delays)  # chosen once, so every page waits the same extra time
for crawler in websites:
    browser.get(crawler)
    time.sleep(2 + delay)  # crude fixed wait for the page and chart to render

    # Summary metrics, located by absolute XPaths copied from DevTools
    website_names = browser.find_element(By.XPATH, '/html/body/div[1]/main/div/div/section[1]/div[1]/div/div[1]/a').get_attribute("href")
    total_visits = browser.find_element(By.XPATH, '/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[2]/div/span[2]/span[1]').text
    avg_visit_duration = browser.find_element(By.XPATH, '/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[3]/div/span[2]/span').text
    pages_per_visit = browser.find_element(By.XPATH, '/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[4]/div/span[2]/span').text
    bounce_rate = browser.find_element(By.XPATH, '/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[3]/div/div/div/div[5]/div/span[2]/span').text

    # Month labels drawn inside the chart's SVG
    months = browser.find_elements(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][6]/*/*")
    for date in months:
        print(date.text)

    # Hover over a chart label so Highcharts draws its tooltip, then read the tooltip text
    tooltip = browser.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][8]/*[local-name()='text']")
    ActionChains(browser).move_to_element(tooltip).perform()
    month_value = browser.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g' and @class='highcharts-tooltip']/*[local-name()='text']")
    print(month_value.text)

    # printing all scraped data
    print('Website Names:', website_names)
    print('Total visits:', total_visits)
    print('Average visit duration:', avg_visit_duration)
    print('Pages per visit:', pages_per_visit)
    print('Bounce rate:', bounce_rate)

In spite of using what I believe are the correct XPaths, I get this error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/main/div/div/div[2]/div[2]/div/div[4]/div[1]/div[2]/div[2]/div[1]/div/svg/g[8]/text/tspan[1]"} (Session info: chrome=90.0.4430.93)

When I open the website normally, the graph shows the months as November 2020, December 2020, January 2021, ... March 2021. But when I inspect the chart, the labels appear as 7th Nov, 23rd Nov, 15th Dec, 30th Dec, ...

Is this mismatch the reason I am getting the NoSuchElementException? Please help!

EDIT: I tried the following approach, but got empty lists ([]):

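from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Runs inside the for crawler in websites: loop, after browser.get(crawler)
element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "highcharts-series")))
test = browser.find_elements(By.XPATH, "//*[name()='svg']//*[name()='g' and @class='highcharts-series']/*[name()='path']")

for el in test:
    # Hover over each series path so Highcharts draws the tooltip
    ActionChains(browser).move_to_element(el).perform()
    # find_elements returns a list, so collect the text of every match
    date = [e.text for e in browser.find_elements(By.XPATH, '//*[@id="highcharts-0"]/svg/g[8]/text/tspan[1]')]
    price = [e.text for e in browser.find_elements(By.XPATH, '//*[@id="highcharts-0"]/svg/g[8]/text/tspan[3]')]
    print('dd', date)
    print('pr', price)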

The elements are inside an svg tag, so you have to change your locator to:

//*[local-name() = 'svg']/*[local-name()='g'][6]/*/*

This matches the months together with their values. You can store all the matches in a list and print them.

Something like this:

from selenium.webdriver.common.by import By

months = driver.find_elements(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][6]/*/*")
for date in months:
    print(date.text)
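Why the local-name() form is needed: the chart's svg, g, text and tspan elements live in the SVG namespace (http://www.w3.org/2000/svg), and an XPath step written as a bare name such as svg or g does not match namespaced elements. That is likely also why the //*[@id="highcharts-0"]/svg/g[8]/... locators in the EDIT find nothing. The sketch below shows the same effect offline with Python's standard xml.etree.ElementTree on a small, made-up SVG fragment. It is only an analogy: Chrome evaluates Selenium's XPath with its own engine, and ElementTree's {*} wildcard stands in for local-name() here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A tiny, made-up SVG fragment loosely shaped like a Highcharts tooltip group.
SNIPPET = (
    f'<svg xmlns="{SVG_NS}">'
    '<g class="highcharts-tooltip"><text><tspan>Month</tspan><tspan>Value</tspan></text></g>'
    "</svg>"
)

root = ET.fromstring(SNIPPET)

# A bare tag name has no namespace, so it does not match the namespaced <g> element.
plain_matches = root.findall("g")

# Qualifying the tag with the SVG namespace ({uri}tag notation) does match.
ns_matches = root.findall(f"{{{SVG_NS}}}g")

# {*}tag (Python 3.8+) ignores the namespace, much like local-name() in the browser.
wildcard_matches = root.findall("{*}g/{*}text/{*}tspan")

print(len(plain_matches), len(ns_matches), [t.text for t in wildcard_matches])
# prints: 0 1 ['Month', 'Value']
```

In the browser the equivalent of the wildcard step is *[local-name()='g'], which is exactly what the locators in this answer use.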

Yes. When you hover over a particular month, you can locate the tooltip with this XPath:

//*[local-name() = 'svg']/*[local-name()='g' and @class='highcharts-tooltip']/*[local-name()='text']

month_value = driver.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g' and @class='highcharts-tooltip']/*[local-name()='text']")
print(month_value.text)

This prints the value for the hovered month.
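The tooltip text still has to be split into a month and a number. The helper parse_tooltip is hypothetical: it assumes the tooltip text puts the month label on its first line and ends with a number that may carry a K, M or B suffix (for example "20.5M"). That format is an assumption, not something confirmed for Similarweb, so print a few real tooltips first and adjust the regular expression if they look different.

```python
import re

# Multipliers for the abbreviated suffixes assumed to appear in the tooltip.
SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# A number such as "20.5M" or "1,234" at the very end of a line (assumed format).
VALUE_RE = re.compile(r"([\d.,]+)\s*([KMB]?)\s*$", re.IGNORECASE)


def parse_tooltip(text):
    """Hypothetical helper: split tooltip text into (month label, numeric value)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty tooltip text")
    match = VALUE_RE.search(lines[-1])
    if match is None:
        raise ValueError(f"no trailing number in {lines[-1]!r}")
    number = float(match.group(1).replace(",", ""))
    return lines[0], number * SUFFIXES[match.group(2).upper()]
```

For example, month, value = parse_tooltip(month_value.text) could be called right after the tooltip text is read.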

To hover over a particular month, you can use the following code:

from selenium.webdriver.common.action_chains import ActionChains

tooltip = driver.find_element(By.XPATH, "//*[local-name() = 'svg']/*[local-name()='g'][8]/*[local-name()='text']")
ActionChains(driver).move_to_element(tooltip).perform()
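A different technique, which avoids hovering altogether, is to ask Highcharts for the data it was given through execute_script. This is a sketch, not part of the answer above, and it assumes two things that are not confirmed for Similarweb: that the page exposes the global Highcharts object (so Highcharts.charts lists the chart instances), and that the x-axis holds JavaScript timestamps in milliseconds. The names FETCH_JS, fetch_series and series_to_rows are hypothetical. For large or grouped series, Highcharts may not keep every point in series.data.

```python
from datetime import datetime, timezone

# JavaScript run in the page: for every live chart, return each series' name
# and its [x, y] pairs. Assumes the page exposes the global Highcharts object.
FETCH_JS = """
return Highcharts.charts
  .filter(function (chart) { return chart; })  // destroyed charts leave undefined slots
  .map(function (chart) {
    return chart.series.map(function (s) {
      return {
        name: s.name,
        points: s.data
          .filter(function (p) { return p; })
          .map(function (p) { return [p.x, p.y]; })
      };
    });
  });
"""


def fetch_series(driver, chart_index=0):
    """Hypothetical helper: run FETCH_JS and return the series list of one chart."""
    # Selenium converts the returned JS arrays/objects into Python lists/dicts.
    charts = driver.execute_script(FETCH_JS)
    return charts[chart_index]


def series_to_rows(series):
    """Turn [{'name': ..., 'points': [[x_ms, y], ...]}, ...] into (series, month, value) rows."""
    rows = []
    for s in series:
        for x_ms, y in s["points"]:
            # Assumes x is a UTC timestamp in milliseconds (a Highcharts datetime axis).
            month = datetime.fromtimestamp(x_ms / 1000, tz=timezone.utc).strftime("%B %Y")
            rows.append((s["name"], month, y))
    return rows
```

In the question's loop, series_to_rows(fetch_series(browser)) would be called after browser.get(crawler) once the chart has rendered. If the page shows several charts, chart_index has to be chosen by inspecting the returned list.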

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM