
Scraping the Nasdaq website

I am running the following script to scrape the Nasdaq website for historical quotes for a list of companies over a specific time range. The script is supposed to download the file into the Downloads folder, rename it using the company name, and move it to the destination folder. Finally, it should delete the originally downloaded file and continue with the loop.
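A side note on the steps described above: if the extra company column is not needed, renaming, moving, and deleting the download can be a single shutil.move call, because moving removes the source file. This is a minimal sketch; move_download and its arguments are hypothetical names, not part of the original script:

```python
import os
import shutil


def move_download(download_file, dest_dir, ticker):
    """Rename the downloaded CSV to <ticker>.csv and move it into dest_dir.

    shutil.move removes the source file, so no separate delete step is needed.
    Raises FileNotFoundError if download_file does not exist.
    """
    os.makedirs(dest_dir, exist_ok=True)  # create the destination folder if missing
    target = os.path.join(dest_dir, ticker + ".csv")
    shutil.move(download_file, target)
    return target
```

Because the source is gone after the move, a stale HistoricalQuotes.csv cannot be picked up again on the next iteration.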

Everything seems to work fine: the first file is downloaded, renamed, and moved to the destination folder. However, when the script proceeds to the second download, it raises this error:

FileNotFoundError: File b'C:\\Users\\Filippo Sebastio\\Downloads\\HistoricalQuotes.csv' does not exist

Any idea why?

from selenium import webdriver
import os
import pandas as pd
import time
import glob

def pull_nasdaq_data(tickers, save_path):
    driver = webdriver.Chrome(executable_path=r'C:\Users\Filippo Sebastio\Desktop\chromedriver.exe')

    for ticker in tickers:
        site = 'http://www.nasdaq.com/symbol/' + ticker + '/historical'
        driver.get(site)
        # Choose 18 months of data from the drop-down
        data_range = driver.find_element_by_name('ddlTimeFrame')
        for option in data_range.find_elements_by_tag_name('option'):
            if option.text == '18 months':
                option.click()
                break
        time.sleep(5)

        # Click the download link and give the browser time to save the file
        driver.find_element_by_id('lnkDownLoad').click()
        time.sleep(5)
        data = pd.read_csv(r'C:\Users\Filippo Sebastio\Downloads\HistoricalQuotes.csv')
        data['company'] = ticker

        # save_path has no trailing backslash, so join the parts properly
        file_loc = os.path.join(save_path, ticker + '.csv')
        data.to_csv(file_loc, index=False)

        # Delete the original download so the next iteration starts clean
        os.chdir(r'C:\Users\Filippo Sebastio\Downloads')
        for f in glob.glob("Historical*.csv"):
            os.remove(f)

        print("Downloaded:  ", ticker)
        time.sleep(5)



save_path = r'C:\Users\Filippo Sebastio\Desktop\Stock'
tickers = ['mmm', 'tesla',  'pcb']

pull_nasdaq_data(tickers, save_path)
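The fixed time.sleep(5) after clicking the download link is fragile: a slow download is read before it exists, and a fast one wastes time. A minimal sketch of polling instead; wait_for_file is a hypothetical helper, and it assumes Chrome writes to a temporary .crdownload file and only creates the final file name once the download has finished:

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll=0.5):
    """Poll until path exists or timeout seconds have passed.

    Returns True if the file appeared, False on timeout, so the caller can
    skip the ticker instead of crashing in pd.read_csv.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Inside the loop, the sleep and read would become: if wait_for_file returns False, print a message for that ticker and continue; otherwise call pd.read_csv as before.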

As mentioned above, the tickers are the problem: 'tesla' is not a ticker symbol (Tesla trades as TSLA). When a ticker's page produces no download, no new HistoricalQuotes.csv appears in your download directory. The previous file has already been deleted, so there is nothing to read and pd.read_csv raises the FileNotFoundError. I've also made the download directory a parameter, which I think should help.
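The failure mode described above can also be caught by looking for a CSV that is new since the click, rather than for a fixed file name. A stale HistoricalQuotes.csv, or a renamed duplicate such as "HistoricalQuotes (1).csv", then cannot be confused with the current download, and a ticker that produces nothing returns None so the loop can skip it. A sketch under the same assumption that unfinished Chrome downloads end in .crdownload; snapshot_csvs and wait_for_new_csv are hypothetical names:

```python
import glob
import os
import time


def snapshot_csvs(download_dir):
    """Return the set of finished CSV files currently in download_dir."""
    return set(glob.glob(os.path.join(download_dir, "*.csv")))


def wait_for_new_csv(download_dir, before, timeout=30.0, poll=0.5):
    """Wait for a CSV that was not in the `before` snapshot.

    Unfinished Chrome downloads end in .crdownload, so the *.csv pattern
    ignores them. Returns the new file's path, or None if none appears in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        new_files = snapshot_csvs(download_dir) - before
        if new_files:
            # If several appeared, take the most recently modified one
            return max(new_files, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the loop, take the snapshot just before clicking the download link and pass it in afterwards; if the result is None, print a message and continue with the next ticker.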

import os
import time

import pandas as pd
from selenium import webdriver


def pull_nasdaq_data(tickers, save_path, download_dir):
    driver = webdriver.Chrome()

    for ticker in tickers:
        site = 'http://www.nasdaq.com/symbol/' + ticker + '/historical'
        driver.get(site)
        # Choose 18 months of data from the drop-down
        data_range = driver.find_element_by_name('ddlTimeFrame')
        for option in data_range.find_elements_by_tag_name('option'):
            if option.text == '18 months':
                option.click()
                break
        time.sleep(5)

        driver.find_element_by_id('lnkDownLoad').click()
        time.sleep(1)
        data = pd.read_csv(download_dir + 'HistoricalQuotes.csv')
        data['company'] = ticker

        file_loc = save_path + ticker + '.csv'
        data.to_csv(file_loc, index=False)

        os.remove(download_dir + 'HistoricalQuotes.csv')

        print("Downloaded:  ", ticker)    
        time.sleep(5)  



save_path = '/Users/tetracycline/'
download_dir = '/Users/tetracycline/Downloads/'
tickers = ['mmm', 'tsla']

pull_nasdaq_data(tickers, save_path, download_dir)
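Passing download_dir to the function only tells the script where to look; Chrome still saves to its own default folder unless told otherwise. One way to make the two agree is to set Chrome's download preferences through ChromeOptions. A minimal sketch; chrome_download_prefs and make_driver are hypothetical helpers, while download.default_directory and download.prompt_for_download are Chrome preference keys commonly used for this:

```python
import os


def chrome_download_prefs(download_dir):
    """Chrome preferences that save downloads to download_dir without a dialog."""
    return {
        # Resolve to an absolute path so the browser gets an unambiguous folder
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    }


def make_driver(download_dir):
    """Start Chrome with its download folder pointed at download_dir."""
    # Imported here so chrome_download_prefs works without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

Calling make_driver(download_dir) in place of webdriver.Chrome() inside pull_nasdaq_data would keep the browser and the script looking at the same folder.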

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM