Context for using the `yield` keyword in Python

I have the following program to scrape data from a website. I want to improve the code below by using a generator with yield instead of calling generate_url and callme multiple times sequentially. The purpose of this exercise is to properly understand yield and the context in which it can be used.

import requests
import shutil

start_date='03-03-1997'
end_date='10-04-2015'
yf_base_url ='http://real-chart.finance.yahoo.com/table.csv?s=%5E'
index_list = ['BSESN','NSEI']

def generate_url(index, start_date, end_date):
    s_day = start_date.split('-')[0] 
    s_month = start_date.split('-')[1]
    s_year = start_date.split('-')[2]

    e_day = end_date.split('-')[0] 
    e_month = end_date.split('-')[1]
    e_year = end_date.split('-')[2]
    if (index == 'BSESN') or (index == 'NSEI'):
        url = yf_base_url + index + '&a={}&b={}&c={}&d={}&e={}&f={}'.format(s_day,s_month,s_year,e_day,e_month,e_year)
        return url 

def callme(url,index):
    print('URL {}'.format(url))
    r = requests.get(url, verify=False,stream=True)
    if r.status_code!=200:
        print("Failure!!")
        exit()
    else:
        r.raw.decode_content = True
        with open(index + "file.csv", 'wb') as f:
            shutil.copyfileobj(r.raw, f)
        print("Success")

if __name__ == '__main__':
    url = generate_url(index_list[0],start_date,end_date)
    callme(url,index_list[0])
    url = generate_url(index_list[1],start_date,end_date)
    callme(url,index_list[1])

There are multiple options. You could use yield to iterate over URLs, or over request objects.

If your index_list were long, I would suggest yielding URLs, because then you could use multiprocessing.Pool to map a function that does a request and saves the output over these URLs. That would execute the downloads in parallel, potentially making the whole run a lot faster (assuming that you have enough network bandwidth, and that Yahoo Finance doesn't throttle connections).

import multiprocessing

# Parentheses let the URL template span two lines
yf = ('http://real-chart.finance.yahoo.com/table.csv?s=%5E'
      '{}&a={}&b={}&c={}&d={}&e={}&f={}')
index_list = ['BSESN', 'NSEI']

def genurl(symbols, start_date, end_date):
    # Assemble the URLs lazily, one per symbol
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for s in symbols:
        url = yf.format(s, s_day, s_month, s_year, e_day, e_month, e_year)
        yield url

def download(url):
    # Do the request, save the file
    # (for example, with the body of callme from the question)
    pass

if __name__ == '__main__':
    p = multiprocessing.Pool()
    rv = p.map(download, genurl(index_list, '03-03-1997', '10-04-2015'))
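
As a rough, self-contained sketch of the same pattern: the example below feeds the generator to a pool's map without touching the network. Two things here are assumptions and not part of the answer above. First, it swaps in multiprocessing.pool.ThreadPool instead of multiprocessing.Pool, since threads are usually enough for I/O-bound work and avoid pickling constraints. Second, fake_download is a hypothetical stand-in that only extracts the symbol from the URL.

```python
from multiprocessing.pool import ThreadPool

YF = ('http://real-chart.finance.yahoo.com/table.csv?s=%5E'
      '{}&a={}&b={}&c={}&d={}&e={}&f={}')

def genurl(symbols, start_date, end_date):
    """Lazily yield one URL per symbol."""
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for s in symbols:
        yield YF.format(s, s_day, s_month, s_year, e_day, e_month, e_year)

def fake_download(url):
    # Hypothetical stand-in for the real request: no network access,
    # it just recovers the symbol that follows '%5E' in the URL.
    return url.split('%5E')[1].split('&')[0]

def run(symbols, start_date, end_date, workers=2):
    # map() accepts any iterable, so the generator can be passed directly;
    # results come back in the same order as the URLs were yielded.
    with ThreadPool(workers) as pool:
        return pool.map(fake_download, genurl(symbols, start_date, end_date))

results = run(['BSESN', 'NSEI'], '03-03-1997', '10-04-2015')
print(results)
```

Note that map consumes the whole generator before dispatching work; for a very long symbol list, the pool's imap method would keep the iteration lazy.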

If I understand you correctly, what you want to know is how to change the code so that you can replace the last part by

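if __name__ == '__main__':
    # zip pairs each yielded URL with its index, which callme needs for the file name
    for index, url in zip(index_list, generate_url(index_list,start_date,end_date)):
        callme(url,index)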

If this is correct, you need to change generate_url, but not callme. Changing generate_url is rather mechanical. Make the first parameter index_list instead of index, wrap the function body in a for index in index_list loop, and change return url to yield url.

You don't need to change callme because you never want to say something like for call in callme(...). All you ever do with it is an ordinary function call.
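
The mechanical change described above could look like the following sketch. It reuses the names from the question. The final loop only prints, so that the example runs without network access, and pairing each URL with its index through zip is an assumption about how callme would receive its second argument.

```python
yf_base_url = 'http://real-chart.finance.yahoo.com/table.csv?s=%5E'
index_list = ['BSESN', 'NSEI']

def generate_url(index_list, start_date, end_date):
    """Yield one URL per supported index instead of returning a single URL."""
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for index in index_list:
        if index in ('BSESN', 'NSEI'):
            url = yf_base_url + index + '&a={}&b={}&c={}&d={}&e={}&f={}'.format(
                s_day, s_month, s_year, e_day, e_month, e_year)
            yield url

# Calling the function runs none of its body yet; it only creates a generator.
urls = generate_url(index_list, '03-03-1997', '10-04-2015')

pairs = []
for index, url in zip(index_list, urls):
    # In the real program, callme(url, index) would go here.
    pairs.append((index, url))
    print(index, url)
```

Each pass through the for loop resumes generate_url just long enough to produce the next URL, which is the context where yield is useful: the caller decides when, and whether, the next value is computed.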
