
MultiThreading/Optimization Python Requests?

I am trying to optimize this code. So far it runs 340 requests in 10 minutes, and I am trying to reach 1800 requests in 30 minutes, since per the Amazon API I can make one request per second. Can I use multithreading with this code to increase the number of requests?

However, I am currently reading the full data set into the main function. If I should split it up instead, how do I decide how many items each thread should take?

import base64
import csv
import hmac
import threading
import time
import urllib
from hashlib import sha256

import requests
from bs4 import BeautifulSoup

def newhmac():
    return hmac.new(AWS_SECRET_ACCESS_KEY, digestmod=sha256)

def getSignedUrl(params):
    hmac = newhmac()
    action = 'GET'
    server = "webservices.amazon.com"
    path = "/onca/xml"

    params['Version'] = '2013-08-01'
    params['AWSAccessKeyId'] = AWS_ACCESS_KEY_ID
    params['Service'] = 'AWSECommerceService'
    params['Timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    key_values = [(urllib.quote(k), urllib.quote(v)) for k,v in params.items()]
    key_values.sort()
    paramstring = '&'.join(['%s=%s' % (k, v) for k, v in key_values])
    urlstring = "http://" + server + path + "?" + \
        ('&'.join(['%s=%s' % (k, v) for k, v in key_values]))
    hmac.update(action + "\n" + server + "\n" + path + "\n" + paramstring)
    urlstring = urlstring + "&Signature="+\
        urllib.quote(base64.encodestring(hmac.digest()).strip())
    return urlstring

def readData():
    data = []
    with open("ASIN.csv") as f:
        reader = csv.reader(f)
        for row in reader:
            data.append(row[0])
    return data

def writeData(data):
    with open("data.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerows(data)

def main():
    data = readData()
    filtData = []
    i = 0
    count = 0
    while(i < len(data) -10 ):
        if (count %4 == 0):
            time.sleep(1)
        asins = ','.join([data[x] for x in range(i,i+10)])
        params = {'ResponseGroup':'OfferFull,Offers',
                 'AssociateTag':'4chin-20',
                 'Operation':'ItemLookup',
                 'IdType':'ASIN',
                 'ItemId':asins}
        url = getSignedUrl(params)
        resp = requests.get(url)
        responseSoup=BeautifulSoup(resp.text)

        quantity = ['' if product.amount is None else product.amount.text for product in responseSoup.findAll("offersummary")]
        price = ['' if product.lowestnewprice is None else product.lowestnewprice.formattedprice.text for product in responseSoup.findAll("offersummary")]
        prime = ['' if product.iseligibleforprime is None else product.iseligibleforprime.text for product in responseSoup("offer")]


        for zz in zip(asins.split(","), price,quantity,prime):
            print zz
            filtData.append(zz)

        print i, len(filtData)
        i+=10
        count +=1
    writeData(filtData)


threading.Timer(1.0, main).start()

If you are using Python 3.2+, you can use the concurrent.futures library to easily launch tasks in multiple threads. For example, here I simulate running 10 URL-parsing jobs in parallel, each taking 1 second; run synchronously they would take 10+ seconds, but with a thread pool of 10 workers they take about 1 second:

import time
from concurrent.futures import ThreadPoolExecutor

def parse_url(url):
    time.sleep(1)
    print(url)
    return "done."

st = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(10):
        future = executor.submit(parse_url, "http://google.com/%s"%i)

print("total time: %s"%(time.time() - st))

Output:

http://google.com/0
http://google.com/1
http://google.com/2
http://google.com/3
http://google.com/4
http://google.com/5
http://google.com/6
http://google.com/7
http://google.com/8
http://google.com/9
total time: 1.0066466331481934
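Applied to the question's data, you don't need to decide up front how many ASINs each thread takes: split the list into the same 10-ASIN batches the loop already uses, submit one future per batch, and let the pool schedule them across workers. Below is a minimal sketch of that pattern; `fetch_batch` is a hypothetical stand-in for the `getSignedUrl` + `requests.get` call in the question's code (note that Amazon's 1-request-per-second limit would still need throttling on top of this):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def chunk(items, size):
    # Split a list into consecutive batches of `size` items each
    return [items[i:i + size] for i in range(0, len(items), size)]

def fetch_batch(asins):
    # Hypothetical placeholder: in the real code this would build the
    # signed URL for this batch of ASINs and fetch/parse the response
    time.sleep(0.1)  # simulate network latency
    return len(asins)

asin_list = ["ASIN%04d" % n for n in range(1800)]
batches = chunk(asin_list, 10)  # 10 ASINs per request, as in the question

results = []
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_batch, b) for b in batches]
    for future in as_completed(futures):
        results.append(future.result())

print(sum(results))  # 1800 ASINs processed across 180 batched requests
```

Each worker simply pulls the next pending batch as soon as it finishes one, so the work balances itself without any manual per-thread partitioning.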
