
MultiThreading/Optimization Python Requests?

I am trying to optimize this code. So far it runs 340 requests in 10 minutes, and I am trying to reach 1800 requests in 30 minutes, since per the Amazon API I can make one request per second. Can I use multithreading with this code to increase the number of requests?

However, I am currently reading the full data set into the main function. If I should split it up instead, how do I decide how many items each thread should take?

import base64
import csv
import hmac
import threading
import time
import urllib
from hashlib import sha256

import requests
from bs4 import BeautifulSoup

def newhmac():
    return hmac.new(AWS_SECRET_ACCESS_KEY, digestmod=sha256)

def getSignedUrl(params):
    hmac = newhmac()
    action = 'GET'
    server = "webservices.amazon.com"
    path = "/onca/xml"

    params['Version'] = '2013-08-01'
    params['AWSAccessKeyId'] = AWS_ACCESS_KEY_ID
    params['Service'] = 'AWSECommerceService'
    params['Timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    key_values = [(urllib.quote(k), urllib.quote(v)) for k,v in params.items()]
    key_values.sort()
    paramstring = '&'.join(['%s=%s' % (k, v) for k, v in key_values])
    urlstring = "http://" + server + path + "?" + \
        ('&'.join(['%s=%s' % (k, v) for k, v in key_values]))
    hmac.update(action + "\n" + server + "\n" + path + "\n" + paramstring)
    urlstring = urlstring + "&Signature="+\
        urllib.quote(base64.encodestring(hmac.digest()).strip())
    return urlstring

def readData():
    data = []
    with open("ASIN.csv") as f:
        reader = csv.reader(f)
        for row in reader:
            data.append(row[0])
    return data

def writeData(data):
    with open("data.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerows(data)

def main():
    data = readData()
    filtData = []
    i = 0
    count = 0
    while(i < len(data) -10 ):
        if (count %4 == 0):
            time.sleep(1)
        asins = ','.join([data[x] for x in range(i,i+10)])
        params = {'ResponseGroup':'OfferFull,Offers',
                 'AssociateTag':'4chin-20',
                 'Operation':'ItemLookup',
                 'IdType':'ASIN',
                 'ItemId':asins}
        url = getSignedUrl(params)
        resp = requests.get(url)
        responseSoup=BeautifulSoup(resp.text)

        quantity = ['' if product.amount is None else product.amount.text for product in responseSoup.findAll("offersummary")]
        price = ['' if product.lowestnewprice is None else product.lowestnewprice.formattedprice.text for product in responseSoup.findAll("offersummary")]
        prime = ['' if product.iseligibleforprime is None else product.iseligibleforprime.text for product in responseSoup("offer")]


        for zz in zip(asins.split(","), price,quantity,prime):
            print zz
            filtData.append(zz)

        print i, len(filtData)
        i+=10
        count +=1
    writeData(filtData)


threading.Timer(1.0, main).start()

If you are using Python 3.2+, you can use the concurrent.futures library to easily launch tasks in multiple threads. For example, here I simulate running 10 URL-parsing jobs in parallel, each taking 1 second; run synchronously they would take 10+ seconds, but with a thread pool of 10 workers they take about 1 second:

import time
from concurrent.futures import ThreadPoolExecutor

def parse_url(url):
    time.sleep(1)
    print(url)
    return "done."

st = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(10):
        future = executor.submit(parse_url, "http://google.com/%s"%i)

print("total time: %s"%(time.time() - st))

Output:

http://google.com/0
http://google.com/1
http://google.com/2
http://google.com/3
http://google.com/4
http://google.com/5
http://google.com/6
http://google.com/7
http://google.com/8
http://google.com/9
total time: 1.0066466331481934
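Applied to the question's data, you don't need to decide up front how many ASINs each thread takes: split the list into the same 10-ASIN batches the loop already uses, submit one future per batch, and let the pool schedule them across workers. Below is a minimal sketch of that pattern; `fetch_batch` is a hypothetical stand-in for the `getSignedUrl` + `requests.get` call in the question's code (note that Amazon's 1-request-per-second limit would still need throttling on top of this):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def chunk(items, size):
    # Split a list into consecutive batches of `size` items each
    return [items[i:i + size] for i in range(0, len(items), size)]

def fetch_batch(asins):
    # Hypothetical placeholder: in the real code this would build the
    # signed URL for this batch of ASINs and fetch/parse the response
    time.sleep(0.1)  # simulate network latency
    return len(asins)

asin_list = ["ASIN%04d" % n for n in range(1800)]
batches = chunk(asin_list, 10)  # 10 ASINs per request, as in the question

results = []
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_batch, b) for b in batches]
    for future in as_completed(futures):
        results.append(future.result())

print(sum(results))  # 1800 ASINs processed across 180 batched requests
```

Each worker simply pulls the next pending batch as soon as it finishes one, so the work balances itself without any manual per-thread partitioning.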
