Need to write scraped data to a CSV file (threading)

Question

Here is my code:

from download1 import download
import threading
import lxml.html

# The field names are the same for every page, so define them once.
Fields = ['country', 'area', 'population', 'iso', 'capital', 'continent', 'tld', 'currency_code',
          'currency_name', 'phone',
          'postal_code_format', 'postal_code_regex', 'languages', 'neighbours']

def getInfo(initial, ending):
    for Number in range(initial, ending):
        url = 'http://example.webscraping.com/places/default/view/%d' % Number
        html = download(url)
        tree = lxml.html.fromstring(html)
        results = []
        for field in Fields:
            x = tree.cssselect('table > tr#places_%s__row > td.w2p_fw' % field)[0].text_content()
            results.append(x)  # should I start writing here?

downloadthreads = []
for i in range(1, 252, 63):  # create 4 threads
    # range() excludes its end, so i + 63 covers pages i..i+62 without skipping one
    downloadThread = threading.Thread(target=getInfo, args=(i, i + 63))
    downloadthreads.append(downloadThread)
    downloadThread.start()

for threadobj in downloadthreads:
    threadobj.join()  # wait for each thread to finish

print("Done")

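So results will hold the values for Fields. I need to write Fields as the first row (only once), and then write the values of results to a CSV file. I'm not sure whether I can open the file inside the function, because the threads would open the file several times at the same time.

Note: I know that using threads is not desirable when scraping, but I'm just testing.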

Answer 1

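I think you should consider using some kind of queue or a thread pool. A thread pool is very useful if you want to run many tasks (not just 4; I assume you will have more than 4 pieces of work in total, but want only 4 threads running at a time).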
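As a rough illustration of the thread pool idea, here is a minimal sketch assuming Python 3 (the question's code is Python 2) and the standard `concurrent.futures` module. `scrape_page` is a hypothetical stand-in for the download and `cssselect` step in `getInfo`, and `FIELDS` is shortened for brevity. The workers only return rows; the main thread is the only one that touches the file, so the header is written exactly once.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

# Shortened field list, for illustration only.
FIELDS = ['country', 'area', 'population']


def scrape_page(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def scrape_to_csv(path, page_numbers, max_workers=4):
    """Scrape pages with at most max_workers threads, then write one CSV."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as page_numbers.
        rows = list(pool.map(scrape_page, page_numbers))

    # Only the calling thread writes, so no locking is needed.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)   # header, written once
        writer.writerows(rows)
    return len(rows)
```

Calling `scrape_to_csv('results.csv', range(1, 253))` would cover pages 1 to 252 with four threads at a time.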

An example of the Queue technique can be found here.
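The linked example is not reproduced here, so the following is only a sketch of one common way to use a queue for this problem, assuming Python 3 (where the module is named `queue`; in Python 2 it is `Queue`). The scraping threads put finished rows on a queue, and a single writer thread owns the CSV file. `scrape_page`, `run`, and the `_DONE` sentinel are hypothetical names introduced for illustration.

```python
import csv
import queue
import threading

# Shortened field list, for illustration only.
FIELDS = ['country', 'area', 'population']
_DONE = object()  # sentinel telling the writer thread to stop


def scrape_page(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def worker(initial, ending, out_queue):
    """Scrape pages initial..ending-1 and hand each row to the writer."""
    for number in range(initial, ending):
        out_queue.put(scrape_page(number))


def writer(path, out_queue):
    """The only thread that opens the file; writes the header once."""
    with open(path, 'w', newline='') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(FIELDS)
        while True:
            row = out_queue.get()
            if row is _DONE:
                break
            csv_writer.writerow(row)


def run(path, first=1, last=252, step=63):
    out_queue = queue.Queue()
    writer_thread = threading.Thread(target=writer, args=(path, out_queue))
    writer_thread.start()

    workers = [
        threading.Thread(target=worker,
                         args=(i, min(i + step, last + 1), out_queue))
        for i in range(first, last + 1, step)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    out_queue.put(_DONE)   # all rows are queued; let the writer finish
    writer_thread.join()
```

Rows arrive in whatever order the threads finish, so the output is not sorted by page number. If order matters, each row could carry its page number and be sorted afterwards.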

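Of course, you can also have each thread write to its own file, tagged with the thread's ID, for example "results_1.txt", "results_2.txt", and so on. Then you can merge them once all the threads have finished.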
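A minimal sketch of the per-thread file approach, assuming Python 3. It numbers the files with a worker index passed to each thread rather than the interpreter's thread identifier, because identifiers from `threading.get_ident()` may be reused after a thread exits; that naming choice is an assumption of this sketch. `scrape_page`, `worker`, `merge`, and `run` are hypothetical names.

```python
import csv
import os
import threading

# Shortened field list, for illustration only.
FIELDS = ['country', 'area', 'population']


def scrape_page(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def part_path(out_dir, index):
    return os.path.join(out_dir, 'results_%d.txt' % index)


def worker(index, initial, ending, out_dir):
    """Each thread writes only to its own file, so nothing is shared."""
    with open(part_path(out_dir, index), 'w', newline='') as f:
        csv_writer = csv.writer(f)
        for number in range(initial, ending):
            csv_writer.writerow(scrape_page(number))


def merge(out_dir, n_parts, final_path):
    """Write the header once, then append every part file in index order."""
    with open(final_path, 'w', newline='') as out:
        csv_writer = csv.writer(out)
        csv_writer.writerow(FIELDS)
        for index in range(1, n_parts + 1):
            with open(part_path(out_dir, index), newline='') as f:
                csv_writer.writerows(csv.reader(f))


def run(out_dir, final_path, first=1, last=252, step=63):
    threads = []
    for index, i in enumerate(range(first, last + 1, step), start=1):
        t = threading.Thread(target=worker,
                             args=(index, i, min(i + step, last + 1), out_dir))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    merge(out_dir, len(threads), final_path)
```

Because the parts are merged in index order and each part is written in page order, the final file ends up sorted by page number.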

You can also use basic concepts such as locks, monitors, and so on, but I'm not a big fan of them. An example of locking can be found here.
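The linked locking example is not reproduced here. As a sketch of what the lock approach could look like for the question's code, assuming Python 3: the main thread opens the file and writes the header once, and every worker shares the same `csv.writer` but only calls it while holding a `threading.Lock`. `scrape_page`, `get_info`, and `run` are hypothetical names for illustration.

```python
import csv
import threading

# Shortened field list, for illustration only.
FIELDS = ['country', 'area', 'population']
write_lock = threading.Lock()


def scrape_page(number):
    """Hypothetical stand-in for downloading and parsing one page."""
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def get_info(initial, ending, csv_writer):
    for number in range(initial, ending):
        row = scrape_page(number)      # slow part runs without the lock
        with write_lock:               # only one thread writes at a time
            csv_writer.writerow(row)


def run(path, first=1, last=252, step=63):
    # The file is opened once, in the main thread, before any worker starts.
    with open(path, 'w', newline='') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(FIELDS)    # header, written once
        threads = [
            threading.Thread(target=get_info,
                             args=(i, min(i + step, last + 1), csv_writer))
            for i in range(first, last + 1, step)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()                   # file stays open until all are done
```

Holding the lock only around `writerow` keeps the downloads running in parallel. Rows are written in completion order, not page order.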
