
multiprocessing slower than loop

I'm trying to write a huge amount of data to a CSV file. With the normal (sequential) approach it writes about 50 rows per second, but with multiprocessing it drops to about 5 rows per second. I also had to add sys.setrecursionlimit(25000), because without it the code raises an error.

I can tell I'm not doing this right. What is the correct way?

from bs4 import BeautifulSoup
import requests
import lxml
import csv
import cchardet
from multiprocessing import Pool
import sys
import time

sys.setrecursionlimit(25000)

csvfileWrite=open("comments.csv", 'a+', newline='',encoding='utf-8') #declared as a global variable
writer = csv.writer(csvfileWrite, delimiter=';', quotechar='"', 
quoting=csv.QUOTE_MINIMAL) #declared as a global variable


def kacYildiz(div): #This function returns a number 0 to 5. Not important.
    yildizSayisi=0
    yildizYeri=div.find("div",attrs={"class":"RatingPointer-module-1OKF3"})
    yildizlar=yildizYeri.find_all("svg")
    
    for yildiz in yildizlar:
        sonuc=yildiz.find("path").get("fill")
        if(sonuc=="#f28b00"):
            yildizSayisi+=1

    return yildizSayisi

def takeText(div):
    comment=div.find("span",attrs={"itemprop":"description"}).text
    return comment


def yorumSayfaSayisi(row): # Returns how many pages the site's comment section has. Not important.
    yorumKismi="-yorumlari?"
    adres=row[0]+yorumKismi

    r = requests_session.get(adres)
    
    soup = BeautifulSoup(r.text,"lxml")
    
    sayfaS=soup.find("ul",attrs={"class":"PaginationBar-module-3qhrm"})
        
    sayi=sayfaS.find_all("li")[-1].text
    return sayi



def writeToCsv(comments): # writing comments to the csv file
    global csvfileWrite
    global writer
   
    textToWrite = takeText(comments)
                    
    writer.writerow([kacYildiz(comments),textToWrite]) 


if __name__ == '__main__':
    pageNumber=1
    requests_session = requests.Session()
    comments=list()
    
    csvfile=open('adresler.csv',newline='')
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')

          
    for row in reader:
        rowNumber=yorumSayfaSayisi(row)
            
        for i in range(1,int(rowNumber)):
            comments.clear()
            commetAdress="-yorumlari?sayfa={}".format(i)               
            adress=row[0]+commetAdress                  
            r = requests_session.get(adress)                
            soup = BeautifulSoup(r.text,"lxml") 
            page=soup.find_all("div",attrs={"class":"ReviewCard-module-3Y36S"})

            for comment in page:
                comments.append(comment)

            p = Pool(10)
            start = time.process_time()   
            p.map(writeToCsv, comments) 
                
            p.terminate()
            p.join()

Try this approach using ThreadPool instead:

from multiprocessing.pool import ThreadPool

def csvYaz(yorumlar):  # write one comment (star rating + text) to the csv file
    global csvfileYaz
    global yazici
    yazi = yorumAl(yorumlar)
    yazici.writerow([kacYildiz(yorumlar), yazi])

# ------main-----
for yorum in yorumSayfasi:
    yorumlar.append(yorum)

threads = ThreadPool(10).map(csvYaz, yorumlar)
for zz in threads:
    print(zz)
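
For completeness, here is a minimal, self-contained sketch of how the pieces could fit together (the URL list, the fetch_page helper, and the CSS class names copied from the question are assumptions, not a tested crawler): the threads do the network-bound fetching and parsing, and the CSV writing stays in the main thread, so parsed BeautifulSoup objects never have to be pickled and sent to worker processes, which is likely what forced the sys.setrecursionlimit(25000) hack in the first place.

from multiprocessing.pool import ThreadPool
import csv

import requests
from bs4 import BeautifulSoup

# class names copied from the question; they may differ on the real site
CSS_CARD = "ReviewCard-module-3Y36S"
CSS_STARS = "RatingPointer-module-1OKF3"

session = requests.Session()

def fetch_page(url):
    """Network-bound work done inside a thread: download and parse one comment page."""
    r = session.get(url)
    soup = BeautifulSoup(r.text, "lxml")
    rows = []
    for card in soup.find_all("div", attrs={"class": CSS_CARD}):
        stars = 0
        stars_div = card.find("div", attrs={"class": CSS_STARS})
        if stars_div is not None:
            stars = sum(1 for svg in stars_div.find_all("svg")
                        if svg.find("path").get("fill") == "#f28b00")
        text_span = card.find("span", attrs={"itemprop": "description"})
        text = text_span.text if text_span is not None else ""
        rows.append((stars, text))
    return rows

if __name__ == "__main__":
    # hypothetical list of comment-page URLs, built the same way as in the question
    urls = ["https://example.com/product-yorumlari?sayfa={}".format(i) for i in range(1, 20)]

    with open("comments.csv", "a+", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";", quotechar='"', quoting=csv.QUOTE_MINIMAL)
        # ThreadPool.map returns the results in order; only the main thread touches
        # the file handle, so no locking or inter-process pickling is needed.
        with ThreadPool(10) as pool:
            for rows in pool.map(fetch_page, urls):
                writer.writerows(rows)

Because only the main thread writes to the file, no lock is needed, and the only data coming back from the threads are small (stars, text) tuples rather than whole parse trees.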
