How to use multithreading/multiprocessing instead of a for loop over a pandas DataFrame

I'm currently working on a project that validates the data in each row of a pandas DataFrame. My current approach performs the validation sequentially:

for index in mt.index:
    # read the file for this row
    # perform validation on this row
    pass
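
For reference, here is a minimal runnable version of this sequential loop. The DataFrame contents and the validate_row check are hypothetical placeholders for the real file-reading and validation steps. Note that mt.index is an attribute, not a method, so it is iterated without parentheses.

```python
import pandas as pd

# Hypothetical data standing in for the real DataFrame "mt".
mt = pd.DataFrame({"name": ["a", "b", ""], "amount": [10, -5, 3]})


def validate_row(row):
    """Hypothetical check: a row is valid if it has a name and a non-negative amount."""
    # In the real script, the file for this row would be read here.
    return bool(row["name"]) and bool(row["amount"] >= 0)


# Sequential loop: one row at a time, results keyed by index label.
results = {}
for index in mt.index:
    results[index] = validate_row(mt.loc[index])

print(results)
```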

However, this takes longer than expected, so I'd like to use multithreading or multiprocessing to reduce the processing time. Can anyone suggest how to implement this to improve my script's performance?

You can use the Pool class from the multiprocessing module:

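from multiprocessing import Pool

def validate(index):
    # Do the validation work for the row mt.loc[index] here and return the result.
    # Defined at module level so that Pool can pickle it for the worker processes.
    ...

if __name__ == "__main__":
    # Required when workers are started with "spawn" (e.g. Windows, macOS).
    with Pool() as p:
        result = p.map(validate, mt.index)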

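Pool.map distributes the values of mt.index across the worker processes and returns the results as a list in the same order as the input. See the Python multiprocessing documentation for more options (for example, imap, starmap, and the processes argument).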
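
A self-contained sketch of the same idea, under a few assumptions: the DataFrame and the validate_row check are hypothetical, and rows are sent to the workers as plain dicts (via DataFrame.to_dict("records")) rather than as index labels, so the worker processes do not need access to a global mt. When the per-row work is dominated by file reading (I/O-bound), multiprocessing.pool.ThreadPool provides the same map interface using threads, which avoids pickling data to other processes.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

import pandas as pd


def validate_row(row):
    """Hypothetical check: valid if the row has a name and a non-negative amount.

    Defined at module level so that Pool can pickle it for the worker processes.
    """
    # In a real script, reading the file for this row would happen here.
    return bool(row["name"]) and bool(row["amount"] >= 0)


def validate_all(df, pool_cls=Pool, workers=None):
    """Validate every row of df in parallel; results keep the original row order."""
    # Plain dicts are cheap to pickle and don't require the workers to see df.
    records = df.to_dict("records")
    # workers=None lets the pool choose its default number of workers.
    with pool_cls(workers) as pool:
        # map blocks until every row is done and preserves input order.
        results = pool.map(validate_row, records)
    # Re-attach the results to the original index labels.
    return pd.Series(results, index=df.index)


if __name__ == "__main__":
    mt = pd.DataFrame({"name": ["a", "b", ""], "amount": [10, -5, 3]})

    # Processes: suited to CPU-bound validation.
    print(validate_all(mt, Pool).tolist())
    # Threads: often sufficient when the work is mostly file I/O.
    print(validate_all(mt, ThreadPool).tolist())
```

Both pool classes accept the number of workers as their first argument, which is why validate_all can take either one. Processes sidestep the GIL for CPU-heavy checks but pay for pickling each row; threads share memory and are usually the cheaper choice when most of the time is spent waiting on disk.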
