How to use multithreading / multiprocessing in place of For loop with pandas dataframe

Question

I am working on a project that validates data based on the values in each row of a DataFrame. My current approach performs the validation sequentially:

for index in mt.index:  # .index is an attribute, not a method
    # read the file
    # perform validation
    ...

This takes longer than expected, so I want to use multithreading or multiprocessing to reduce the processing time. Can anyone suggest how to implement multithreading or multiprocessing to improve my script's performance?

Answer 1

You can use the Pool API:

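from multiprocessing import Pool

def validate(index):
    # do the validation work for a given index here
    ...

if __name__ == "__main__":
    # create the pool after validate is defined so the workers can find it
    with Pool() as p:
        result = p.map(validate, mt.index)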

The map function parallelizes the loop over the values of mt.index and returns the results in the same order. See the multiprocessing documentation for more options.
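Since the loop in the question reads a file for each row, the work may be I/O-bound, in which case threads can help as well. The following is a minimal sketch of that variant, assuming `multiprocessing.pool.ThreadPool`, which offers the same `map` method as `Pool`. The sample DataFrame, its `path` and `size` columns, the validation rule, and the `validate_all` helper are all hypothetical stand-ins for the real data and checks.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd

# Hypothetical sample data standing in for the question's `mt` DataFrame.
mt = pd.DataFrame({"path": ["a.csv", "b.txt", "c.csv"], "size": [10, -1, 5]})


def validate(index):
    """Hypothetical check: pass if the row names a CSV file with a non-negative size."""
    row = mt.loc[index]
    return bool(row["path"].endswith(".csv") and row["size"] >= 0)


def validate_all(workers=4):
    # map blocks until every task is done and keeps the input order,
    # so the results line up with mt.index.
    with ThreadPool(workers) as pool:
        results = pool.map(validate, mt.index)
    return pd.Series(results, index=mt.index)


if __name__ == "__main__":
    print(validate_all())
```

If the validation turns out to be CPU-bound rather than I/O-bound, swapping `ThreadPool` for `Pool` keeps the same `map` call, but the function must then be picklable and the pool should be created under the `if __name__ == "__main__":` guard.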

solution1
1 2021-02-10 05:18:26
