
How do I parallelize a for loop that calls a function?

I have the following snippet, which iterates over a list of .csv files and calls an insert_csv_data function that reads, preprocesses and inserts each .csv file's data into a .hyper file (Hyper is Tableau's new in-memory data engine technology, designed for fast data ingest and analytical query processing on large or complex data sets):

A detailed interpretation of the insert_csv_data function can be found here

for csv in csv_list:
    insert_csv_data(hyper)

The issue with the above code is that it inserts one .csv file into the .hyper file at a time, which is pretty slow at the moment.

I would like to know if there's a faster or parallel approach, as I'm using Apache Spark for processing on Databricks. I've done some research and found modules like multiprocessing, joblib and asyncio that might work for my scenario, but I'm unsure how to implement them correctly.

Please advise.

Edit:

Parallel Code:

from joblib import Parallel, delayed
element_run = Parallel(n_jobs=1)(delayed(insert_csv_data)(csv) for csv in csv_list)

This does not directly answer the question, but it demonstrates how multiprocessing and multithreading are easily interchangeable using the concurrent.futures module. Note that the two loops achieve exactly the same thing, and that the only difference between the two sections of code is the work manager (executor) class.

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def tfunc(n):
    return n * n


N = 1_000


def main():
    with ThreadPoolExecutor() as executor:
        for future in [executor.submit(tfunc, n) for n in range(N)]:
            future.result()

    with ProcessPoolExecutor() as executor:
        for future in [executor.submit(tfunc, n) for n in range(N)]:
            future.result()


if __name__ == '__main__':  
    main()
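
As a rough sketch of how the same pattern might map onto the loop in the question: the insert_csv_data below is only a hypothetical stand-in for the asker's real function, and the file names and max_workers=4 are placeholders. The question does not establish whether a single .hyper file tolerates several inserts at once, so this assumes it does. Threads tend to suit I/O-bound work; if the preprocessing is CPU-bound, ProcessPoolExecutor can be swapped in, provided the function and its arguments can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def insert_csv_data(csv_path):
    # Hypothetical stand-in for the asker's function, which reads,
    # preprocesses and inserts one .csv file into the .hyper file.
    return f"inserted {csv_path}"


def insert_all(csv_list, max_workers=4):
    # Submit one task per file, then collect results as each one finishes.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(insert_csv_data, path): path
                   for path in csv_list}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results


if __name__ == '__main__':
    csv_list = ['a.csv', 'b.csv', 'c.csv']  # placeholder file names
    print(insert_all(csv_list))
```

Collecting results through as_completed means a failure in one file surfaces as an exception from future.result() instead of being lost silently, which is usually what you want for a batch insert.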


 