
Multiprocessing in Python: multiple processes running the same instructions

I'm using multiprocessing in Python to parallelize work on chunks of data that I read from a CSV file with pandas.

I'm new to multiprocessing and parallel processing. I ran into a problem while trying it out on this simple code:

import time
import os
from multiprocessing import Process
import pandas as pd

print(os.getpid())
df = pd.read_csv('train.csv', sep=',', usecols=["POLYLINE"], iterator=True, chunksize=2)
print("hello")


def my_function(chunk):
    print(chunk)


count = 0
processes = []
for chunk in df:
    if __name__ == '__main__':
        p = Process(target=my_function, args=(chunk,))
        processes.append(p)
    if count == 4:
        break
    count = count + 1

The print "hello" is being executed multiple times. I expected each process I create to run only the target function, not the main code.

Can anyone tell me where I'm going wrong?


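With the "spawn" start method (the default on Windows), multiprocessing starts a fresh Python interpreter for each new process and imports your script in it so that it can find the target function. Everything in your outermost scope, including the print statements and the read_csv call, therefore gets executed once for every process. Only code under if __name__ == '__main__': is skipped in the children.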
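To see the re-import happen, here is a minimal sketch, assuming Python 3.4+ and that the code is saved as a script file. The names describe and my_function are just for illustration, and get_context("spawn") is used to force the Windows-style behaviour on any platform:

```python
import multiprocessing as mp
import os


def describe(where):
    """Return a line saying which process executed a piece of code."""
    return "%s executed in process %d" % (where, os.getpid())


# Module-level code: a spawned child imports this file again,
# so this line is printed by the parent and by every child.
print(describe("module level"))


def my_function(n):
    print(describe("target %d" % n))


if __name__ == "__main__":
    # Guarded code: runs only in the process that was started as a script.
    print(describe("main block"))
    # Assumption: force "spawn" so the result matches what Windows does.
    ctx = mp.get_context("spawn")
    processes = [ctx.Process(target=my_function, args=(i,)) for i in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

The "module level" line should show up once for the parent and once per child, each with a different pid, while "main block" shows up only once. With the "fork" start method the children do not re-import the script, so the extra "module level" lines would not appear.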

By the way, you should use a Pool instead of creating Process objects directly. Here's a cleaned-up example:

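from multiprocessing import Pool

import pandas as pd

NUM_PROCESSES = 4


def process_chunk(chunk):
    # do something with the chunk (a DataFrame of up to 2 rows)
    return chunk


if __name__ == '__main__':
    # Everything that should run only once lives under this guard.
    df = pd.read_csv('train.csv', sep=',', usecols=["POLYLINE"], iterator=True, chunksize=2)
    pool = Pool(NUM_PROCESSES)

    # map() hands each chunk to a worker and returns the results in order.
    for result in pool.map(process_chunk, df):
        print(result)

    # Tell the workers no more tasks are coming, then wait for them to exit.
    pool.close()
    pool.join()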

Using multiprocessing is probably not going to speed up reading the data from disk, since disk access is much slower than, e.g., RAM access or calculations. The different pieces of the file will also end up in different processes, so each chunk has to be sent from the parent to a worker.

Using mmap could help speed up data access.

If you do a read-only mmap of the data file before starting, e.g., a Pool.map call, each worker could read its own slice of data from the shared memory-mapped file and process it.
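A rough sketch of that idea, using only the standard library. Note one deliberate difference: instead of sharing a single mapping created before the pool starts (which relies on fork), each worker here opens its own read-only mapping of the same file, so it also works with "spawn"; the operating system can still back both mappings with the same cached pages. The helper names count_newlines and make_tasks, the temporary file, and the newline counting are all made up for illustration:

```python
import mmap
import os
import tempfile
from multiprocessing import Pool


def count_newlines(task):
    """Count newline bytes in the byte range [start, end) of the file."""
    path, start, end = task
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Slicing copies only this worker's range out of the mapping.
            return mm[start:end].count(b"\n")
        finally:
            mm.close()


def make_tasks(path, num_slices):
    """Split the file into at most num_slices contiguous byte ranges."""
    size = os.path.getsize(path)
    step = max(1, -(-size // num_slices))  # ceiling division
    return [(path, start, min(start + step, size))
            for start in range(0, size, step)]


if __name__ == "__main__":
    # Hypothetical stand-in for the real data file.
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        tmp.write(b"POLYLINE\n" + b"[[0.0, 0.0]]\n" * 1000)
    try:
        pool = Pool(4)
        try:
            counts = pool.map(count_newlines, make_tasks(tmp.name, 4))
        finally:
            pool.close()
            pool.join()
        print("newlines per slice:", counts, "total:", sum(counts))
    finally:
        os.remove(tmp.name)
```

Only the small (path, start, end) tuples are sent to the workers, not the data itself. For real CSV processing the byte ranges would still need to be adjusted to line boundaries, which this sketch does not do.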
