Python: multiprocessing is slower than a single process when reading a file

Question

When I read a file without multiprocessing, it takes 0.16 seconds, but with multiprocessing it takes 0.36 seconds. Why does using multiprocessing take longer than running in a single process?

The code below reads a file, splits it into 10 parts, and compares the lines.

Code without multiprocessing

import time

result = []


def get_match(lines, num):
    outer_lines = lines[:num]
    inner_lines = lines[1:]
    for f1 in outer_lines:
        # print('f1', f1)
        for f2 in inner_lines:
            result.append(f1)
            result.append(f2)
            # print('f2', f2)
            # print('compare file line by line')
            # print('store int into global result variable')


if __name__ == '__main__':
    atime = time.time()

    split_n = 10

    with open('10000.txt', 'r') as file:

        line1 = file.readlines()
        line2 = line1[split_n:]
        line3 = line2[split_n:]
        line4 = line3[split_n:]
        line5 = line4[split_n:]
        line6 = line5[split_n:]
        line7 = line6[split_n:]
        line8 = line7[split_n:]
        line9 = line8[split_n:]
        line10 = line9[split_n:]

        t1 = get_match(line1,split_n,)
        t2 = get_match(line2,split_n,)
        t3 = get_match(line3,split_n,)
        t4 = get_match(line4,split_n,)
        t5 = get_match(line5,split_n,)
        t6 = get_match(line6,split_n,)
        t7 = get_match(line7,split_n,)
        t8 = get_match(line8,split_n,)
        t9 = get_match(line9,split_n,)
        t10 = get_match(line10,split_n,)

    btime = time.time()
    print(btime-atime)

Code with multiprocessing

from multiprocessing import Process
import time


result = []


def get_match(lines, num):
    outer_lines = lines[:num]
    inner_lines = lines[1:]
    for f1 in outer_lines:
        for f2 in inner_lines:
            result.append(f1)
            result.append(f2)


if __name__ == '__main__':
    atime = time.time()

    split_n = 10

    with open('test.txt', 'r') as file:
        line1 = file.readlines()
        line2 = line1[split_n:]
        line3 = line2[split_n:]
        line4 = line3[split_n:]
        line5 = line4[split_n:]
        line6 = line5[split_n:]
        line7 = line6[split_n:]
        line8 = line7[split_n:]
        line9 = line8[split_n:]
        line10 = line9[split_n:]

    p1 = Process(target=get_match, args=(line1, split_n, ))
    p1.start()
    p2 = Process(target=get_match, args=(line2, split_n,))
    p2.start()
    p3 = Process(target=get_match, args=(line3, split_n,))
    p3.start()
    p4 = Process(target=get_match, args=(line4, split_n,))
    p4.start()
    p5 = Process(target=get_match, args=(line5, split_n,))
    p5.start()
    p6 = Process(target=get_match, args=(line6, split_n,))
    p6.start()
    p7 = Process(target=get_match, args=(line7, split_n,))
    p7.start()
    p8 = Process(target=get_match, args=(line8, split_n,))
    p8.start()
    p9 = Process(target=get_match, args=(line9, split_n,))
    p9.start()
    p10 = Process(target=get_match, args=(line10, split_n,))
    p10.start()

    procs = [p1,p2,p3,p4,p5,p6,p7,p8,p9,p10]

    # complete the processes
    for proc in procs:
        proc.join()

    btime = time.time()
    print(btime-atime)

Answer 1

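Starting processes does not by itself make a program faster. Your code does start all ten processes before joining them, so they can run at the same time, but each one adds the overhead of creating and tearing down a process, and the lists passed as arguments may have to be copied into each child. For a job that takes 0.16 seconds on its own, that overhead can easily outweigh any gain from running in parallel. Note also that each child process appends to its own copy of result, so the list in the parent process stays empty.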

If you want actual multiprocessing (i.e. everything done simultaneously), I'd recommend using Pool.map. See the documentation here: https://docs.python.org/2/library/multiprocessing.html
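A minimal sketch of the Pool.map approach, assuming Python 3. The make_chunks helper, the generated stand-in lines, and the pool size of 4 are illustrative choices and not part of the question's code. get_match is changed here to return its pairs, because a global list modified in a worker process is not visible to the parent.

```python
from multiprocessing import Pool


def get_match(args):
    """Pair each of the first `num` lines with every line after the first.

    Takes a single (lines, num) tuple because Pool.map passes one argument.
    Returns the pairs instead of appending to a global list.
    """
    lines, num = args
    outer_lines = lines[:num]
    inner_lines = lines[1:]
    return [(f1, f2) for f1 in outer_lines for f2 in inner_lines]


def make_chunks(lines, split_n, count):
    """Hypothetical helper: build the same slices as line1..line10."""
    return [lines[i * split_n:] for i in range(count)]


if __name__ == "__main__":
    # Stand-in data; the question reads these lines from a file instead.
    lines = [f"line {i}\n" for i in range(200)]
    split_n = 10
    chunks = make_chunks(lines, split_n, 10)

    # A fixed pool of workers is reused for all ten chunks.
    with Pool(processes=4) as pool:
        per_chunk = pool.map(get_match, [(chunk, split_n) for chunk in chunks])

    # Flatten the per-chunk lists into one result list in the parent.
    result = [pair for pairs in per_chunk for pair in pairs]
    print(len(result))
```

Whether this beats the single-process version still depends on how much work each chunk involves; for a job measured in tenths of a second, the pool startup and the cost of sending the results back may dominate.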

Answer 2

Creating and destroying processes costs resources such as CPU, memory and time, so multiprocessing is a poor fit when your data is small. I'd recommend using scientific computing packages such as numpy and scipy instead.
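One rough way to see this cost on your own machine is to time an empty function called directly and the same function run in a freshly started process. This is only a sketch: the helper names and the repeat count of 20 are made up for illustration, and the numbers will vary by operating system and Python version.

```python
import time
from multiprocessing import Process


def do_nothing():
    """Empty task, so any measured time is call or process overhead."""


def run_in_new_process():
    """Start a process for the empty task and wait for it to finish."""
    p = Process(target=do_nothing)
    p.start()
    p.join()


def average_seconds(func, repeats):
    """Return the mean wall-clock time of calling func() `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print("direct call:    ", average_seconds(do_nothing, 20))
    print("via new process:", average_seconds(run_in_new_process, 20))
```

If the per-process figure is a noticeable fraction of the total single-process run time, splitting the job across ten processes is unlikely to pay off.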

solution1
1 ACCEPTED 2020-01-14 14:03:52

solution2
0 2020-01-14 14:20:32
