Python: too slow using multiprocessing when reading a file

When I read a file without multiprocessing, it takes 0.16 seconds, but when I use multiprocessing, it takes 0.36 seconds. Why does using multiprocessing take longer than a single thread?

In the code below, I want to read a file, split it into 10 parts, and compare the lines.

Code without multiprocessing:

import time

result = []


def get_match(lines, num):
    outer_lines = lines[:num]
    inner_lines = lines[1:]
    for f1 in outer_lines:
        # print('f1', f1)
        for f2 in inner_lines:
            result.append(f1)
            result.append(f2)
            # print('f2', f2)
            # print('compare file line by line')
            # print('store int into global result variable')


if __name__ == '__main__':
    atime = time.time()

    split_n = 10

    with open('10000.txt', 'r') as file:

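        line1 = file.readlines()
        # Each slice below drops the first split_n lines of the previous list,
        # so line1..line10 overlap heavily instead of forming 10 separate parts.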
        line2 = line1[split_n:]
        line3 = line2[split_n:]
        line4 = line3[split_n:]
        line5 = line4[split_n:]
        line6 = line5[split_n:]
        line7 = line6[split_n:]
        line8 = line7[split_n:]
        line9 = line8[split_n:]
        line10 = line9[split_n:]

        # get_match returns None; the pairs accumulate in the global result list.
        t1 = get_match(line1,split_n,)
        t2 = get_match(line2,split_n,)
        t3 = get_match(line3,split_n,)
        t4 = get_match(line4,split_n,)
        t5 = get_match(line5,split_n,)
        t6 = get_match(line6,split_n,)
        t7 = get_match(line7,split_n,)
        t8 = get_match(line8,split_n,)
        t9 = get_match(line9,split_n,)
        t10 = get_match(line10,split_n,)

    btime = time.time()
    print(btime-atime)

Code with multiprocessing:

from multiprocessing import Process
import time


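# Module-level list: each child process works on its own copy, so
# anything a child appends here is never seen by the parent.
result = []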


def get_match(lines, num):
    outer_lines = lines[:num]
    inner_lines = lines[1:]
    for f1 in outer_lines:
        for f2 in inner_lines:
            result.append(f1)
            result.append(f2)


if __name__ == '__main__':
    atime = time.time()

    split_n = 10

    with open('test.txt', 'r') as file:
        line1 = file.readlines()
        line2 = line1[split_n:]
        line3 = line2[split_n:]
        line4 = line3[split_n:]
        line5 = line4[split_n:]
        line6 = line5[split_n:]
        line7 = line6[split_n:]
        line8 = line7[split_n:]
        line9 = line8[split_n:]
        line10 = line9[split_n:]

    # All ten processes are started before any join() below, so they run concurrently.
    p1 = Process(target=get_match, args=(line1, split_n, ))
    p1.start()
    p2 = Process(target=get_match, args=(line2, split_n,))
    p2.start()
    p3 = Process(target=get_match, args=(line3, split_n,))
    p3.start()
    p4 = Process(target=get_match, args=(line4, split_n,))
    p4.start()
    p5 = Process(target=get_match, args=(line5, split_n,))
    p5.start()
    p6 = Process(target=get_match, args=(line6, split_n,))
    p6.start()
    p7 = Process(target=get_match, args=(line7, split_n,))
    p7.start()
    p8 = Process(target=get_match, args=(line8, split_n,))
    p8.start()
    p9 = Process(target=get_match, args=(line9, split_n,))
    p9.start()
    p10 = Process(target=get_match, args=(line10, split_n,))
    p10.start()

    procs = [p1,p2,p3,p4,p5,p6,p7,p8,p9,p10]

    # wait for every process to finish
    for proc in procs:
        proc.join()

    btime = time.time()
    print(btime-atime)
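In both listings above, the slicing chain does not actually split the file into 10 parts: each lineN is the previous list minus its first split_n lines, so for a 10000-line file line10 still holds 9910 lines and the ten lists overlap almost entirely. The sketch below reproduces that chain and, assuming non-overlapping parts are what was intended, shows one way to split evenly. slice_chain and split_into_chunks are hypothetical helper names, and the generated lines stand in for file.readlines().

```python
def slice_chain(lines, split_n, parts=10):
    """Rebuild the question's line1..line10: each list is the previous one
    without its first split_n items."""
    chain = [lines]
    for _ in range(parts - 1):
        chain.append(chain[-1][split_n:])
    return chain


def split_into_chunks(lines, parts=10):
    """Hypothetical helper: split lines into `parts` non-overlapping,
    contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        # the first `extra` chunks take one leftover line each
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    lines = [f"line {i}\n" for i in range(10000)]  # stand-in for file.readlines()
    print([len(c) for c in slice_chain(lines, 10)])
    # [10000, 9990, 9980, 9970, 9960, 9950, 9940, 9930, 9920, 9910]
    print([len(c) for c in split_into_chunks(lines)])
    # [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
```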

Working with processes doesn't automatically mean you benefit from multiprocessing. In your second example, all ten processes are started before any of them is joined, so they do run concurrently, but together they do the same work as your first example plus the added overhead of creating each new process, handing it its list of lines, and shutting it down. For a job that takes only 0.16 seconds serially, that overhead outweighs whatever is gained from running in parallel.
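One way to see that overhead on your own machine is to time processes that do no work at all. This is a minimal sketch, not a benchmark: do_nothing and time_empty_processes are hypothetical names, and the number you get depends on the operating system and start method (spawn, the default on Windows and macOS, starts a fresh interpreter and re-imports the main module in every child, which is usually slower than fork).

```python
import multiprocessing as mp
import time


def do_nothing():
    """Worker with no work at all, so timing it measures only process overhead."""


def time_empty_processes(n_procs=10):
    """Start n_procs do-nothing processes, join them all, return elapsed seconds."""
    start = time.perf_counter()
    procs = [mp.Process(target=do_nothing) for _ in range(n_procs)]
    for p in procs:
        p.start()  # start all first so they run concurrently, as in the question
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"start + join of 10 empty processes: {time_empty_processes(10):.3f} s")
```

If this number is a sizable fraction of the 0.16 seconds the serial version needs, there is little room left for the parallel version to win.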

If you want the work split across worker processes that run at the same time, I'd recommend using the map method of multiprocessing.Pool. See the documentation here: https://docs.python.org/2/library/multiprocessing.html
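A minimal sketch of the Pool.map approach, assuming the goal is to run the question's get_match on 10 non-overlapping chunks. Two changes from the question are deliberate: get_match returns its pairs instead of appending to a global list (a child process cannot update the parent's globals), and the chunks come from a hypothetical split_into_chunks helper rather than the slicing chain. run_parallel is also a hypothetical name, and the generated lines stand in for the real file.

```python
from functools import partial
from multiprocessing import Pool


def get_match(lines, num):
    """The question's pairing loop, but returning the pairs: a worker process
    cannot append to a list that lives in the parent process."""
    result = []
    for f1 in lines[:num]:
        for f2 in lines[1:]:
            result.append(f1)
            result.append(f2)
    return result


def split_into_chunks(lines, parts):
    """Hypothetical helper: contiguous, non-overlapping chunks of at most
    ceil(len(lines) / parts) lines each."""
    step = max(1, -(-len(lines) // parts))  # ceiling division, never 0
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def run_parallel(lines, parts=10, num=10, processes=None):
    """Hypothetical driver: one get_match call per chunk, spread over a pool.
    processes=None lets Pool choose a worker count from the available CPUs."""
    chunks = split_into_chunks(lines, parts)
    with Pool(processes=processes) as pool:
        # map blocks until all chunks are done and keeps the input order
        return pool.map(partial(get_match, num=num), chunks)


if __name__ == "__main__":
    lines = [f"line {i}\n" for i in range(10000)]  # stand-in for the real file
    per_chunk = run_parallel(lines)
    print(len(per_chunk), sum(len(r) for r in per_chunk))
```

Pool.map pickles each chunk to send it to a worker and pickles each returned list back to the parent, so for work this small a plain serial loop may still come out ahead.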

Creating and destroying processes costs resources such as CPU time and memory, so multiprocessing is a poor fit when your data is this small. I'd recommend using scientific computing packages such as NumPy and SciPy instead.
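As an illustration of the NumPy route, here is a sketch that replaces get_match's two nested Python loops with one vectorized comparison. It assumes that comparing lines means testing them for equality (the question never says), and find_equal_pairs is a hypothetical name. It returns (outer index, line index) pairs instead of building the question's flat result list.

```python
import numpy as np


def find_equal_pairs(lines, num):
    """Compare each of the first `num` lines with every line after the first
    (the same pairs get_match visits) using one broadcast comparison.
    Assumption: "compare" means equality."""
    arr = np.array(lines)
    outer = arr[:num]                          # shape (num,)
    inner = arr[1:]                            # shape (len(lines) - 1,)
    equal = outer[:, None] == inner[None, :]   # broadcast to (num, len(lines) - 1)
    i, j = np.nonzero(equal)
    # j indexes into inner, so add 1 to turn it into an index into lines
    return list(zip(i.tolist(), (j + 1).tolist()))


if __name__ == "__main__":
    lines = ["apple\n", "pear\n", "apple\n", "fig\n", "pear\n"]
    print(find_equal_pairs(lines, 2))  # [(0, 2), (1, 1), (1, 4)]
```

The boolean matrix has num × (len(lines) - 1) entries, about 100,000 for the question's sizes (num = 10, 10000 lines), which NumPy handles without any extra processes.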
