
Python MultiProcessing Pool Slower Than Sequential?

So I've been playing around with the multiprocessing module, trying to figure out ways to speed up a lot of the work I do with pandas DataFrames.

The example I was working with takes a sequence of Excel files, each one representing a year's worth of data, loads each one into a DataFrame and then sums one of its columns. Sequentially, something like this:

import time
import pandas as pd

now = time.time()
results = {}                                  # renamed to avoid shadowing the built-in dict
table_2010 = pd.read_excel('2010.xlsx')
table_2011 = pd.read_excel('2011.xlsx')
table_2012 = pd.read_excel('2012.xlsx')
table_2013 = pd.read_excel('2013.xlsx')
table_2014 = pd.read_excel('2014.xlsx')
table_2015 = pd.read_excel('2015.xlsx')
results[2010] = table_2010[[95]].sum()
results[2011] = table_2011[[95]].sum()
results[2012] = table_2012[[95]].sum()
results[2013] = table_2013[[95]].sum()
results[2014] = table_2014[[95]].sum()
results[2015] = table_2015[[95]].sum()
print(results)
print(time.time() - now)

This took me 205 seconds. The Excel files are sizable and take a while to load into a DataFrame, so I assumed that running the loads in parallel would improve that performance. What I came up with was this:

import time
import pandas as pd
from multiprocessing.pool import ThreadPool

results = {}

def func(year):
    table = pd.read_excel(str(year) + '.xlsx')
    results[year] = table[[95]].sum()

if __name__ == '__main__':
    now = time.time()
    pool = ThreadPool(8)
    pool.map_async(func, [2010, 2011, 2012, 2013, 2014, 2015])
    pool.close()
    pool.join()
    print(results)
    print(time.time() - now)

When I ran this, though, it ended up taking 250 seconds. It was my impression that having separate cores run each of these processes would improve performance; is that incorrect?

Or is there an issue with the script I wrote?

Slower?
Depends.
Depends, a lot.

Is there an issue with the script?
Yes, a serious one ( still no need to worry or panic - a well solvable one ). Enjoy the read.

# ==========================================================================[sec]
 an <iterator>-based  SERIAL processing of 9 CPU-bound tasks took 1290.538 [sec]
 a ThreadPool(6)-based TPOOL processing of 9 CPU-bound tasks took 1212.065 [sec]
 a       Pool(6)-based  POOL processing of 9 CPU-bound tasks took  271.765 [sec]
# ==========================================================================[sec]

Python multiprocessing has several Pool-s

Given the not fully documented MCVE above ( it is missing all the explicit namespace imports that would safely disambiguate the setup for the intended use-case ), let's start with your code, which uses ThreadPool.map_async() to process many Excel files.

One could hardly start a worse approach for the intended fast processing.
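One trap is worth showing up front ( a minimal, hypothetical sketch, not the O/P code ): the script above relies on func() mutating a module-level dict. That happens to work in a ThreadPool, where all workers share one address space, but the moment one switches to a process-backed Pool, each worker mutates its own copy and the parent's dict stays empty. Returning values through .map() is the reliable channel:

```python
from multiprocessing import Pool

results = {}                      # a module-level dict, as in the O/P script

def worker(n):
    results[n] = n * n            # mutates the CHILD process's copy only
    return n, n * n               # returning the value is the reliable channel

if __name__ == '__main__':
    with Pool(2) as pool:
        returned = dict(pool.map(worker, [1, 2, 3]))
    print(results)                # {}  -- the parent never sees the children's writes
    print(returned)               # {1: 1, 2: 4, 3: 9}
```

The same pattern ( worker returns a ( key, value ) pair, the parent builds the dict from .map() results ) carries over directly to the per-year Excel sums.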


Is a Pool slower than SEQ?

Having purposely borrowed the syntax of the natively-parallel occam language, the question leads straight to the pain of the PAR | SEQ dilemma in designing high-performance systems ( yes, HPC, sure - guess who would ever like to design slow systems on purpose, right? ).

The issue has enough facets that even more questions must be asked before the initial dilemma can be seriously approached.

What resources do we have?

What type of operations are to be executed in a PAR | SEQ arrangement:

  • is the problem purely { CPU-bound | IO-bound }?

  • is the problem in a need to share-{ state | data } during processing?

  • is the problem in a need to communicate-{ signals | messages } during processing?


Let's start with prototyped use-cases:

A CPU-bound processing is much simpler to mock up ( and a way "greener" on the wear & tear of precious physical HPC resources ), so let's start with a primitive function:

def aFatCALCULUS( id ):                         # an INTENSIVE CPU-bound WORKLOAD
    import math                                 # plain math.factorial -- no need for the deprecated np.math alias
    import os
    aST = "aFatCALCULUS( {1:>3d} ) [PID:: {0:d}] RET'd {2:d}"
    return aST.format( os.getpid(),
                       id,
                       id + len( str( [ math.factorial( 2**f ) for f in range( 20 ) ][-1] ) )
                       )

Now, let's execute this one several times, in different arrangements.

Forgive my non-PEP-8 format ( the demonstrations made here do not aim at any core refactoring, so hopefully nobody serious will find this choice inappropriate in whatever sense ).

from multiprocessing.pool import ThreadPool                             # ThreadPool-mode
from multiprocessing      import Pool                                   #       Pool-mode
import time

print( "{0:} --------------------------------------------- # SETUP:".format( time.ctime() ) )
aListOfTaskIdNUMBERs = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, ]

print( "{0:} --------------------------------------------- # SERIAL mode of EXECUTION:".format( time.ctime() ) )
start    = time.monotonic()                                             # portable; CLOCK_MONOTONIC_RAW is Linux-only
_        = [ aFatCALCULUS( id ) for id in aListOfTaskIdNUMBERs ]        # SERIAL <iterator>-driven mode of EXECUTION
duration = time.monotonic() - start
print( "an-<iterator>-based SERIAL processing of {1:}-tasks took {0:} [sec]".format( duration, len( aListOfTaskIdNUMBERs ) ) )

print( "{0:} --------------------------------------------- # ThreadPool mode of EXECUTION:".format( time.ctime() ) )
aTPool   = ThreadPool( 6 )                                              # ThreadPool.capacity == 6
start    = time.monotonic()
_        = aTPool.map( aFatCALCULUS, aListOfTaskIdNUMBERs )             # ThreadPool-driven mode of EXECUTION
duration = time.monotonic() - start
print( "aThreadPool(6)-based TPOOL processing of {1:}-tasks took {0:} [sec]".format( duration, len( aListOfTaskIdNUMBERs ) ) )

print( "{0:} --------------------------------------------- # Pool mode of EXECUTION:".format( time.ctime() ) )
aPool    = Pool( 6 )                                                    # Pool.capacity == 6
start    = time.monotonic()
_        = aPool.map( aFatCALCULUS, aListOfTaskIdNUMBERs )              # Pool-driven mode of EXECUTION
duration = time.monotonic() - start
print( "aPool(6)-based POOL processing of {1:}-tasks took {0:} [sec]".format( duration, len( aListOfTaskIdNUMBERs ) ) )
print( "{0:} --------------------------------------------- # END.".format( time.ctime() ) )

Why? . . . just check the PID #s

aPool(6).map()
["aFatCALCULUS(   1 ) [PID:: 898] RET'd 2771011",
 "aFatCALCULUS(   2 ) [PID:: 899] RET'd 2771012",
 "aFatCALCULUS(   3 ) [PID:: 900] RET'd 2771013",
 "aFatCALCULUS(   4 ) [PID:: 901] RET'd 2771014",
 "aFatCALCULUS(   5 ) [PID:: 902] RET'd 2771015",
 "aFatCALCULUS(   6 ) [PID:: 903] RET'd 2771016",
 "aFatCALCULUS(   7 ) [PID:: 898] RET'd 2771017",
 "aFatCALCULUS(   8 ) [PID:: 899] RET'd 2771018",
 "aFatCALCULUS(   9 ) [PID:: 903] RET'd 2771019"
 ]

aThreadPool(6).map()
["aFatCALCULUS(   1 ) [PID:: 16125] RET'd 2771011",
 "aFatCALCULUS(   2 ) [PID:: 16125] RET'd 2771012",
 "aFatCALCULUS(   3 ) [PID:: 16125] RET'd 2771013",
 "aFatCALCULUS(   4 ) [PID:: 16125] RET'd 2771014",
 "aFatCALCULUS(   5 ) [PID:: 16125] RET'd 2771015",
 "aFatCALCULUS(   6 ) [PID:: 16125] RET'd 2771016",
 "aFatCALCULUS(   7 ) [PID:: 16125] RET'd 2771017",
 "aFatCALCULUS(   8 ) [PID:: 16125] RET'd 2771018",
 "aFatCALCULUS(   9 ) [PID:: 16125] RET'd 2771019"
 ]
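The PID listings above can be reproduced with a tiny, hypothetical self-check: ThreadPool workers all report the parent's own PID ( one interpreter, one GIL ), while Pool workers report foreign PIDs ( true child processes ):

```python
import os
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def report_pid(_):
    return os.getpid()            # which process actually ran this task?

if __name__ == '__main__':
    with ThreadPool(4) as tpool:
        thread_pids = tpool.map(report_pid, range(8))
    with Pool(4) as ppool:
        process_pids = ppool.map(report_pid, range(8))

    parent = os.getpid()
    print(all(pid == parent for pid in thread_pids))   # True: all threads live inside one PID
    print(all(pid != parent for pid in process_pids))  # True: every task ran in a child process
```

That one-PID-fits-all ThreadPool column is exactly why the TPOOL timing above barely beats SERIAL on a CPU-bound workload.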

How to best orchestrate the IO-bound case?


A supercomputer turns compute-bound problems into I/O bound problems ( S. Cray )


Obey Seymour CRAY's wisdom with all due humility,
but
do not let others make you the one,
who pays the costs of their missing HPC duties
on your side of the CPU-budget.

IMHO, if this were my HPC-task,
I would

  • avoid paying the pandas XLSX-import / transformation costs

  • make the Excel-data owner / processor guarantee and enforce an automated column-SUM() { auto | manual | scripted } update on each data-element change/update arriving in time, be it in batch or by an event, on their data-store side

  • go for the fastest ( distributed-processing ) architecture with the powers of independent multiprocessing.Pool().map() processes not reading ( == moving whole heaps of data ) but using smart, direct access to just the cells ( == elements ) you need to process.
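For the IO-bound side of the dilemma the verdict flips: while a thread blocks on a file, a socket or a time.sleep() call, the GIL is released, so even a ThreadPool overlaps the waits. A minimal, hypothetical sketch ( sleep stands in for any IO latency ):

```python
import time
from multiprocessing.pool import ThreadPool

def fake_io(task_id):
    time.sleep(0.2)               # the GIL is released while blocked, as with real IO
    return task_id

if __name__ == '__main__':
    start  = time.monotonic()
    _      = [fake_io(i) for i in range(4)]           # SERIAL: ~0.8 [sec]
    serial = time.monotonic() - start

    start  = time.monotonic()
    with ThreadPool(4) as pool:
        pool.map(fake_io, range(4))                   # TPOOL:  ~0.2 [sec]
    parallel = time.monotonic() - start

    print(serial > parallel)      # True: the four waits overlap instead of queueing
```

So a ThreadPool was not a wrong tool per se - it was a wrong tool for a GIL-serialised, CPU-heavy XLSX parse.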

Here,
everyone can see and smell
why a PAR-arranged Pool() is faster than any other SEQ processing:

'''     REAL SYSTEM:: multiprocessing.Pool(6).map()
_________________________________________________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________________________________________________

top - 22:24:42 up 84 days, 23:05,  4 users,  load average: 4.80, 2.17, 0.86
Threads: 366 total,   5 running, 361 sleeping,   0 stopped,   0 zombie
%Cpu0  :  75.7/0.0    76[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                        ]
%Cpu1  :   0.1/0.0     0[                                                                                                    ]
%Cpu2  :   0.0/0.0     0[                                                                                                    ]
%Cpu3  : 100.0/0.0   100[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu4  :   0.1/0.0     0[                                                                                                    ]
%Cpu5  :   0.0/0.0     0[                                                                                                    ]
%Cpu6  :  76.2/0.0    76[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                        ]
%Cpu7  : 100.0/0.0   100[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu8  :   0.0/0.0     0[                                                                                                    ]
%Cpu9  :  75.5/0.0    76[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                        ]
%Cpu10 :   0.5/0.4     1[                                                                                                    ]
%Cpu11 : 100.0/0.0   100[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu12 :   0.0/0.0     0[                                                                                                    ]
%Cpu13 :   0.0/0.0     0[                                                                                                    ]
%Cpu14 :   0.0/0.0     0[                                                                                                    ]
%Cpu15 :   0.7/0.5     1[||                                                                                                  ]
%Cpu16 :   0.0/0.0     0[                                                                                                    ]
%Cpu17 :   0.0/0.0     0[                                                                                                    ]
%Cpu18 :   0.0/0.0     0[                                                                                                    ]
%Cpu19 :   0.0/0.0     0[                                                                                                    ]
KiB Mem : 24522940 total, 22070528 free,   778080 used,  1674332 buff/cache
KiB Swap:  8257532 total,  7419136 free,   838396 used. 22905264 avail Mem

 P S %CPU  PPID   PID nTH     TIME+ USER      PR  NI    RES   CODE    SHR    DATA %MEM    VIRT vMj vMn   SWAP      nsIPC COMMAND
 1 S  0.0  1614  1670   1  10:54.15 root      20   0    632    740    416    1396  0.0   52140   0   0   1172 -                   `- haproxy
 2 S  0.0  1614  1671   1  35:40.50 root      20   0    664    740    380    1528  0.0   52272   0   0   1172 -                   `- haproxy
19 S  0.0     1  1658   1   6:20.42 root      20   0  22960    468  14380   14240  0.1  466344   0   0    836 -           `- httpd
12 S  0.0     1 24217   1   4:31.41 root      20   0   3984      8    668    7320  0.0  155304   0   0   3864 -           `- munin-node
 0 R  0.0 12882  4964   1   0:31.53 m         20   0   2596     96   1524    1596  0.0  158096   0   0      0 4026531839                  `- top
 0 S  0.1 15213 22779  22   0:11.16 m         20   0  54052   2268   5816 1965528  0.2 2191768   0   0      0 4026531839                      `- python3
 1 S  0.1 15213 23613  22   0:10.83 m         20   0  54052   2268   5816 1965528  0.2 2191768   0   0      0 4026531839                      `- python3
 7 R 99.9 16125   898   1   2:29.72 m         20   0  52084   2268   1336 1969112  0.2 2195352   0  3k      0 4026531839                      `- python3
11 R 99.9 16125   899   1   2:29.72 m         20   0  52088   2268   1336 1969116  0.2 2195356   0  3k      0 4026531839                      `- python3
 6 S 76.3 16125   900   1   2:15.49 m         20   0  49724   2268   1236 1965520  0.2 2191760   0 777      0 4026531839                      `- python3
 0 S 75.7 16125   901   1   2:15.12 m         20   0  49732   2268   1236 1965524  0.2 2191764   0 775      0 4026531839                      `- python3
 9 S 75.6 16125   902   1   2:15.05 m         20   0  49732   2268   1236 1965524  0.2 2191764   0 775      0 4026531839                      `- python3
 3 R 99.9 16125   903   1   2:29.70 m         20   0  52100   2268   1336 1969120  0.2 2195360   0  3k      0 4026531839                      `- python3
 4 S  0.1 15213   904  22   0:00.36 m         20   0  54052   2268   5816 1965528  0.2 2191768   0   0      0 4026531839                      `- python3
15 S  1.2 19285 21279   2  21:27.31 a         20   0  75720   2268  12940  196868  0.3  642876   0   0      0 -                       `- python3
 8 S  0.0 19285 21281   2   0:14.88 a         20   0  75720   2268  12940  196868  0.3  642876   0   0      0 -                           `- python3
10 S  0.9 22118 22120   2  20:07.34 a         20   0  56604   2268   7176  464808  0.2  722164   0   0      0 -                       `- python3
 4 S  0.0 22118 22122   2   0:19.39 a         20   0  56604   2268   7176  464808  0.2  722164   0   0      0 -                           `- python3
 4 S  0.0     2    29   1  33:46.57 root      20   0      0      0      0       0  0.0       0   0   0      0 -           `- rcu_sched
_________________________________________________________________________________________________________________________________________________________

_________________________________________________________________________________________________________________________________________________________
top - 22:25:31 up 84 days, 23:06,  4 users,  load average: 3.78, 2.30, 0.97
Threads: 365 total,   4 running, 361 sleeping,   0 stopped,   0 zombie
%Cpu0  :   0.2/0.4     1[                                                                                                    ]
%Cpu1  :   0.0/0.0     0[                                                                                                    ]
%Cpu2  :   0.0/0.0     0[                                                                                                    ]
%Cpu3  : 100.0/0.0   100[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu4  :   0.2/0.0     0[                                                                                                    ]
%Cpu5  :   0.0/0.0     0[                                                                                                    ]
%Cpu6  :   0.0/0.0     0[                                                                                                    ]
%Cpu7  : 100.0/0.0   100[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu8  :   0.0/0.0     0[                                                                                                    ]
%Cpu9  :   0.0/0.0     0[                                                                                                    ]
%Cpu10 :   0.6/0.4     1[|                                                                                                   ]
%Cpu11 : 100.0/0.0   100[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu12 :   0.0/0.0     0[                                                                                                    ]
%Cpu13 :   0.0/0.0     0[                                                                                                    ]
%Cpu14 :   0.0/0.0     0[                                                                                                    ]
%Cpu15 :   0.6/0.6     1[||                                                                                                  ]
%Cpu16 :   0.0/0.0     0[                                                                                                    ]
%Cpu17 :   0.0/0.0     0[                                                                                                    ]
%Cpu18 :   0.0/0.0     0[                                                                                                    ]
%Cpu19 :   0.0/0.0     0[                                                                                                    ]
KiB Mem : 24522940 total, 22076660 free,   772436 used,  1673844 buff/cache
KiB Swap:  8257532 total,  7419136 free,   838396 used. 22911364 avail Mem

 P S %CPU  PPID   PID nTH     TIME+ USER      PR  NI    RES   CODE    SHR    DATA %MEM    VIRT vMj vMn   SWAP      nsIPC COMMAND
 2 S  0.2  1614  1671   1  35:40.51 root      20   0    664    740    380    1528  0.0   52272   0   0   1172 -                   `- haproxy
 0 R  0.4 12882  4964   1   0:31.66 m         20   0   2596     96   1524    1596  0.0  158096   0   0      0 4026531839                  `- top
 7 R 99.9 16125   898   1   3:18.45 m         20   0  52608   2268   1336 1969112  0.2 2195352   0   9      0 4026531839                      `- python3
11 R 99.9 16125   899   1   3:18.46 m         20   0  52612   2268   1336 1969116  0.2 2195356   0   9      0 4026531839                      `- python3
 3 R 99.9 16125   903   1   3:18.43 m         20   0  52624   2268   1336 1969120  0.2 2195360   0  10      0 4026531839                      `- python3
15 S  1.2 19285 21279   2  21:27.92 a         20   0  75720   2268  12940  196868  0.3  642876   0   0      0 -                       `- python3
10 S  1.0 22118 22120   2  20:07.81 a         20   0  56604   2268   7176  464808  0.2  722164   0   0      0 -                       `- python3
_________________________________________________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________________________________________________
'''
