
Very slow communication between Python and C++ program with subprocess

I am trying to speed up a Python program by outsourcing some repetitive calculations to a C++ program, using Python's subprocess module.

To illustrate the problem, I took a simple C++ program that returns double its input. Processing a million integers takes 16 seconds, which seems very slow.

Here is the C++ program (double.exe):

#include <iostream>

using namespace std;

int main()
{
    int a;
    bool goon = true;
    while (goon)
    {
        cin >> a;              // read one integer from stdin
        cout << 2 * a << endl; // write its double (endl also flushes)
        if (a == 0)
            goon = false;      // 0 is the stop sentinel
    }
}

And here is the Python 3 code:

from time import time
from subprocess import PIPE,Popen

cmd = ["double"]
process = Popen(cmd, stdin=PIPE, stdout=PIPE, bufsize=32, universal_newlines=True, shell=True)

t0 = time()
for i in range(1,int(1e6)):
    print(i, file=process.stdin, flush=True)
    output = int(process.stdout.readline())
dt = time() - t0
print("Time to communicate : %fs" % dt)
print(0, file=process.stdin, flush=True)  # send the 0 sentinel to stop 'double'

Time to communicate : 16.029137s

To me, the only possible reason it is so slow is the communication between the Python process and the C++ program through the pipes, but I haven't found how to accelerate it. Is there any way to speed up this communication, with subprocess or another library?

I'm using Python 3.5.2 on Windows.

The problem is not stdin communication per se, but rather massive context switching. You do a very small "task" in the C++ code, but for each such task the Python code must write data to the pipe, flush it and go to sleep; the C++ part wakes up, parses the input, calculates the result, prints the output, flushes and goes to sleep; then the Python code wakes up again, and so on.

Going to sleep and waking up (and the associated context switching) is not free. With a task this small (multiplying the input by two), that overhead consumes most of the time.

You can "fix" that by either supplying work to the C++ program in batches, or having bigger tasks. Or both.

For example, the same job with a million numbers, but done in batches of 10, runs 2 times faster on my box even if the pipe is still flushed after each write. The code:

for i in range(1, int(1e5)):
    for j in range(10):  # write a batch of 10 numbers
        print(i * 10 + j, file=process.stdin, flush=True)
    for j in range(10):  # then read the 10 replies
        output = int(process.stdout.readline())

If the flush is done only once per 10 numbers, it runs 1.5 times faster than the previous example (or 3 times faster than the original code):

for i in range(1, int(1e5)):
    for j in range(10):  # write a batch of 10 numbers...
        print(i * 10 + j, file=process.stdin)
    process.stdin.flush()  # ...but flush only once per batch
    for j in range(10):
        output = int(process.stdout.readline())
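Taking batching to its limit, you can hand the whole workload to the child in one shot and let Popen.communicate() collect all the output; communicate() also avoids the pipe-buffer deadlock that a single huge manual write followed by reads could cause. A minimal self-contained sketch, using an inline Python child as a stand-in for double.exe (swap in cmd = ["double"] with shell=True to try the real program on Windows); n is kept small here, use int(1e6) for the actual benchmark:

```python
import sys
from subprocess import PIPE, Popen

# Stand-in child that behaves like double.exe: doubles each line, stops on 0.
child = ("import sys\n"
         "for line in sys.stdin:\n"
         "    n = int(line)\n"
         "    print(2 * n)\n"
         "    if n == 0:\n"
         "        break\n")
cmd = [sys.executable, "-c", child]

process = Popen(cmd, stdin=PIPE, stdout=PIPE, universal_newlines=True)

n = 1000  # use int(1e6) for the real benchmark
data = "\n".join(str(i) for i in range(1, n)) + "\n0\n"  # 0 is the stop sentinel
out, _ = process.communicate(data)  # writes everything, then reads everything
results = [int(line) for line in out.split()][:-1]  # drop the reply to the 0
```

This trades memory (the whole input and output are held as strings) for the minimum possible number of flushes and wake-ups.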

If the "task" is bigger then the price you have to pay for the context switch is the same. But it's not as big compared to the size of the task. For example, let's imagine that the context switch takes 0.1 second (it's way smaller in the real life, this is just an example). If the task is a multiplication which is done in say 1ms (again, just for example) then the context switch overhead compared to the task is 10000%. But if your task is heavy and takes 1s to be performed, then the overhead is just 10%. 1000 times difference in relative value.

Just a guess, but it may be because std::endl not only writes a newline character but also flushes the output stream, and the flush might be the part that takes the most time. It might therefore be faster to write

std::cout << 2 * a << "\n"; //Unix style line break

or

std::cout << 2 * a << "\r\n"; //Windows style line break

(Note: untested whether this works, or whether the implicit flush is actually required here.)
