简体   繁体   中英

Unexpected Behavior from Python Multiprocessing Pool Class

I am trying to utilize Python's multiprocessing library to quickly run a function using the 8 processing cores I have on a Linux VM I created. As a test, I am getting the time in seconds it takes for a worker pool with 4 processes to run a function, and the time it takes running the same function without using a worker pool. The time in seconds is coming out as about the same, in some case it is taking the worker pool much longer to process than without.

Script

import requests
import datetime
import multiprocessing as mp

shared_results = []

def stress_test_url(url):
    print('Starting Stress Test')
    count = 0

    while count <= 200:
        response = requests.get(url)
        shared_results.append(response.status_code)
        count += 1

pool = mp.Pool(processes=4)

now = datetime.datetime.now()
results = pool.apply(stress_test_url, args=(url,))
diff = (datetime.datetime.now() - now).total_seconds()

now = datetime.datetime.now()
results = stress_test_url(url)
diff2 = (datetime.datetime.now() - now).total_seconds()

print(diff)
print(diff2)

Terminal Output

Starting Stress Test
Starting Stress Test
44.316212
41.874116

The apply function of multiprocessing.Pool simply runs a function in a separate process and waits for its results. It takes a little bit more than running sequentially as it needs to pack the job to be processed and ship it to the child process via a pipe .

multiprocessing doesn't make sequential operations faster, it simply allows them to be run in parallel if you hardware has more than one core.

Just try this:

urls = ["http://google.com", 
        "http://example.com", 
        "http://stackoverflow.com", 
        "http://python.org"]

results = pool.map(stress_test_url, urls)

You will see that the 4 URLs get visited seemingly at the same time. This means your logic reduces the amount of time necessary to visit N websites to N / processes .

Lastly, benchmarking a function which performs an HTTP request is a very poor way to measure performance as networks are unreliable. You will hardly get two executions which take the same amount of time no matter whether you use multiprocessing or not.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM