I am trying to speed up my code by using multiprocessing with Python. The only problem I ran into when trying to implement multiprocessing was that my function has a return statement and I needed to save that data to a list. The best way I found using google was to use queue as "q.put()" and retrieve it using "q.get()". The only issue is that I think i'm not utilizing this the right way because when I use command prompt after compiling, it shows i'm hardly using my cpu and I only see one Python process running. If I remove "q.get()" the process is super fast and utilizes my cpu. Am I doing this the right way?
import time
import numpy as np
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
def test(x,y,q):
q.put(x * y)
if __name__ == '__main__':
q = Queue()
one = []
two = []
three = []
start_time = time.time()
for x in np.arange(30, 60, 1):
for y in np.arange(0.01, 2, 0.5):
p = multiprocessing.Process(target=test, args=(x, y, q))
p.start()
one.append(q.get())
two.append(int(x))
three.append(float(y))
print(x, ' | ', y, ' | ', one[-1])
p.join()
print("--- %s seconds ---" % (time.time() - start_time))
d = {'x' : one, 'y': two, 'q' : three}
data = pd.DataFrame(d)
print(data.tail())
No, this is not correct. You start a process and wait for the result through q.get
at once. Therefore only one process running at the same time. If you want to operate on many tasks, use multiprocessing.Pool
:
import time
import numpy as np
from multiprocessing import Pool
from itertools import product
def test((x,y)):
return x, y, x * y
def main():
start_time = time.time()
pool = Pool()
result = pool.map(test, product(np.arange(30, 60, 1), np.arange(0.01, 2, 0.5)))
pool.close()
print("--- %s seconds ---" % (time.time() - start_time))
print(result)
if __name__ == '__main__':
main()
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.