Will the GIL significantly decrease the performance of the following code?
The function applied to each block uses a Python loop instead of a NumPy function. I have to use a Python loop because of an external library.
Test code:
import numpy as np
import dask.array as da
import dask.sharedict as sharedict
from itertools import product

def block_func(block):
    for i in range(len(block)):  # <--- the python loop ...
        block[i] += 1
    return block

def darr_func(x, name='test'):
    dsk = {}
    for idx in product(*map(range, x.numblocks)):
        dsk[(name,) + idx] = (block_func, (x.name,) + idx)
    dsk2 = sharedict.merge((name, dsk), x.dask)
    return da.Array(dsk2, name, x.chunks, x.dtype)

def main():
    n = 1000
    chunks = 100
    arr = np.arange(n*n).reshape(n, n)
    darr = da.from_array(arr, chunks=chunks)
    result = darr_func(darr)
    print(result.compute())

main()
If that is the case, can setting the scheduler context help? How do I set the scheduler for a single function over a dask array? I want to keep using the default dask scheduler for other operations on dask arrays.
From the wiki, I only see ways to set the scheduler for compute, not for a function:
# As a context manager
>>> with dask.set_options(get=dask.multiprocessing.get):
... x.sum().compute()
# Set globally
>>> dask.set_options(get=dask.multiprocessing.get)
>>> x.sum().compute()
Python for loops do not release the GIL, so they are hard to parallelize with threads. In this case you have a few options.
Use a scheduler that splits the computation across multiple processes. My personal recommendation is to use the dask.distributed scheduler locally, which can be done by running the following two lines:
from dask.distributed import Client
client = Client()
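If you only want a multi-process scheduler for one computation and the default threaded scheduler everywhere else, something like the following might work. This is a rough sketch, not a tested recipe for your setup: it assumes a dask version where `compute()` accepts a `scheduler=` keyword (the newer replacement for the `get=` option quoted in the question), it uses `map_blocks` in place of the hand-built graph, and `add_one` is just a hypothetical wrapper name.

```python
import numpy as np
import dask.array as da


def block_func(block):
    # Pure-Python loop standing in for the external library call.
    # Work on a copy so the input block is not modified in place.
    out = block.copy()
    for i in range(len(out)):
        out[i] += 1
    return out


def add_one(arr, chunks=100, scheduler="processes"):
    # Hypothetical helper: apply block_func to every chunk of arr.
    darr = da.from_array(arr, chunks=chunks)
    result = darr.map_blocks(block_func, dtype=darr.dtype)
    # The scheduler is chosen for this compute() call only;
    # other computations keep whatever default is configured.
    return result.compute(scheduler=scheduler)


if __name__ == "__main__":
    n = 1000
    arr = np.arange(n * n).reshape(n, n)
    out = add_one(arr)  # runs the blocks in separate processes
    print((out == arr + 1).all())
```

The `if __name__ == "__main__":` guard matters here because a process-based scheduler may start fresh interpreters that re-import the script. Note that moving blocks between processes has a serialization cost, so this is only worth it when the loop body dominates the runtime.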
However, as always, you should profile your code and try a few things. The advice given above depends on many factors. For example, Python for loops may not be an issue if the body of the loop releases the GIL.
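One quick way to check whether a loop is GIL-bound is to time the same work serially and with a thread pool. The sketch below uses only the standard library; `python_loop` is a hypothetical stand-in for your per-block function, and the task sizes are arbitrary. On a standard CPython build, if the threaded run is not noticeably faster than the serial one, the loop is most likely holding the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def python_loop(n):
    # Pure-Python work: the interpreter holds the GIL while this runs.
    total = 0
    for i in range(n):
        total += i
    return total


def timed(label, fn):
    # Run fn once, print the elapsed wall-clock time, return its result.
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


def run_serial(tasks):
    return [python_loop(n) for n in tasks]


def run_threaded(tasks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(python_loop, tasks))


tasks = [2_000_000] * 4
serial = timed("serial", lambda: run_serial(tasks))
threaded = timed("4 threads", lambda: run_threaded(tasks))
```

If your real loop body calls into a library that releases the GIL, swap it in for `python_loop` and the threaded timing should drop; in that case the default threaded scheduler may already be good enough.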