I would like to apply something similar to np.select, but using Dask functions or attributes.
I am assuming that you are using not just Dask but Dask DataFrames. If you look at the documentation here: https://docs.dask.org/en/latest/dataframe.html you will see that row-wise selection with a boolean condition is listed among the fast, trivially parallelizable operations. So an example like
dd[dd.x>3]
should work perfectly well. Since this selects rows with a boolean mask, we can extend the example to several conditions by multiplying the masks. In arithmetic, True behaves like 1 and False like 0, so True * True equals 1, whilst False * True, True * False and False * False all yield 0. The product is therefore true (1) only when both conditions hold, so an expression like
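The multiplication argument can be checked directly in plain Python, where bool is a subclass of int. This small sketch builds the full truth table and compares each product with the logical AND of the two operands:

```python
from itertools import product

# Python's bool subclasses int, so True * True == 1 and any product with False == 0.
truth_table = {(a, b): a * b for a, b in product([True, False], repeat=2)}

for (a, b), value in truth_table.items():
    # The product matches the logical AND of the two operands in every case.
    print(f"{a!s:5} * {b!s:5} = {value}   (a and b = {a and b})")
```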
dd[(dd.x>3)*(dd.y<10)]
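should therefore give you the functionality you are looking for. The more common way to write the same filter is with the element-wise AND operator, dd[(dd.x>3) & (dd.y<10)]; the parentheses are required because & binds more tightly than the comparison operators.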
Please note that Dask DataFrames are evaluated lazily: the actual result is only produced on request. Chain .compute() onto your expression if you want to run the calculation, like so:
dd[(dd.x>3)*(dd.y<10)].compute()
I hope this helps.