
How to create a variable using more than one conditional in dask?

I would like to apply something similar to np.select but using dask functions or attributes.

I am assuming that you are not just using Dask, but Dask DataFrames. The documentation at https://docs.dask.org/en/latest/dataframe.html lists row-wise selection with a boolean condition among the operations that are fast. So an example like

dd[dd.x>3] 

should work. Because this uses a boolean mask to select rows, we can extend the example with multiplication. True behaves as 1 and False as 0, so True * True equals 1, whilst False * True, True * False and False * False all yield 0. Multiplying two masks is therefore an element-wise logical AND, which is more commonly written with the & operator.

dd[(dd.x>3)*(dd.y<10)]

This should give you the functionality that you are looking for: only the rows that satisfy both conditions are kept.

Please note that Dask DataFrames are lazy: the actual result is only produced on request. So append .compute() to your statement if you want to run the calculation, like so

dd[(dd.x>3)*(dd.y<10)].compute()

I hope this helps


 