
Need help converting a cuDF DataFrame to a CuPy ndarray

I want to convert a cuDF DataFrame to a CuPy ndarray. I'm using the code below:

import time
import numpy as np
import cupy as cp
import cudf
from numba import cuda
df = cudf.read_csv('titanic.csv')
arr_cupy = cp.fromDlpack(df.to_dlpack())

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-176-0d6ff9785189> in <module>
----> 1 arr_cupy = cp.fromDlpack(df.to_dlpack())

~/.conda/envs/rapids_013/lib/python3.7/site-packages/cudf/core/dataframe.py in to_dlpack(self)
   3821         import cudf.io.dlpack as dlpack
   3822 
-> 3823         return dlpack.to_dlpack(self)
   3824 
   3825     @ioutils.doc_to_csv()

~/.conda/envs/rapids_013/lib/python3.7/site-packages/cudf/io/dlpack.py in to_dlpack(cudf_obj)
     72         )
     73 
---> 74     return libdlpack.to_dlpack(gdf_cols)

cudf/_libxx/dlpack.pyx in cudf._libxx.dlpack.to_dlpack()

ValueError: Cannot create a DLPack tensor with null values.                     Input is required to have null count as zero.

I'm getting this error because the dataset has null values. How can I do the conversion?

Let's cover your two issues :)

From a cuDF DataFrame to a CuPy ndarray: you can use to_gpu_matrix and wrap the result in a CuPy array, as below. This keeps everything on the GPU and is pretty efficient.

arr_cupy = cp.array(df.to_gpu_matrix())

https://docs.rapids.ai/api/cudf/stable/api.html#cudf.core.dataframe.DataFrame.to_gpu_matrix

In the future (or even at present, in a way I'm not yet aware of), there may be a more direct way. If for some reason you need DLPack, your way works as well. That brings us to the second issue...

Null values: to fill in your null values, use .fillna(). Pick a value that you can tell is out of place in your data. https://docs.rapids.ai/api/cudf/stable/api.html#cudf.core.dataframe.DataFrame.fillna
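
Before choosing a fill value, it can help to see which columns actually contain nulls. Here is a minimal sketch, assuming `df` is the cuDF DataFrame from the question and relying on the `null_count` property of a cuDF Series. `null_counts` is a hypothetical helper name, not part of cuDF:

```python
def null_counts(df):
    """Return {column name: number of nulls} for the columns that contain nulls.

    Assumes `df` behaves like a cudf.DataFrame: `df.columns` lists the
    column names and `df[name].null_count` is the null count of that column.
    """
    counts = {name: int(df[name].null_count) for name in df.columns}
    # Keep only the columns that would make to_dlpack() raise.
    return {name: n for name, n in counts.items() if n > 0}


# Usage (needs a CUDA GPU with RAPIDS installed):
#   import cudf
#   df = cudf.read_csv('titanic.csv')
#   print(null_counts(df))
```

If the result is empty, the DLPack route from the question should not hit the null-value error, and no fill value is needed.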

Together, they can look like this:

arr_cupy = cp.array(df.fillna(-1).to_gpu_matrix())

Output type is cupy.core.core.ndarray

Output array from my test df is:

array([[          0,    17444256,        1200],
       [          1,   616285571,         987],
       [          2,          -1,         407],
       ...,

where -1 is the null I artificially created.
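
Two practical caveats, sketched under assumptions. First, the matrix conversion needs numeric columns, so any string columns (the usual Titanic CSV has several) have to be dropped or encoded first. Second, as far as I know, newer cuDF releases provide `DataFrame.to_cupy()` as the more direct route, so check the documentation for your installed version. `frame_to_cupy` and `sentinel_mask` are hypothetical helper names, and the column names in the usage comment are assumptions about the CSV:

```python
def frame_to_cupy(df, columns, fill_value=-1):
    """Convert selected columns of a cuDF DataFrame to a CuPy ndarray.

    Assumes a cuDF version that provides DataFrame.to_cupy(); on older
    releases, cp.array(filled.to_gpu_matrix()) plays the same role.
    """
    selected = df[list(columns)]          # keep only the numeric columns you name
    filled = selected.fillna(fill_value)  # replace nulls with the sentinel
    return filled.to_cupy()


def sentinel_mask(arr, fill_value=-1):
    """Elementwise comparison marking the cells that hold the sentinel."""
    return arr == fill_value


# Usage (needs a CUDA GPU with RAPIDS installed; column names are assumed):
#   import cudf
#   df = cudf.read_csv('titanic.csv')
#   arr_cupy = frame_to_cupy(df, ['Survived', 'Pclass', 'Age', 'Fare'])
#   was_null = sentinel_mask(arr_cupy)
```

The mask only identifies the original nulls reliably if the sentinel never occurs as a real value in the selected columns.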

Hope that helps!

