Problem: On a Dask dataframe,
loc[concrete_row, concrete_column]
returns a pandas dataframe with multiple rows, each with the same index:
0 [1,2,3]
0 [1,2]
0 [3]
instead of a single row value:
0 [1,2,3]
I am reading many parquet files:
dd.read_parquet(dataset_dir+'/train/date*/*.parquet')
Each row in the parquet files holds an array.
I need to call a map function for each row and get the iterable values of that specific row. How do I resolve this?
I need to call map function for each row and get iterable values of this concrete row.
It sounds like you want the map or apply methods.
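def func(row):
    return ...
# Dask's DataFrame.apply works row-wise only (axis=1); meta describes the output.
# (None, "object") is a placeholder; use the dtype that func actually returns.
result = df.apply(func, axis=1, meta=(None, "object"))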
In general, parallel computing tools like Dask are poorly suited to fetching data one row at a time. Instead, it's common to apply a function across all of your rows in parallel.