
PyArrow / Dask to_parquet: partitions with all-null columns

When writing Dask dataframe partitions to Parquet, I've noticed that read_parquet fails with conflicting metadata / schemas. This is because in some of the partitions certain columns are entirely null / np.nan, while in others they are filled with values.

Beforehand, I cast the data types of my partitions:

# dtypes maps column names to pandas dtypes, e.g. {"id": "int64", "label": "object"}
df = df.astype(dtypes)

PyArrow fails to read my partitioned Parquet files because columns containing only nulls are assigned the data type 'null'. How do I tackle this issue? Some partitions have columns that are entirely null, while in other partitions the same columns contain values.

The columns' data types are integer, float, or string (object).
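
To make the failure mode concrete, here is a minimal reproduction sketch (the column names and output path are illustrative, not from the original post): a two-partition Dask dataframe in which an object column is entirely null in the first partition.

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({
    "id": range(10),
    "label": [None] * 5 + list("abcde"),  # first half entirely null
})

# Two partitions: "label" is all-null in the first one, so pyarrow
# may infer its type as null there and as string in the second.
ddf = dd.from_pandas(pdf, npartitions=2)
ddf.to_parquet("data/out", engine="pyarrow")

# Reading the dataset back can then fail with a metadata/schema mismatch.
ddf2 = dd.read_parquet("data/out", engine="pyarrow")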

I suggest raising an issue on the Dask or Arrow issue tracker.
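
As a possible workaround (a sketch, not part of the answer above): Dask's to_parquet accepts a schema argument when using the pyarrow engine, so declaring the intended types up front should keep all-null partitions from being inferred as type null. The column names below are illustrative and continue the reproduction sketch from the question.

import pyarrow as pa

# Fix every column's type explicitly so that a partition consisting
# only of nulls cannot be inferred as pa.null().
schema = pa.schema([
    ("id", pa.int64()),
    ("label", pa.string()),
])

ddf.to_parquet("data/out", engine="pyarrow", schema=schema)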
