How to remove __null_dask_index__ from a Parquet file?

Question

I am writing a DataFrame to a Parquet file using Dask:

df.to_parquet(file, compression='snappy', write_metadata_file=False,\
              engine='pyarrow', index=None)

I need to display the contents of the file in an online Parquet viewer, and the columns displayed are:

Column1  Column2  Column3  __null_dask_index__

How do I remove the __null_dask_index__ column?

Answer 1

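The relevant kwarg here is write_index. Dask writes an unnamed index to the file as a column called __null_dask_index__, so pass write_index=False to to_parquet to leave it out: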

from dask.datasets import timeseries
from pyarrow.parquet import ParquetFile

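# reset_index() turns 'timestamp' into a regular column and leaves an unnamed default index
df = timeseries(end='2000-01-03').reset_index()

# Write the same frame with and without the index, then inspect the schema each time
for write_index in [True, False]:
    df.to_parquet('test.pqt', write_index=write_index)
    f = ParquetFile('test.pqt/part.0.parquet')
    print(f.schema.names)
# Output:
# ['__null_dask_index__', 'timestamp', 'id', 'name', 'x', 'y']
# ['timestamp', 'id', 'name', 'x', 'y']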
