How to remove __null_dask_index__ from a Parquet file?
I am writing a df to a Parquet file using Dask:
df.to_parquet(file, compression='snappy', write_metadata_file=False,
              engine='pyarrow', index=None)
I need to present the contents of the file in an online Parquet viewer, and the columns being displayed are:
Column1 Column2 Column3 __null_dask_index__
How do I remove the __null_dask_index__ column?
The relevant kwarg here is write_index:
from dask.datasets import timeseries
from pyarrow.parquet import ParquetFile

# reset_index() moves 'timestamp' into a column and leaves an unnamed index,
# which Dask writes as __null_dask_index__ when write_index=True
df = timeseries(end='2000-01-03').reset_index()

for write_index in [True, False]:
    df.to_parquet('test.pqt', write_index=write_index)
    f = ParquetFile('test.pqt/part.0.parquet')
    print(f.schema.names)
# write_index=True:  ['__null_dask_index__', 'timestamp', 'id', 'name', 'x', 'y']
# write_index=False: ['timestamp', 'id', 'name', 'x', 'y']