
'DataFrame' object has no attribute 'schema'

I'm trying to write the data from an existing gzip-compressed file to HDFS in Parquet format, but I get the error shown below the code. I would appreciate any help. (I'm also open to other ways of achieving the same result.)

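import pandas as pd
import pyarrow.parquet as pq

file = "c:/okay.log.gz"
# on_bad_lines="skip" (pandas >= 1.3) replaces the older error_bad_lines=False
df = pd.read_csv(file, compression="gzip", low_memory=False, sep="|", on_bad_lines="skip")
# This call fails: write_table expects a pyarrow Table, not a pandas DataFrame
pq.write_table(df, "target_path")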

AttributeError: 'DataFrame' object has no attribute 'schema'

I've just run into the same issue. In case you haven't resolved yours, or someone else comes across this with a similar problem: pq.write_table expects a pyarrow Table rather than a pandas DataFrame, so create a pyarrow Table from the DataFrame first.

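import pyarrow as pa
import pyarrow.parquet as pq

df = {some dataframe}  # placeholder: your pandas DataFrame
table = pa.Table.from_pandas(df)  # convert the DataFrame to a pyarrow Table
pq.write_table(table, '{path}')  # placeholder: destination Parquet path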

