
Convert dataframe string/datetime column to parquet date

I have a DataFrame with a string column 'exam_date' in YYYYMMDD format, for example 20201130.

I need to convert this DataFrame to Parquet and upload it. In the uploaded file, the schema type of the column must be DATE.

The java -jar parquet-tools.jar schema myfile.parquet command should show the type as

optional int32 exam_date (DATE);

I have tried converting the column to datetime like this:

final_calc_df['exam_date'] = (
    pd.to_datetime(final_calc_df['exam_date'], format='%Y%m%d')
      .dt.strftime('%Y%m%d')
)

However, this gives me output like

optional binary exam_date (STRING);

What should I do to get the desired output?

What I want is to keep the data in YYYYMMDD format, but with the column stored as the DATE type instead of a string, datetime, or binary type:

optional int32 exam_date (DATE);

It should work if you convert the column to datetime.date:

from datetime import datetime

import pandas as pd

df = pd.DataFrame({'a': ['20211011']})

def to_date(s):
    # Parse a YYYYMMDD string and keep only the date part (datetime.date)
    return datetime.strptime(s, '%Y%m%d').date()

df['a'] = df['a'].map(to_date)

or simpler but possibly less efficient:

df['a'] = pd.to_datetime(df['a'], format='%Y%m%d').dt.date

