I have a DataFrame with a string column 'exam_date' in YYYYMMDD format, for example 20201130.
I need to convert this DataFrame to Parquet and upload it, and in the uploaded file the schema type of the column must be DATE.
The command
java -jar parquet-tools.jar schema myfile.parquet
should show the type as
optional int32 exam_date (DATE);
I have tried converting the column to a datetime type like this:
final_calc_df['exam_date'] = pd.to_datetime(final_calc_df['exam_date'], format='%Y%m%d').dt.strftime('%Y%m%d')
However, this gives me output like
optional binary exam_date (STRING);
What should I do to get the desired output?
I want to keep the data in YYYYMMDD format, but with the Parquet type DATE instead of string/datetime/binary:
optional int32 exam_date (DATE);
It should work if you convert the column to datetime.date:
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'a': ['20211011']})

def to_date(s):
    # Parse a YYYYMMDD string into a datetime.date object
    return datetime.strptime(s, '%Y%m%d').date()
df['a'] = df['a'].map(to_date)
or simpler but possibly less efficient:
df['a'] = pd.to_datetime(df['a'], format='%Y%m%d').dt.date