
How to convert column values present in date/custom/general formats in a PySpark/Pandas dataframe into a Date Format?

I have a dataframe with a column of date values stored in multiple formats (shown as Custom/General/Date in Excel), as you can see in the "Before" column below:

[image: "Before" and "After" columns]

They were all originally plain date values, but somewhere along the way they were changed into different formats in the input CSV files I received.

My objective is to convert the values into the "DD/MM/YYYY" format that you can see in the "After" column.

I'm reading this as a Pandas/Spark dataframe, since the column holds thousands of such values that need this change.

I tried the following, but it doesn't produce the "DD/MM/YYYY" format I need. Moreover, some of the values remain unchanged:

df['After'] = pd.to_datetime(df['Before'], format='%d-%m-%y', errors='coerce')

Can anyone please help with how to go about this?

Cheers!

Here is my try:

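import pandas as pd

df = pd.read_excel('test.xls')

# Numeric values are treated as Excel serial dates (days since 1899-12-30); anything else is parsed as a date string.
df['ADATE'] = pd.to_datetime(pd.to_numeric(df['A'], errors='coerce'), unit='D', origin='1899-12-30').fillna(pd.to_datetime(df['A'], errors='coerce'))

print(df)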

Output (how should the last row be read?):

             A                   ADATE
0    43746.39028 2019-10-08 09:22:00.192
1          43735 2019-09-27 00:00:00.000
2  1/1/2021 0:00 2021-01-01 00:00:00.000
3        50:11.0                     NaT
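
On that last row: "50:11.0" carries only minutes and seconds (it looks like what Excel displays under an mm:ss.0 format, though that is a guess), so there is no date left in the text to recover. A minimal sketch for listing such rows so they can be checked against the source files; the frame below is rebuilt by hand from the printed output above, and the variable name `unparsed` is just illustrative:

```python
import pandas as pd

# Rebuild the result shown above (values copied from the printed output).
df = pd.DataFrame({
    "A": ["43746.39028", "43735", "1/1/2021 0:00", "50:11.0"],
    "ADATE": pd.to_datetime([
        "2019-10-08 09:22:00.192",
        "2019-09-27 00:00:00.000",
        "2021-01-01 00:00:00.000",
        None,  # the row that could not be parsed becomes NaT
    ]),
})

# Rows whose conversion produced NaT need manual review.
unparsed = df.loc[df["ADATE"].isna(), "A"]
print(unparsed.tolist())  # ['50:11.0']
```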

EDIT

Appending ".dt.strftime('%d/%m/%Y')" gives the desired format; keep in mind that this changes the column's type from datetime to string.

df['ADATE'] = pd.to_datetime(pd.to_numeric(df['A'],errors='coerce'), unit='D', origin='1899-12-30').fillna(pd.to_datetime(df['A'],errors='coerce')).dt.strftime('%d/%m/%Y')
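
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same idea. The sample values are copied from the output above; the function name `to_ddmmyyyy` and the column name `After` are only illustrative. One deliberate difference from the one-liner: numeric-looking entries are masked out before string parsing, so the string parser only ever sees actual text.

```python
import pandas as pd


def to_ddmmyyyy(values: pd.Series) -> pd.Series:
    """Convert Excel serial numbers and date strings to 'DD/MM/YYYY' text.

    Hypothetical helper for illustration; unparseable entries come back as NaN.
    """
    # Entries that look numeric are assumed to be Excel serial day counts.
    serials = pd.to_numeric(values, errors="coerce")
    from_serial = pd.to_datetime(serials, unit="D", origin="1899-12-30")

    # Only the non-numeric entries are handed to the string parser.
    text_only = values.where(serials.isna())
    from_text = pd.to_datetime(text_only, errors="coerce")

    # Prefer the serial interpretation, fall back to the parsed string.
    combined = from_serial.fillna(from_text)

    # strftime returns strings, so the result is no longer a datetime column.
    return combined.dt.strftime("%d/%m/%Y")


df = pd.DataFrame({"A": ["43746.39028", "43735", "1/1/2021 0:00", "50:11.0"]})
df["After"] = to_ddmmyyyy(df["A"])
print(df)
```

Ambiguous strings such as 1/2/2021 are read month-first by default; if the source data is day-first, passing dayfirst=True to the second pd.to_datetime call is worth checking against a few known values.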
