
Convert a string to a timestamp object in Pyspark

I am trying to convert a string to a timestamp format in Pyspark.

from pyspark.sql.functions import date_format

df = spark.createDataFrame([('28/Mar/2021:06:29:54 -0700',)], ['dt'])
df.select(date_format('dt', 'd/M/y:h:m:s Z').alias('date')).collect()

This does not seem to work, probably because the date_format function does not recognize this as a valid format. I get this:

[Row(date=None)]

Is there any way I can make the PySpark functions understand this format, similar to what the datetime module in Python does?

from datetime import datetime
datetime.strptime('28/Mar/2021:06:29:54 -0700', '%d/%b/%Y:%H:%M:%S %z')

When we pass the format, it creates a valid datetime object:

datetime.datetime(2021, 3, 28, 6, 29, 54, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200)))

Unlike Python's datetime module, Spark's datetime patterns are sensitive to the number of pattern letters (e.g. dd for a two-digit day, MMM for an abbreviated month name, yyyy for a four-digit year). Also, use to_timestamp to convert a string to a timestamp type; date_format works the other way round, i.e. it formats a timestamp type as a string.

from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([('28/Mar/2021:06:29:54 -0700',)], ['dt'])
df.select(to_timestamp('dt', 'dd/MMM/yyyy:HH:mm:ss Z').alias('date')).collect()
# gives [Row(date=datetime.datetime(2021, 3, 28, 14, 29, 54))]
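
To show the other direction, here is a minimal sketch (assuming the same spark session and sample data as above) that first parses the string with to_timestamp and then renders the resulting timestamp back into a string with date_format; the output pattern yyyy-MM-dd HH:mm:ss is just an example choice:

from pyspark.sql.functions import to_timestamp, date_format

df = spark.createDataFrame([('28/Mar/2021:06:29:54 -0700',)], ['dt'])

# Parse the string into a proper timestamp column.
parsed = df.select(to_timestamp('dt', 'dd/MMM/yyyy:HH:mm:ss Z').alias('ts'))

# Format the timestamp back into a string with a chosen pattern.
parsed.select(date_format('ts', 'yyyy-MM-dd HH:mm:ss').alias('formatted')).collect()
# e.g. [Row(formatted='2021-03-28 14:29:54')] (the exact value depends on the session time zone)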
