
Pyspark - Convert to Timestamp

Spark version: 2.1

I'm trying to convert a string datetime column to a UTC timestamp with the format yyyy-MM-dd'T'HH:mm:ss.

I first change the format of the string column to yyyy-MM-dd'T'HH:mm:ss and then convert it to timestamp type. Later I would convert the timestamp to UTC using the to_utc_timestamp function.

df.select(
    f.to_timestamp(
        f.date_format(f.col("time"), "yyyy-MM-dd'T'HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss"
    )
).show(5, False)

The date_format works fine and gives me the correct format. But when I apply to_timestamp on top of that result, the format changes to yyyy-MM-dd HH:mm:ss, when it should instead be yyyy-MM-dd'T'HH:mm:ss. Why does this happen?

Could someone tell me how I could retain the format given by date_format? What should I do?

The function to_timestamp parses a string into a timestamp. A timestamp column has no stored format of its own; show() simply displays it in the default yyyy-MM-dd HH:mm:ss layout.

The second argument defines the format of the datetime in the string you are trying to parse.

You can see a couple of examples in the official documentation.


The code should be like this; note the single 'd' in the format pattern, which is a tricky detail in many cases.

data = data.withColumn('date', to_timestamp(col('date'), 'yyyy/MM/d'))

