
to_timestamp() function in Spark is giving null values

I read a CSV file with the following schema:

from pyspark.sql.types import StructType, StructField, StringType

# Both columns are read as plain strings first.
mySchema = StructType([StructField("StartTime", StringType(), True),
                       StructField("EndTime", StringType(), True)])

data = spark.read.load('/mnt/Experiments/Bilal/myData.csv', format='csv', header='false', schema=mySchema)
data.show(truncate=False)

I get this:

|StartTime                  |EndTime                    |

When I then convert these columns from StringType to TimestampType using:

from pyspark.sql.functions import to_timestamp

data = data.withColumn('StartTime', to_timestamp('StartTime', "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"))
data = data.withColumn('EndTime', to_timestamp('EndTime', "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"))

I get null values:

|null     |null   |
|null     |null   |
|null     |null   |
|null     |null   |
|null     |null   |

I was able to solve it by casting the columns to timestamp. Strangely, the cast did not need a format string. (Spark 2.4.0, local mode on Windows 10.)
The schema before casting:

 |-- StartTime: string (nullable = true)
 |-- EndTime: string (nullable = true)

from pyspark.sql import functions as F

# df is the DataFrame read from the CSV (called data in the question).
df2 = df.withColumn('StartTime', F.col('StartTime').cast("timestamp")) \
        .withColumn('EndTime', F.col('EndTime').cast("timestamp"))


|StartTime                 |EndTime                   |
|2018-12-24 03:03:31.808892|2018-12-24 03:07:35.280248|
|2018-12-24 03:13:25.775666|2018-12-24 03:18:10.101865|
|2018-12-24 03:23:32.939178|2018-12-24 03:27:57.219531|
|2018-12-24 03:33:31.079355|2018-12-24 03:37:04.639594|
|2018-12-24 03:43:54.163892|2018-12-24 03:46:38.118885|

The schema after casting:

 |-- StartTime: timestamp (nullable = true)
 |-- EndTime: timestamp (nullable = true)
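
Since to_timestamp returns null instead of raising when a string does not match the pattern, it can help to test one raw value against a few candidate patterns outside Spark before converting the whole column. Below is a minimal sketch of that idea using plain Python's datetime.strptime. Note the assumptions: the sample string is hypothetical (the raw CSV values are not shown in this thread), try_formats is a helper name made up for illustration, and strptime directives are not the same as Spark's pattern letters, so this only illustrates how a small mismatch, such as a literal 'T' versus a space, makes a parse fail.

```python
from datetime import datetime

# Hypothetical sample value, modeled on the timestamps shown in this thread.
sample = "2018-12-24 03:03:31.808892"

# Candidate strptime formats (Python directives, not Spark pattern letters).
candidates = [
    "%Y-%m-%dT%H:%M:%S.%f",  # expects a literal 'T' between date and time
    "%Y-%m-%d %H:%M:%S.%f",  # expects a space between date and time
]

def try_formats(value, formats):
    """Return {format: parsed datetime, or None if the format does not match}."""
    results = {}
    for fmt in formats:
        try:
            results[fmt] = datetime.strptime(value, fmt)
        except ValueError:
            # Python raises on a mismatch; Spark's to_timestamp yields null instead.
            results[fmt] = None
    return results

results = try_formats(sample, candidates)
for fmt, parsed in results.items():
    print(fmt, "->", parsed)
```

A format that maps to None here is the analogue of a pattern that would produce null in Spark. Running the same kind of check on one actual value from the file (for example via data.select('StartTime').first()) shows quickly whether the pattern or the data is the odd one out.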
