
to_timestamp() function in Spark is giving null values

I read a CSV file with a schema:

from pyspark.sql.types import StructType, StructField, StringType

mySchema = StructType([StructField("StartTime", StringType(), True),
                       StructField("EndTime", StringType(), True)])

data = spark.read.load('/mnt/Experiments/Bilal/myData.csv', format='csv', header='false', schema=mySchema)
data.show(truncate=False)

I get:

+---------------------------+---------------------------+
|StartTime                  |EndTime                    |
+---------------------------+---------------------------+
|2018-12-24T03:03:31.8088926|2018-12-24T03:07:35.2802489|
|2018-12-24T03:13:25.7756662|2018-12-24T03:18:10.1018656|
|2018-12-24T03:23:32.9391784|2018-12-24T03:27:57.2195314|
|2018-12-24T03:33:31.0793551|2018-12-24T03:37:04.6395942|
|2018-12-24T03:43:54.1638926|2018-12-24T03:46:38.1188857|
+---------------------------+---------------------------+

Now, when I convert these columns from StringType to TimestampType using the following:

from pyspark.sql.functions import to_timestamp

data = data.withColumn('StartTime', to_timestamp('StartTime', "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"))
data = data.withColumn('EndTime', to_timestamp('EndTime', "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"))

I get null values:

+---------+-------+
|StartTime|EndTime|
+---------+-------+
|null     |null   |
|null     |null   |
|null     |null   |
|null     |null   |
|null     |null   |
+---------+-------+
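
One thing worth checking against the output above: the sample values carry seven fractional-second digits, while the pattern passed to to_timestamp() contains six S letters. The following is a minimal plain-Python sketch (it does not call Spark) that simply counts both. The helper names fraction_digits and pattern_fraction_letters are hypothetical, and whether this mismatch is what actually produces the nulls in Spark 2.4 is an assumption, not something confirmed here.

```python
# Plain-Python sanity check; it does not call Spark.
# Helper names are hypothetical, introduced only for illustration.

def fraction_digits(value: str) -> int:
    """Count the characters after the last '.' in a timestamp string."""
    _, sep, fraction = value.rpartition(".")
    return len(fraction) if sep else 0


def pattern_fraction_letters(pattern: str) -> int:
    """Count the uppercase 'S' (fraction-of-second) letters in a pattern."""
    return pattern.count("S")


sample = "2018-12-24T03:03:31.8088926"    # first StartTime value from the CSV
pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"  # pattern used in the question

print(fraction_digits(sample), pattern_fraction_letters(pattern))  # 7 6
```

If the two counts differ, as they do here, the pattern and the data are worth comparing more closely before looking elsewhere.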

I was able to resolve it by casting. Oddly, the cast does not need a format string. (Spark 2.4.0, local mode on Windows 10.)
Schema before casting:

df.printSchema()
root
 |-- StartTime: string (nullable = true)
 |-- EndTime: string (nullable = true)

from pyspark.sql import functions as F
df2 = df.withColumn('StartTime', F.col('StartTime').cast("timestamp")) \
        .withColumn('EndTime', F.col('EndTime').cast("timestamp"))

Result:

df2.show(truncate=False)
+--------------------------+--------------------------+
|StartTime                 |EndTime                   |
+--------------------------+--------------------------+
|2018-12-24 03:03:31.808892|2018-12-24 03:07:35.280248|
|2018-12-24 03:13:25.775666|2018-12-24 03:18:10.101865|
|2018-12-24 03:23:32.939178|2018-12-24 03:27:57.219531|
|2018-12-24 03:33:31.079355|2018-12-24 03:37:04.639594|
|2018-12-24 03:43:54.163892|2018-12-24 03:46:38.118885|
+--------------------------+--------------------------+

Check the schema:

df2.printSchema()
root
 |-- StartTime: timestamp (nullable = true)
 |-- EndTime: timestamp (nullable = true)
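
The cast output above keeps six fractional digits and drops the seventh (for example, .8088926 is shown as .808892). A minimal plain-Python sketch of that truncation follows. It only imitates the displayed result and makes no claim about how Spark performs the cast internally; the helper name truncate_to_micros is hypothetical.

```python
from datetime import datetime


def truncate_to_micros(value: str) -> datetime:
    """Parse an ISO-like timestamp string, keeping at most 6 fractional digits.

    Hypothetical helper for illustration only; this is not Spark's cast logic.
    """
    head, sep, fraction = value.partition(".")
    if not sep:
        # No fractional part at all.
        return datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    # %f accepts 1 to 6 digits, so any extra digits are cut off first.
    return datetime.strptime(f"{head}.{fraction[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


parsed = truncate_to_micros("2018-12-24T03:03:31.8088926")
print(parsed)  # 2018-12-24 03:03:31.808892
```

The printed value matches the first StartTime in the cast result, which suggests the seventh digit is truncated rather than rounded.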

