Spark Scala: Create a Timestamp Column from a Date

Question

I have a "Date" column that is a String in a Spark DF, in this format: 1/1/2000 12:53 AM, 1/1/2000 2:53 AM, 1/1/2000 5:53 AM, ... I am trying to create a new column that converts this column into a Unix timestamp, but my output is a column full of nulls. The line I am using to create this column is:

val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "MM/dd/yyyy hh:mm:ss a"))

I created the Date column by concatenating separate Month, Day, Year, and Time columns, but the Month and Day columns hold values such as 1 rather than 01. Is this why I'm getting a null column back, or is there another reason? And if this is the reason, how do I change the day and month values from 1 to 01, 2 to 02, and so on?

This is my first time working with timestamps and I am new to Scala, so I greatly appreciate the help.

Answer 1

Use a single pattern letter for M, d, and h. Spark treats the number of letters as the minimum number of digits the field contains, so a single letter accepts both 1 and 01. Also note that your timestamp strings have no seconds, so the pattern should not include :ss.

val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "M/d/yyyy h:mm a"))
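For anyone who wants to try the corrected pattern end to end, here is a minimal sketch. It assumes a local SparkSession and rebuilds a small "Date" column from the sample strings in the question; the object name TimestampColumnSketch and the helper withUnixTimestamp are hypothetical names chosen for illustration. The resulting epoch values depend on the session time zone, so none are shown.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, unix_timestamp}

object TimestampColumnSketch {
  // Hypothetical helper: adds a "Timestamp" column (seconds since the epoch)
  // parsed from the string column "Date" with the single-letter pattern.
  def withUnixTimestamp(df: DataFrame): DataFrame =
    df.withColumn("Timestamp", unix_timestamp(col("Date"), "M/d/yyyy h:mm a"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("timestamp-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample values copied from the question.
    val oldDF = Seq("1/1/2000 12:53 AM", "1/1/2000 2:53 AM", "1/1/2000 5:53 AM").toDF("Date")

    val newDF = withUnixTimestamp(oldDF)
    newDF.show(truncate = false)

    spark.stop()
  }
}
```

If any row still comes back null, the string in that row does not match the pattern, which is usually easier to spot by filtering on Timestamp being null and inspecting the Date values.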

See https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for more details on datetime formatting. In particular:

Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
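The quoted rule can be checked without Spark by calling java.text.SimpleDateFormat directly. The sketch below is only an illustration: the object name PatternCheck and the helper parses are hypothetical, and it assumes Locale.US for the AM/PM marker. It reflects SimpleDateFormat's behavior as documented above; Spark versions that rely on a different parser may treat MM and dd more strictly, so the single-letter pattern remains the safer choice.

```scala
import java.text.SimpleDateFormat
import java.util.Locale
import scala.util.Try

object PatternCheck {
  // Hypothetical helper: true when `pattern` can parse `text`
  // using strict (non-lenient) SimpleDateFormat parsing.
  def parses(pattern: String, text: String): Boolean = {
    val fmt = new SimpleDateFormat(pattern, Locale.US)
    fmt.setLenient(false)
    Try(fmt.parse(text)).isSuccess
  }

  def main(args: Array[String]): Unit = {
    val sample = "1/1/2000 12:53 AM"

    // Pattern from the question: fails because the input has no seconds field.
    println(parses("MM/dd/yyyy hh:mm:ss a", sample)) // false

    // Pattern from the answer: matches the input.
    println(parses("M/d/yyyy h:mm a", sample)) // true

    // Doubled letters without :ss: still parses, since the letter count
    // is ignored when parsing non-adjacent numeric fields.
    println(parses("MM/dd/yyyy hh:mm a", sample)) // true
  }
}
```

Under SimpleDateFormat, then, the extra :ss is what breaks the original pattern, and zero-padding the month and day columns is not required.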
