Spark Scala creating timestamp column from date
I have a "Date" column that is a String in a Spark DataFrame, in this format: 1/1/2000 12:53 AM, 1/1/2000 2:53 AM, 1/1/2000 5:53 AM, ... I am trying to create a new column that converts this column into a Unix timestamp, but I am getting a column full of nulls as my output. The line I am using to create this column is:
val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "MM/dd/yyyy hh:mm:ss a"))
I created the Date column by concatenating separate Month, Day, Year, and Time columns, but the Month and Day columns hold values such as 1 instead of 01. Is this why I'm getting a null column back, or is there another reason? And if this is the reason, how do I change the day and month values from 1 to 01, 2 to 02, ...?
This is my first time working with timestamps and I am new to Scala, so I greatly appreciate the help.
You can specify a single letter for each of M, d and h. Spark will use that as the minimum number of digits the field contains, so a single letter accepts both 1 and 12. Note that your timestamp strings do not have seconds, so you should not include :ss in the pattern; a pattern that expects seconds cannot match these strings.
val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "M/d/yyyy h:mm a"))
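As a minimal sketch of the line above in context, the following builds a small DataFrame from the sample strings in the question and applies the corrected pattern. The object name `TimestampSketch`, the helper `addTimestamp`, the local `SparkSession` settings and the UTC session time zone are assumptions made for illustration; they are not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, unix_timestamp}

object TimestampSketch {
  // Hypothetical helper: adds a "Timestamp" column (Unix seconds) parsed from
  // the string column "Date" using single-letter month, day and hour fields.
  def addTimestamp(df: DataFrame): DataFrame =
    df.withColumn("Timestamp", unix_timestamp(col("Date"), "M/d/yyyy h:mm a"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, with the session time zone pinned to UTC
    // so the resulting epoch seconds do not depend on the machine's zone.
    val spark = SparkSession.builder()
      .appName("timestamp-sketch")
      .master("local[*]")
      .config("spark.sql.session.timeZone", "UTC")
      .getOrCreate()
    import spark.implicits._

    // Sample values copied from the question.
    val oldDf = Seq("1/1/2000 12:53 AM", "1/1/2000 2:53 AM", "1/1/2000 5:53 AM").toDF("Date")

    addTimestamp(oldDf).show(truncate = false)
    spark.stop()
  }
}
```

The Month and Day values do not need to be zero-padded before concatenation for this pattern to match.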
See https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for more details of datetime formatting. In particular:
Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
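The quoted rule can be checked outside Spark with plain `java.text.SimpleDateFormat`. The sketch below is only an illustration of the linked documentation: the object name `PatternLetterDemo` and its helpers are hypothetical, and the locale and time zone are pinned to US and UTC as an assumption. Spark's own parser may be stricter than this depending on the Spark version, so treat it as a demonstration of `SimpleDateFormat`, not of Spark internals.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}
import scala.util.Try

object PatternLetterDemo {
  // Build a formatter pinned to US locale and UTC so results are reproducible.
  def formatter(pattern: String): SimpleDateFormat = {
    val fmt = new SimpleDateFormat(pattern, Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt
  }

  // Parse text with the given pattern and return Unix seconds.
  // Throws java.text.ParseException when the text does not match.
  def toUnixSeconds(text: String, pattern: String): Long =
    formatter(pattern).parse(text).getTime / 1000L

  def main(args: Array[String]): Unit = {
    val text = "1/1/2000 12:53 AM"

    // Parsing: the letter count is ignored, so both patterns accept "1/1/...".
    println(toUnixSeconds(text, "M/d/yyyy h:mm a"))     // 946687980
    println(toUnixSeconds(text, "MM/dd/yyyy hh:mm a"))  // 946687980

    // A pattern that expects seconds does not match input without seconds.
    println(Try(toUnixSeconds(text, "MM/dd/yyyy hh:mm:ss a")).isFailure) // true

    // Formatting: the letter count is the minimum width, so values are zero-padded.
    println(formatter("MM/dd/yyyy hh:mm a").format(new Date(946687980L * 1000L)))
  }
}
```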