

Spark Scala creating timestamp column from date

I have a "Date" column that is a String in a Spark DF in this format 1/1/2000 12:53 AM, 1/1/2000 2:53 AM, 1/1/2000 5:53 AM, ... I am trying to create a new column that converts this column into a Unix Timestamp but getting a column full of null as my output.我有一个“日期”列,它是 Spark DF 中的字符串,格式为 1/1/2000 12:53 AM、1/1/2000 2:53 AM、1/1/2000 5:53 AM,.. . 我正在尝试创建一个新列,将该列转换为 Unix 时间戳,但得到一个充满 null 的列作为我的 output。 The line I am using to create this column is:我用来创建此列的行是:

val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "MM/dd/yyyy hh:mm:ss a"))

I created the Date column by concatenating separate Month, Day, Year, and Time columns, but the Month and Day columns have input data in the form of 1 instead of 01. Is this why I'm getting a null column back, or is there another reason? And if this is the reason, how do I fix the Day and Month columns from 1 to 01, 2 to 02, ...?

This is my first time working with timestamps and I am new to Scala, so I greatly appreciate the help.

You should specify one letter only for M, d, and h. Spark will use that as the minimum number of digits the field contains. Note that your timestamp strings do not have seconds, so you should not include :ss.

val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "M/d/yyyy h:mm a"))
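This pattern parses both one- and two-digit months, days, and hours. As a quick check, here is a minimal, self-contained sketch (assuming a SparkSession is in scope as spark) using the sample values from the question; the exact epoch values depend on your session time zone:

import org.apache.spark.sql.functions.unix_timestamp
import spark.implicits._

// Sample strings in the question's format: unpadded fields, no seconds
val Old_DF = Seq("1/1/2000 12:53 AM", "1/1/2000 2:53 AM", "1/1/2000 5:53 AM").toDF("Date")

// Single-letter M, d, h accept both "1" and "01"; :ss is omitted
val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "M/d/yyyy h:mm a"))

New_DF.show(false)
// With a UTC session time zone this prints:
// +-----------------+---------+
// |Date             |Timestamp|
// +-----------------+---------+
// |1/1/2000 12:53 AM|946687980|
// |1/1/2000 2:53 AM |946695180|
// |1/1/2000 5:53 AM |946705980|
// +-----------------+---------+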

See https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for more details on datetime formatting. In particular:

Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
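So the unpadded Month and Day values are not the problem once single pattern letters are used, and no zero-padding is needed. If you still wanted padded columns before concatenating (the second part of the question), a sketch using lpad would look like the following; raw_DF and the Month, Day, Year, and Time column names are hypothetical, taken from the question's description, and spark.implicits._ is assumed imported as above:

import org.apache.spark.sql.functions.{concat_ws, lpad}

// lpad left-pads a string column to the given length: "1" -> "01", "12" -> "12"
val padded_DF = raw_DF
  .withColumn("Month", lpad($"Month", 2, "0"))
  .withColumn("Day", lpad($"Day", 2, "0"))
  .withColumn("Date", concat_ws(" ", concat_ws("/", $"Month", $"Day", $"Year"), $"Time"))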
