
How to Update a column in Scala which is in Date Format

I need help writing Spark Scala code for the issue below. I have a file with records like the following:

aaa|2019-07-11 02:15:50

bbb|2019-07-03 22:21:50

vvv|2019-07-03 19:30:40

bzx|2019-07-11 02:15:30

rrr|2019-06-24 01:29:10

mmm|2019-06-23 20:35:05

qqq|2019-07-12 08:10:15

eee|2019-07-11 01:49:30

iii|2019-06-23 22:31:45

I have split the file and taken the 2nd column:

val file = spark.read.format("csv").option("delimiter", """|""").load(pathOfDumpfile).toDF()  

Now I need to add 5 seconds ("0000-00-00 00:00:05") to every value in the second column (which is in date format) and save the result as a file like the one below:

aaa|2019-07-11 02:15:55

bbb|2019-07-03 22:21:55

vvv|2019-07-03 19:30:45

bzx|2019-07-11 02:15:35

rrr|2019-06-24 01:29:15

mmm|2019-06-23 20:35:10

qqq|2019-07-12 08:10:20

eee|2019-07-11 01:49:35

iii|2019-06-23 22:31:50

Can anyone suggest how I can add 5 seconds to all the records in the file/column?

That would really be helpful. After adding to the date-time field, only the second or minute should change; the date must not change. For example, with 2019-07-11 23:59:59, adding even 1 second would roll it over to 2019-07-12 00:00:00. I want to do the addition without changing the date, so only the minute or second should change.

You can do it by using unix_timestamp:

scala>  var dfv = Seq(("aaa","2019-07-11 23:59:59"),("bbb","2019-07-03 22:21:50"),("vvv","2019-07-03 19:30:40"),("bzx","2019-07-11 02:15:30"),("rrr","2019-06-24 01:29:10"),("mmm","2019-06-23 20:35:05"),("qqq","2019-07-12 08:10:15"),("eee","2019-07-11 01:49:30"),("iii","2019-06-23 22:31:45")).toDF("value","_date")

scala> dfv.show
+-----+-------------------+
|value|              _date|
+-----+-------------------+
|  aaa|2019-07-11 23:59:59|
|  bbb|2019-07-03 22:21:50|
|  vvv|2019-07-03 19:30:40|
|  bzx|2019-07-11 02:15:30|
|  rrr|2019-06-24 01:29:10|
|  mmm|2019-06-23 20:35:05|
|  qqq|2019-07-12 08:10:15|
|  eee|2019-07-11 01:49:30|
|  iii|2019-06-23 22:31:45|
+-----+-------------------+

scala> dfv.withColumn("_date_v1",when(date_format(from_unixtime(unix_timestamp($"_date")),"HH:mm:ss")>="23:59:55",$"_date").otherwise(from_unixtime(unix_timestamp($"_date")+5,"yyyy-MM-dd HH:mm:ss"))).show
+-----+-------------------+-------------------+
|value|              _date|           _date_v1|
+-----+-------------------+-------------------+
|  aaa|2019-07-11 23:59:59|2019-07-11 23:59:59|
|  bbb|2019-07-03 22:21:50|2019-07-03 22:21:55|
|  vvv|2019-07-03 19:30:40|2019-07-03 19:30:45|
|  bzx|2019-07-11 02:15:30|2019-07-11 02:15:35|
|  rrr|2019-06-24 01:29:10|2019-06-24 01:29:15|
|  mmm|2019-06-23 20:35:05|2019-06-23 20:35:10|
|  qqq|2019-07-12 08:10:15|2019-07-12 08:10:20|
|  eee|2019-07-11 01:49:30|2019-07-11 01:49:35|
|  iii|2019-06-23 22:31:45|2019-06-23 22:31:50|
+-----+-------------------+-------------------+
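
The when guard keeps any timestamp at or after 23:59:55 unchanged (adding 5 seconds to those would cross midnight), so the date never rolls over to the next day; every other row gets unix_timestamp + 5 converted back to the original string format.

If you also need to save the result in the original pipe-delimited layout, a minimal sketch (the output path is a placeholder, not part of the question):

scala> dfv.withColumn("_date", when(date_format(from_unixtime(unix_timestamp($"_date")), "HH:mm:ss") >= "23:59:55", $"_date")
     |       .otherwise(from_unixtime(unix_timestamp($"_date") + 5, "yyyy-MM-dd HH:mm:ss")))
     |     .write.option("delimiter", "|").csv("/tmp/output")  // placeholder path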

Let me know if you have any questions about this.

You can do this with the help of a custom UDF, like:

import org.apache.spark.sql.functions.{col, udf}

val file = spark.read.format("csv").option("delimiter", """|""").load(pathOfDumpfile).toDF("a", "b")

val timeUDF = udf((x: java.sql.Timestamp) => new java.sql.Timestamp(x.getTime + 5000)) // getTime returns ms, so +5000 adds 5 seconds

file.select(col("a"), timeUDF(col("b")))
  .write.option("delimiter", "|")
  .csv(...) // output path elided in the original answer
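
Note that the CSV columns load as strings and Spark casts them to timestamps for the UDF, so rows that fail to parse arrive as null and would throw a NullPointerException above. A null-safe variant of the same UDF (a sketch, not part of the original answer):

val safeTimeUDF = udf((ts: java.sql.Timestamp) =>
  if (ts == null) null else new java.sql.Timestamp(ts.getTime + 5000L))

Also, built-in functions such as unix_timestamp, or the INTERVAL syntax below, are generally preferable to a UDF: Catalyst can optimize built-ins but treats a UDF as a black box.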

You can use INTERVAL syntax in Spark.

Using DataFrame:

val foo = spark.sql(""" select current_timestamp as ts """)
foo.select($"ts", $"ts" + expr("INTERVAL 5 SECONDS") as "ts_plus").show(false)
+-----------------------+-----------------------+
|ts                     |ts_plus                |
+-----------------------+-----------------------+
|2019-09-16 10:33:17.626|2019-09-16 10:33:22.626|
+-----------------------+-----------------------+

Using Spark SQL:

foo.createOrReplaceTempView("fooView")
spark.sql(""" select ts, ts + INTERVAL 5 seconds from fooView""").show(false)
+-----------------------+------------------------------------------+
|ts                     |CAST(ts + interval 5 seconds AS TIMESTAMP)|
+-----------------------+------------------------------------------+
|2019-09-16 10:35:12.847|2019-09-16 10:35:17.847                   |
+-----------------------+------------------------------------------+
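
Applied to the question's data, a sketch (not from the original answer) that combines the INTERVAL addition with the question's no-rollover requirement, assuming the columns are named a and b as in the UDF answer:

import org.apache.spark.sql.functions.{col, expr, to_date, when, date_format}

val ts      = col("b").cast("timestamp")
val shifted = ts + expr("INTERVAL 5 SECONDS")

// Keep the shifted value only when it stays on the same calendar date,
// then format back to the original string layout.
val result = file.withColumn("b",
  date_format(when(to_date(shifted) === to_date(ts), shifted).otherwise(ts),
              "yyyy-MM-dd HH:mm:ss"))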
