How to parse/convert a string column in a dataframe to a datetime column with Scala
My current data is in this format: 2013-07-25 00:00:00.0
orders.take(10).foreach(println)
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2012-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2011-07-25 00:00:00.0,12111,COMPLETE
4,2014-07-25 00:00:00.0,8827,CLOSED
5,2015-07-25 00:00:00.0,11318,COMPLETE
6,2016-07-25 00:00:00.0,7130,COMPLETE
7,2017-07-25 00:00:00.0,4530,COMPLETE
8,2018-07-25 00:00:00.0,2911,PROCESSING
9,2019-07-25 00:00:00.0,5657,PENDING_PAYMENT
10,2009-07-25 00:00:00.0,5648,PENDING_PAYMENT
I know how to convert the string to int:
val ordersMap = orders.map(a=>(
a.split(",")(0).toInt,
a.split(",")(1),
a.split(",")(2).toInt,
a.split(",")(3)
))
But for the second column, a date in string format, I am looking for an easy way like .toInt; all I want is to parse it into a datetime.
I wonder if there is a simple way to do that on all the rows in the dataframe, and if there is a flexible way to accommodate different datetime formats, like yyyy/mm/dd, mm/dd/yyyy, dd/mm/yyyy, etc.
Thank you.
[UPDATE1] Thanks to @smac89 for the suggestion. I tried it with no luck; screenshot is here:
You can just do LocalDate.parse as in the duplicate, but AFAIK there is no .toInt-style extension method for dates. You can easily create your own though:
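import java.time.LocalDate
import java.time.format.DateTimeFormatter

implicit class StringDates(ds: String) {
  // Defaults to ISO-8601 dates such as "2013-07-25"
  def toLocalDate: LocalDate = toLocalDate(DateTimeFormatter.ISO_LOCAL_DATE)
  def toLocalDate(fmt: DateTimeFormatter): LocalDate = LocalDate.parse(ds, fmt)
}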
Now you can do:
"2013-07-25".toLocalDate
Or pass in a formatter by doing:
"2013-07-25".toLocalDate(fmt)
Try it on Scastie 1
Try it on Scastie 2
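The sample values in the question carry a time part (2013-07-25 00:00:00.0), which ISO_LOCAL_DATE does not accept when given the whole string. Here is a minimal sketch of parsing the full value, assuming every row has exactly that shape with one fractional digit; the names TimestampParsing, parseOrderTimestamp and parseOrderDate are hypothetical:

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

object TimestampParsing {
  // Pattern assumed from the sample value "2013-07-25 00:00:00.0":
  // MM = month, HH = hour (0-23), mm = minute, ss = second, S = one fraction digit.
  val orderTimestampFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S")

  // Parses the whole string, keeping the time part.
  def parseOrderTimestamp(s: String): LocalDateTime =
    LocalDateTime.parse(s, orderTimestampFormat)

  // Parses the whole string, then drops the time part.
  def parseOrderDate(s: String): LocalDate =
    parseOrderTimestamp(s).toLocalDate
}
```

If the number of fractional digits varies between rows, this fixed pattern would need to be adjusted.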
You can create more formatters easily by doing:
// MM is month-of-year; lowercase mm would mean minute-of-hour
DateTimeFormatter.ofPattern("yyyy/MM/dd")
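For the "flexible" part of the question, one possible approach is to try several formatters in turn and keep the first one that succeeds. This is only a sketch: the object name FlexibleDates, the function parseDate and the list of patterns are assumptions, not something from the original answer. Note that java.time patterns use uppercase MM for the month, since lowercase mm means minute-of-hour.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object FlexibleDates {
  // Candidate patterns, tried in order (hypothetical list; adjust to the data).
  val candidateFormats: List[DateTimeFormatter] =
    List("yyyy-MM-dd", "yyyy/MM/dd", "MM/dd/yyyy", "dd/MM/yyyy")
      .map(p => DateTimeFormatter.ofPattern(p))

  // Returns the first successful parse, or None if no pattern matches.
  def parseDate(s: String): Option[LocalDate] =
    candidateFormats.iterator
      .map(fmt => Try(LocalDate.parse(s, fmt)).toOption)
      .collectFirst { case Some(d) => d }
}
```

The order of the list matters: a value such as 03/04/2013 is valid under both MM/dd/yyyy and dd/MM/yyyy, and the earlier pattern wins, so mixed conventions in one column cannot be told apart reliably this way.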
Here is what I ended up with, cumbersome but working:
import java.time._
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions._
...... ......
val datetime_format = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val test="2013-07-25 00:00:..."
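// The date is the first 10 characters; calling .format on it changed nothing, so it is dropped
val myd = test.substring(0, 10)
// LocalDate.parse returns a LocalDate rather than a generic TemporalAccessor
val mydate = LocalDate.parse(myd, datetime_format)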
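To apply the conversion to every row rather than a single test string, the parsing can be wrapped in a function that handles one line. The sketch below uses plain Scala, assuming each line has the four comma-separated fields shown in the question; the case class Order, its field names and parseOrder are hypothetical, since the question does not name the columns:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical record type for one parsed line; field names are guesses.
final case class Order(id: Int, date: LocalDateTime, value: Int, status: String)

object OrderParsing {
  // Pattern assumed from the sample value "2013-07-25 00:00:00.0".
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S")

  // Splits the line once, then converts each field.
  def parseOrder(line: String): Order = {
    val fields = line.split(",")
    Order(fields(0).toInt, LocalDateTime.parse(fields(1), fmt), fields(2).toInt, fields(3))
  }
}
```

Assuming orders is a collection of such lines, as the orders.map call in the question suggests, something like orders.map(OrderParsing.parseOrder) should then convert every row.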