Spark dataframe string to month
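I have a DataFrame with one column, `orddate`, stored as a string. I want to extract the month from `orddate` and put the month name into a new column, `month`, on a new DataFrame.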

| orddate |
|---|
| 12/1/10 9:37 |
| 20/3/10 10:37 |
| 09/8/14 4:56 |
| 30/12/11 12:13 |
| 24/5/10 7:27 |

I want to convert it into:

| orddate | month |
|---|---|
| 12/1/10 9:37 | january |
| 20/3/10 10:37 | march |
| 09/8/14 4:56 | august |
| 30/12/11 12:13 | december |
| 24/5/10 7:27 | may |

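1) Convert the column to a timestamp with `unix_timestamp`, using the format `d/M/yy H:mm`; 2) convert that timestamp to the month name with `from_unixtime`, using the format `MMMM`.

Single-letter `d`, `M` and `H` accept both one- and two-digit values (`1`, `09`, `12`), which this data needs because the leading zeros are missing; under Spark 3.x's parser, two-letter `dd`/`MM` expect exactly two digits and can fail on values such as `12/1/10`. `H` is the 0-23 hour of the day, whereas `h` is the 1-12 AM/PM hour. Four `M` letters give the full month name (`January`); Spark 3.x rejects five or more letters for text fields, so `MMMMM` only works with the older Spark 2.x formatter.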
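For the meaning of each pattern letter, see Java's `SimpleDateFormat` documentation (used by Spark 2.x) or Spark's "Datetime Patterns for Formatting and Parsing" guide (Spark 3.x).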
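The pattern strings can be sanity-checked outside Spark with plain `java.time`, whose `DateTimeFormatter` letters Spark 3.x's datetime patterns largely follow. This is a JVM-only sketch, not Spark code, and it assumes English month names; `MonthPatternCheck` and `monthName` are hypothetical names used only for illustration.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object MonthPatternCheck {
  // Same parse pattern as the answer: single-letter d, M and H accept
  // unpadded values ("1") as well as padded ones ("09"); java.time reads "yy" as 20yy.
  private val parser = DateTimeFormatter.ofPattern("d/M/yy H:mm", Locale.ENGLISH)

  // Four M letters select the full month name in the given locale.
  private val monthFormatter = DateTimeFormatter.ofPattern("MMMM", Locale.ENGLISH)

  // Hypothetical helper: parse one orddate string and return its month name.
  def monthName(orddate: String): String =
    LocalDateTime.parse(orddate, parser).format(monthFormatter)

  def main(args: Array[String]): Unit = {
    val samples = Seq("12/1/10 9:37", "20/3/10 10:37", "09/8/14 4:56", "30/12/11 12:13", "24/5/10 7:27")
    samples.foreach(s => println(s"$s -> ${monthName(s)}"))
  }
}
```

Running `main` prints one line per sample, for example `12/1/10 9:37 -> January`.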
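```scala
import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}
import spark.implicits._ // for the $"orddate" syntax; assumes a SparkSession named `spark`

// Parse the unpadded day/month/24-hour string to Unix seconds, then format it as the full month name
df.withColumn("month", from_unixtime(unix_timestamp($"orddate", "d/M/yy H:mm"), "MMMM")).show
```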
```
+--------------+--------+
|       orddate|   month|
+--------------+--------+
|  12/1/10 9:37| January|
| 20/3/10 10:37|   March|
|  09/8/14 4:56|  August|
|30/12/11 12:13|December|
|  24/5/10 7:27|     May|
+--------------+--------+
```
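As an alternative sketch (not the answerer's method), the same column can be built with `to_timestamp` and `date_format`, which skips the round trip through Unix seconds, and wrapped in `lower` to match the lowercase month names in the question's expected output. It assumes Spark 2.2 or later and a local SparkSession; `OrdDateMonth` and `withMonth` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, lower, to_timestamp}

object OrdDateMonth {
  // Hypothetical helper: add a lowercase month-name column ("january", not "January").
  def withMonth(df: DataFrame): DataFrame =
    df.withColumn(
      "month",
      lower(date_format(to_timestamp(col("orddate"), "d/M/yy H:mm"), "MMMM"))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("orddate-month").getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question
    val df = Seq("12/1/10 9:37", "20/3/10 10:37", "09/8/14 4:56", "30/12/11 12:13", "24/5/10 7:27")
      .toDF("orddate")

    withMonth(df).show()
    spark.stop()
  }
}
```

Parsing and formatting both use the session time zone, so the month value is not shifted by time-zone settings.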