How to get integer value with leading zero in Spark (Scala)

I have a Spark DataFrame and am trying to add Year, Month and Day columns to it. The problem is that after adding these columns, the Month and Day values do not keep their leading zeros.

import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val cityDF = Seq(("Delhi","India"),("Kolkata","India"),("Mumbai","India"),("Nairobi","Kenya"),("Colombo","Srilanka"),("Tibet","China")).toDF("City","Country")
val dateString = "2020-01-01"
val dateCol = to_date(lit(dateString))
val finaldf = cityDF.select($"*", year(dateCol).alias("Year"), month(dateCol).alias("Month"), dayofmonth(dateCol).alias("Day"))

I want to keep the leading zero in the Month and Day columns, but the result is 1 instead of 01.
I am using the year, month and day columns to create Spark partitions, so I want to keep the leading zeros intact. So my question is: how do I keep the leading zero in my DataFrame columns?

An integer column cannot hold a leading zero, but it can be converted to a string column, where leading zeros are possible, with the "format_string" function:

val finaldf =
  cityDF
    .select($"*",
      year(dateCol).alias("Year"),
      format_string("%02d", month(dateCol)).alias("Month"),
      format_string("%02d", dayofmonth(dateCol)).alias("Day")
    )
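
A minimal, self-contained sketch of the approach above, in case it helps to check the result. The local SparkSession, the app name and the shortened city list are assumptions for illustration and are not from the original post. It also shows lpad as a possible alternative to format_string; both should give string columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{dayofmonth, format_string, lit, lpad, month, to_date, year}

// Assumption: a local session, used only for this illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("leading-zero-sketch")
  .getOrCreate()
import spark.implicits._

// Shortened sample data (illustrative).
val cityDF = Seq(("Delhi", "India"), ("Nairobi", "Kenya")).toDF("City", "Country")
val dateCol = to_date(lit("2020-01-01"))

val padded = cityDF.select(
  $"*",
  year(dateCol).alias("Year"),
  // printf-style format: width 2, zero padded; the result is a string column
  format_string("%02d", month(dateCol)).alias("Month"),
  // alternative: cast to string, then left-pad with "0" to length 2
  lpad(dayofmonth(dateCol).cast("string"), 2, "0").alias("Day")
)

padded.printSchema() // Year stays an integer; Month and Day are strings
padded.show()
```

Note that Month and Day are no longer numeric after this, so sorting or comparing them works on strings.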

Why not simply use date_format for that? It returns a string, so the patterns "MM" and "dd" keep the leading zero.

val finaldf = cityDF.select(
  $"*",
  year(dateCol).alias("Year"),
  date_format(dateCol, "MM").alias("Month"),
  date_format(dateCol, "dd").alias("Day")
)
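
Since the question mentions partitioning, here is a rough sketch of how the date_format columns might be used with partitionBy. The temp output directory, the local SparkSession and the shortened city list are assumptions for illustration. One thing to watch: when reading partitioned data back, Spark infers partition column types by default, so "01" may come back as the integer 1. The sketch disables that inference with spark.sql.sources.partitionColumnTypeInference.enabled so the values stay strings.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{date_format, lit, to_date, year}

// Assumption: a local session, used only for this illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partition-sketch")
  .getOrCreate()
import spark.implicits._

val cityDF = Seq(("Delhi", "India"), ("Nairobi", "Kenya")).toDF("City", "Country")
val dateCol = to_date(lit("2020-01-01"))

val finaldf = cityDF.select(
  $"*",
  year(dateCol).alias("Year"),
  date_format(dateCol, "MM").alias("Month"), // string "01"
  date_format(dateCol, "dd").alias("Day")    // string "01"
)

// Hypothetical output location: a fresh temp directory.
val outDir = Files.createTempDirectory("cities").resolve("out").toString

// Writes directories of the form Year=2020/Month=01/Day=01 under outDir.
finaldf.write.partitionBy("Year", "Month", "Day").parquet(outDir)

// Keep partition values as strings when reading back, so "01" is not parsed as 1.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
val readBack = spark.read.parquet(outDir)
readBack.printSchema()
```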
