How to add leading zero padding with a specified number of digits in Scala Spark?

I have a data.txt file as below.

12, 345, 6789

Now I want to pad specified fields of the input file (or standard input) with leading zeros up to a specified number of digits. Here the specified width is 8 digits. What should I do?

This is my code:

import org.apache.spark.sql.types._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._

//Convert textfile to DF
val conf = new SparkConf().setAppName("ct").setMaster("local").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val sparkSess = SparkSession.builder().appName("SparkSessionZipsExample").config(conf).getOrCreate()
val path = "data.txt"
val data = sc.textFile(path)
val colNum = data.first().split(",").size
var schemaString = "key"
for( i <- 1 to colNum - 1) {
 schemaString += " value" + i
}
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable=true))
val schema = StructType(fields)
val dfWithSchema = sparkSess.read.option("header", "false").schema(schema).csv(path)
dfWithSchema.show()

//add leading zero padding with the specified number of digits
//The number of digits specified in the specified field of the argument file is 8 digits
val df = dfWithSchema.withColumn("key", format_string("%08d", $"key")).show
val df2 = dfWithSchema.withColumn("value2", format_string("%08d", $"value2")).show

But the output is incorrect.

I want the output to look like this. Please help me.

+---------+------+---------+
|key      |value1|value2   |
+---------+------+---------+
| 00000012|   345| 00006789|
+---------+------+---------+

You can use the built-in lpad function as shown below:

import org.apache.spark.sql.functions.lpad

dfWithSchema.select(
  lpad($"key", 8, "0"),
  lpad($"value2", 8, "0"),
  $"value1"
).show

This pads the string on the left with 0s until it is 8 characters long. A string that is already longer than 8 characters is shortened to 8 characters.
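To make the padding rule concrete, here is a minimal plain-Scala sketch of what happens to a single value. The helper name `lpadLike` is hypothetical: it only mimics the behaviour described above for a non-empty pad string and is not part of Spark.

```scala
// Hypothetical helper that mimics lpad on one string (not a Spark API).
object LpadSketch {
  def lpadLike(s: String, len: Int, pad: String): String = {
    require(pad.nonEmpty, "this sketch assumes a non-empty pad string")
    if (s.length >= len) s.take(len) // longer input is cut down to len characters
    else {
      val missing = len - s.length
      // Repeat the pad string enough times, then keep exactly `missing` characters.
      (pad * (missing / pad.length + 1)).take(missing) + s
    }
  }

  def main(args: Array[String]): Unit = {
    println(lpadLike("12", 8, "0"))        // 00000012
    println(lpadLike("6789", 8, "0"))      // 00006789
    println(lpadLike("123456789", 8, "0")) // 12345678
  }
}
```

The last call shows why the width matters: a value with more than 8 characters loses its trailing characters rather than being left untouched.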

Please refer here for details.
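The select above returns the padded columns under generated names and in a different order than the desired output. A sketch that keeps the original names and order might look like the following. It assumes the fields still carry the space that follows each comma in the sample data (hence the `trim`). The object name `PadColumns`, the helper `padColumns`, and the inline sample row standing in for data.txt are all hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lpad, trim}

object PadColumns {
  // Hypothetical helper: left-pad the named string columns with "0" to `width`.
  // withColumn with an existing name replaces that column in place,
  // so column names and order stay as they were.
  def padColumns(df: DataFrame, names: Seq[String], width: Int): DataFrame =
    names.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, lpad(trim(col(name)), width, "0"))
    }

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell the existing session can be used instead.
    val spark = SparkSession.builder().appName("PadColumns").master("local[1]").getOrCreate()
    import spark.implicits._

    // Stand-in for data.txt: one row, including the spaces after each comma.
    val df = Seq(("12", " 345", " 6789")).toDF("key", "value1", "value2")

    padColumns(df, Seq("key", "value2"), 8).show()
    spark.stop()
  }
}
```

Without the `trim`, a value such as " 6789" would be padded around its leading space instead of producing 00006789.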
