Scala Spark + encoder issues
I'm working on a problem where I need to add a new column that holds the total number of characters across all columns of each row.
My sample data set:
ItemNumber,StoreNumber,SaleAmount,Quantity, Date
2231 , 1 , 400 , 2 , 19/01/2020
2145 , 3 , 500 , 10 , 14/01/2020
The expected output would be:
19
20
The ideal output I am expecting is the data frame with a new column, Length, added:
ItemNumber,StoreNumber,SaleAmount,Quantity, Date , Length
2231 , 1 , 400 , 2 , 19/01/2020, 19
2145 , 3 , 500 , 10 , 14/01/2020, 20
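The expected values are just the sum of the character counts of each field: 4 + 1 + 3 + 1 + 10 = 19 for the first row and 4 + 1 + 3 + 2 + 10 = 20 for the second. A minimal plain-Scala sketch of that arithmetic, assuming the spaces around the commas in the sample are only display padding (the object and value names here are hypothetical):

```scala
object ExpectedLengths {
  // Hypothetical in-memory copy of the two sample rows, every field as a String
  val rows: Seq[Seq[String]] = Seq(
    Seq("2231", "1", "400", "2", "19/01/2020"),
    Seq("2145", "3", "500", "10", "14/01/2020")
  )

  // Row length = sum of the character counts of all fields in the row
  def rowLength(fields: Seq[String]): Int = fields.map(_.length).sum

  // One length per row, in input order
  val expected: Seq[Int] = rows.map(rowLength)
}
```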
My code:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder()
  .appName("SimpleNewIntColumn").master("local").enableHiveSupport().getOrCreate()
val df = spark.read.option("header", "true").csv("./data/sales.csv")

var schema = new StructType
df.schema.toList.map {
  each => schema = schema.add(each)
}
val encoder = RowEncoder(schema)

// Sum the character length of every field in the row
val charLength = (row: Row) => {
  var len: Int = 0
  row.toSeq.map(x => {
    x match {
      case a: Int    => len = len + a.toString.length
      case a: String => len = len + a.length
    }
  })
  len
}

df.map(row => charLength(row))(encoder) // ERROR - Required Encoder[Int] Found ExpressionEncoder[Row]
df.withColumn("Length", ?)
I have two issues:
1) How do I solve the error "ERROR - Required Encoder[Int] Found ExpressionEncoder[Row]"?
2) How do I add the output of the charLength function as a new column value? - df.withColumn("Length", ?)
Thank you.
Gurupraveen
If you are just trying to add a column with the total length of that row, you can simply concat all the columns cast to String and use the length function:
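import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType

// Cast every column to String and concatenate them into one string per row
val concatCol = concat(df.columns.map(col(_).cast(StringType)): _*)
// length() returns the number of characters in that concatenated string
df.withColumn("Length", length(concatCol)).show()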
Output:
+----------+-----------+----------+--------+----------+------+
|ItemNumber|StoreNumber|SaleAmount|Quantity| Date|Length|
+----------+-----------+----------+--------+----------+------+
| 2231| 1| 400| 2|19/01/2020| 19|
| 2145| 3| 500| 10|14/01/2020| 20|
+----------+-----------+----------+--------+----------+------+
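For issue 1, the error comes from the second argument list of Dataset.map: it expects an Encoder for the result type (here Int), but RowEncoder(schema) builds an encoder for Row. A minimal sketch, assuming Spark 2.x/3.x on the classpath, passes Encoders.scalaInt instead. It also swaps in concat_ws("", ...) for concat, because concat returns null as soon as any input column is null, while concat_ws skips nulls. The names RowLengthSketch, lengthsViaMap and withLengthColumn are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, Row}
import org.apache.spark.sql.functions.{col, concat_ws, length}
import org.apache.spark.sql.types.StringType

object RowLengthSketch {
  // Sum the string length of every non-null field (nulls are skipped
  // instead of causing a MatchError as in the question's version)
  def charLength(row: Row): Int =
    row.toSeq.filter(_ != null).map(_.toString.length).sum

  // Issue 1: supply an Encoder for the *result* type Int,
  // not a RowEncoder built from the input schema
  def lengthsViaMap(df: DataFrame): Dataset[Int] =
    df.map((row: Row) => charLength(row))(Encoders.scalaInt)

  // Issue 2 (column-expression route): cast each column to String,
  // join them with an empty separator, then count the characters
  def withLengthColumn(df: DataFrame): DataFrame = {
    val asStrings = df.columns.map(c => col(c).cast(StringType))
    df.withColumn("Length", length(concat_ws("", asStrings: _*)))
  }
}
```

With the sample data, both lengthsViaMap and withLengthColumn should yield 19 and 20. The column-expression route is generally preferable, since Catalyst can optimize built-in functions but treats the map closure as an opaque function.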