Create a new column containing a range of integers from an existing integer column in a Spark Scala DataFrame
Suppose I have a Spark Scala DataFrame object, for example:
+----+
|col1|
+----+
|1   |
|3   |
+----+
I want a DataFrame like:
+----+---------+
|col1|col2     |
+----+---------+
|1   |[0,1]    |
|3   |[0,1,2,3]|
+----+---------+
Spark provides a large number of APIs/functions to work with. Most of the time the built-in functions are convenient, but for specific tasks you can write UserDefinedFunctions (UDFs).
Reference: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-udfs.html
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// UDF that maps an index n to the sequence 0, 1, ..., n (inclusive).
// Defined before it is used so the snippet runs as-is in spark-shell.
def indexToRange: UserDefinedFunction = udf((index: Integer) => for (i <- 0 to index) yield i)

val df = spark.sparkContext.parallelize(Seq(1, 3)).toDF("index")
val rangeDF = df.withColumn("range", indexToRange(col("index")))
rangeDF.show(10)
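As a side note, on Spark 2.4 and later the built-in `sequence` function can produce the same array column without a UDF, which lets Catalyst optimize the expression. A minimal sketch, assuming the same `df` with an `index` column as above:

```scala
import org.apache.spark.sql.functions.{col, lit, sequence}
import spark.implicits._

val df = spark.sparkContext.parallelize(Seq(1, 3)).toDF("index")
// sequence(start, stop) builds an array from start to stop, both inclusive,
// so sequence(lit(0), col("index")) yields [0, 1, ..., index] per row.
val seqDF = df.withColumn("range", sequence(lit(0), col("index")))
seqDF.show(false)
```

Because `sequence` runs entirely inside the SQL engine, it avoids the serialization overhead a UDF incurs.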
You can achieve it with the below approach:
val input_df = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5)).toDF("col1")
input_df.show(false)
Input:
+----+
|col1|
+----+
|1   |
|2   |
|3   |
|4   |
|5   |
+----+
import org.apache.spark.sql.functions.split

// Build the string "0,1,...,n" for each row, then split it back into an array.
// Note: after split, col2 is an array of strings, not integers.
val output_df = input_df.rdd
  .map(x => x(0).toString)
  .map(x => (x, Range(0, x.toInt + 1).mkString(",")))
  .toDF("col1", "col2")
output_df.withColumn("col2", split($"col2", ",")).show(false)
Output:
+----+------------------+
|col1|col2 |
+----+------------------+
|1 |[0, 1] |
|2 |[0, 1, 2] |
|3 |[0, 1, 2, 3] |
|4 |[0, 1, 2, 3, 4] |
|5 |[0, 1, 2, 3, 4, 5]|
+----+------------------+
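If you need the elements of `col2` to be integers rather than strings, the result of `split` can be cast to an integer array. A sketch, assuming the `output_df` built above:

```scala
import org.apache.spark.sql.functions.split
import spark.implicits._

// split produces array<string>; casting converts each element to int,
// giving col2 the type array<int> instead of array<string>.
val intDF = output_df.withColumn("col2", split($"col2", ",").cast("array<int>"))
intDF.printSchema()
```

This keeps the same values but a schema that downstream numeric operations can use directly.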
Hope this helps!