I have the following dataframe:
+-------------------+------------------------+
|value              |feeling                 |
+-------------------+------------------------+
|Sam got these marks|[sad, sad, disappointed]|
|I got good marks   |[happy, excited, happy] |
+-------------------+------------------------+
I want to iterate through this dataframe, get the array in the feeling column for each row, and pass that array to a calculation method.
def calculationMethod(arrayValue: Array[String]) = {
  // get the average of the words
}
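For reference, the stub above could be filled in like this. This is only a sketch, assuming "average" here means the most frequent word in the array (so [sad, sad, disappointed] -> sad); `mostFrequent` is a hypothetical helper name, and the Spark wiring is shown in comments:

```scala
// A sketch: pick the most frequent word in the array.
// (Assumption: this is what "average of words" means here.)
def mostFrequent(words: Seq[String]): String =
  words.groupBy(identity).maxBy { case (_, group) => group.size }._1

// Spark wiring (not run here): wrap it in a UDF and apply it per row.
// import org.apache.spark.sql.functions.{col, udf}
// val averageUDF = udf(mostFrequent _)
// val withAverage = df.withColumn("average", averageUDF(col("feeling")))
```

Note that Spark passes array columns to a Scala UDF as `Seq[String]`, not `Array[String]`.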
Output dataframe:
+-------------------+------------------------+-------+
|value              |feeling                 |average|
+-------------------+------------------------+-------+
|Sam got these marks|[sad, sad, disappointed]|sad    |
|I got good marks   |[happy, excited, happy] |happy  |
+-------------------+------------------------+-------+
I am not sure how to iterate through each row and get the array in the second column so it can be passed into my method. Also, please note that the dataframe shown in the question is a streaming DataFrame.
EDIT 1
val calculateUDF = udf(calculationMethod _)
val editedDataFrame = filteredDataFrame.withColumn("average", calculateUDF(col("feeling")))

def calculationMethod(emojiArray: Seq[String]): DataFrame = {
  val existingSparkSession = SparkSession.builder().getOrCreate()
  import existingSparkSession.implicits._
  val df = emojiArray.toDF("feeling")
  val result = df.selectExpr(
    "feeling",
    "'U+' || trim('0', string(hex(encode(feeling, 'utf-32')))) as unicode"
  )
  result
}
I'm getting the following error
Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
Please note that the initial dataframe mentioned in the question is a streaming DataFrame.
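The error occurs because a UDF must return a value Spark can map to a column type (a string, array, struct, and so on), not a Dataset or DataFrame. The per-element conversion can instead be done in plain Scala, so the UDF returns a supported type. A sketch, where `toUnicodeLabels` is a hypothetical helper (`String.codePoints` requires Java 9+):

```scala
// Plain Scala equivalent of the hex/encode logic, so the UDF can return
// array<string> (a supported column type) instead of a DataFrame.
def toUnicodeLabels(s: String): String =
  s.codePoints().toArray.map(cp => "U+%X".format(cp)).mkString(" ")

// Spark wiring (not run here):
// val unicodeUDF = udf((arr: Seq[String]) => arr.map(toUnicodeLabels))
// val result = filteredDataFrame.withColumn("unicode", unicodeUDF(col("feeling")))
```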
EDIT 2
This should be the final dataframe that I am expecting
+-------------------+--------+-------------------------+
|value              |feeling |unicode                  |
+-------------------+--------+-------------------------+
|Sam got these marks|[😀😆😁]|[U+1F600 U+1F606 U+1F601]|
|I got good marks   |[😄🙃]  |[U+1F604 U+1F643]        |
+-------------------+--------+-------------------------+
You can transform the arrays instead of using a UDF:
val df2 = df.withColumn(
"unicode",
expr("transform(feeling, x -> 'U+' || ltrim('0' , string(hex(encode(x, 'utf-32')))))")
)
df2.show(false)
+-------------------+------------+---------------------------+
|value |feeling |unicode |
+-------------------+------------+---------------------------+
|Sam got these marks|[😀, 😆, 😁]|[U+1F600, U+1F606, U+1F601]|
|I got good marks |[😄, 🙃] |[U+1F604, U+1F643] |
+-------------------+------------+---------------------------+
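A note on what the expression computes per element: `encode(x, 'utf-32')` yields four big-endian bytes per code point, `hex` renders them with leading zeros, and `ltrim('0', ...)` strips those zeros before `'U+'` is prepended. Replayed in plain Scala (a sketch for illustration, not Spark code):

```scala
// What the transform expression does for one emoji, step by step:
val bytes  = "😀".getBytes("UTF-32")                    // four big-endian bytes per code point
val hexStr = bytes.map(b => "%02X".format(b)).mkString  // "0001F600"
val label  = "U+" + hexStr.dropWhile(_ == '0')          // leading zeros stripped, as ltrim('0', ...) does
```

Because `transform` is evaluated by Spark's built-in expressions rather than a deserialized Scala function, it also works on the streaming DataFrame mentioned in the question.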