
Retrieve the array stored in a dataframe column for each row using Scala Spark

I have the following dataframe:

+-------------------+------------------------+
|value              |feeling                 |
+-------------------+------------------------+
|Sam got these marks|[sad, sad, disappointed]|
|I got good marks   |[happy, excited, happy] |
+-------------------+------------------------+

I want to iterate through this dataframe, get the array in the feeling column for each row, and pass that array to a calculation method.

def calculationMethod(arrayValue: Array[String]) = {
  // get average of words
}
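For illustration, one hypothetical reading of "average of words" is the most frequent element of the array; a plain-Scala sketch (the return type and logic are assumptions, since the method body is not given in the question):

```scala
// Hypothetical sketch: interprets the "average" of the words as the
// most frequent element of the array (e.g. [sad, sad, disappointed] -> sad).
def calculationMethod(arrayValue: Seq[String]): String =
  arrayValue
    .groupBy(identity)                       // group identical words together
    .maxBy { case (_, group) => group.size } // pick the largest group
    ._1                                      // keep the word itself

println(calculationMethod(Seq("sad", "sad", "disappointed"))) // sad
```

Note that ties between equally frequent words are resolved arbitrarily here.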

Expected output dataframe:

+-------------------+------------------------+-------+
|value              |feeling                 |average|
+-------------------+------------------------+-------+
|Sam got these marks|[sad, sad, disappointed]|sad    |
|I got good marks   |[happy, excited, happy] |happy  |
+-------------------+------------------------+-------+

I am not sure how to iterate through each row and get the array in the second column so that it can be passed into my method. Also, please note that the dataframe shown in the question is a streaming dataframe.

EDIT 1

val calculateUDF = udf(calculationMethod _)
val editedDataFrame = filteredDataFrame.withColumn("average", calculateUDF(col("feeling")))

def calculationMethod(emojiArray: Seq[String]): DataFrame = {
  val existingSparkSession = SparkSession.builder().getOrCreate()
  import existingSparkSession.implicits._
  val df = emojiArray.toDF("feeling")
  val result = df.selectExpr(
    "feeling",
    "'U+' || trim('0' , string(hex(encode(feeling, 'utf-32')))) as unicode"
  )
  result
}

I'm getting the following error:

Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported

Please note that the initial dataframe mentioned in the question is a streaming dataframe.
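The error occurs because a Scala UDF must return a type Spark can encode as a column value (String, Seq[String], numeric types, case classes, and so on); a Dataset/DataFrame has no such encoder, so building one inside the UDF cannot work. Assuming the goal is the U+ code point of each emoji, a sketch of a UDF-compatible version (plain Scala, no SparkSession inside the function):

```scala
// A UDF must return an encodable type (here Seq[String]),
// not a DataFrame built inside the function.
def toUnicode(emoji: String): String =
  f"U+${emoji.codePointAt(0)}%X" // e.g. "😀" -> "U+1F600"

def calculationMethod(emojiArray: Seq[String]): Seq[String] =
  emojiArray.map(toUnicode)

// Wiring it up would then look like this (Spark API, not run here):
//   val calculateUDF = udf(calculationMethod _)
//   filteredDataFrame.withColumn("unicode", calculateUDF(col("feeling")))
```

This only looks at the first code point of each array element, which matches single-emoji strings like those in the question.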

EDIT 2

This should be the final dataframe that I am expecting

+-------------------+----------+-------------------------+
|value              |feeling   |unicode                  |
+-------------------+----------+-------------------------+
|Sam got these marks|[😀😆😁]  |[U+1F600 U+1F606 U+1F601]|
|I got good marks   |[😄🙃]    |[U+1F604 U+1F643]        |
+-------------------+----------+-------------------------+

You can transform the arrays instead of using a UDF:

val df2 = df.withColumn(
    "unicode", 
    expr("transform(feeling, x -> 'U+' || ltrim('0' , string(hex(encode(x, 'utf-32')))))")
)

df2.show(false)
+-------------------+------------+---------------------------+
|value              |feeling     |unicode                    |
+-------------------+------------+---------------------------+
|Sam got these marks|[😀, 😆, 😁]|[U+1F600, U+1F606, U+1F601]|
|I got good marks   |[😄, 🙃]    |[U+1F604, U+1F643]         |
+-------------------+------------+---------------------------+
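`transform` is a built-in Spark SQL higher-order function (available since Spark 2.4), so this stays a pure column expression and also works on a streaming dataframe. For reference, the SQL lambda corresponds roughly to the following plain-Scala logic (a sketch of the same steps, not Spark API; `UTF-32BE` is used to make the byte order explicit):

```scala
// Plain-Scala trace of the SQL lambda
// 'U+' || ltrim('0', string(hex(encode(x, 'utf-32')))):
def sqlLambda(x: String): String = {
  // encode(x, 'utf-32') then hex(...): UTF-32 big-endian bytes as uppercase hex
  val utf32Hex = x.getBytes("UTF-32BE").map(b => f"${b & 0xFF}%02X").mkString
  // ltrim('0', ...) strips the leading zeros; 'U+' || ... prepends the prefix
  "U+" + utf32Hex.dropWhile(_ == '0')
}

println(sqlLambda("😀")) // U+1F600
```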
