I'm using Spark 2.4.
I have an ArrayType(StringType()) column and a StringType() column in a Spark DataFrame. I need to find the position of the StringType() column's value within the ArrayType(StringType()) column.
Sample Input:
+---------------+---------+
|arrayCol |stringCol|
+---------------+---------+
|['a', 'b', 'c']|'b' |
+---------------+---------+
|['a', 'b', 'c']|'d' |
+---------------+---------+
Sample Output:
+---------------+---------+-----+
|arrayCol |stringCol|Index|
+---------------+---------+-----+
|['a', 'b', 'c']|'b' |2 |
+---------------+---------+-----+
|['a', 'b', 'c']|'d' |null |
+---------------+---------+-----+
I have tried array_position, but it fails with a "Column is not iterable" error.
I have also tried combining expr, transform, and array_position, but I'm wondering whether there is a solution that doesn't require expr.
Thanks :)
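Try expr with the array_position function. In Spark 2.4 the Python array_position(col, value) helper expects a literal as its second argument rather than a Column, which is the likely cause of the "Column is not iterable" error. Inside expr, both arguments are resolved as column references by the SQL parser.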
Example:
df.show()
#+---------+---------+
#| arrayCol|stringCol|
#+---------+---------+
#|[a, b, c]| b|
#|[a, b, c]| d|
#+---------+---------+
from pyspark.sql.functions import expr
# array_position is 1-based and returns 0 when the value is absent, so map 0 to null
df.withColumn("Index", expr("if(array_position(arrayCol, stringCol) = 0, null, array_position(arrayCol, stringCol))")).show()
#+---------+---------+-----+
#| arrayCol|stringCol|Index|
#+---------+---------+-----+
#|[a, b, c]| b| 2|
#|[a, b, c]| d| null|
#+---------+---------+-----+
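If expr really has to be avoided on 2.4, one fallback is a Python UDF. The sketch below is plain Python and only illustrates the per-row logic such a UDF would need: a 1-based position, with None in place of the 0 that array_position returns for a missing value. The function name position_or_none is made up for this example and is not part of PySpark.

```python
from typing import List, Optional


def position_or_none(values: Optional[List[str]], target: Optional[str]) -> Optional[int]:
    """Return the 1-based position of the first match, or None if there is none.

    Hypothetical helper: mirrors array_position, except that "not found"
    becomes None instead of 0. None inputs also yield None.
    """
    if values is None or target is None:
        return None
    for position, value in enumerate(values, start=1):
        if value == target:
            return position
    return None


# Same rows as the sample DataFrame above
rows = [(["a", "b", "c"], "b"), (["a", "b", "c"], "d")]
print([position_or_none(array_col, string_col) for array_col, string_col in rows])
```

Assuming a standard PySpark setup, this could be wrapped with udf(position_or_none, IntegerType()) and applied to arrayCol and stringCol. A Python UDF is usually slower than the built-in function, so the expr version is probably still the better choice when it is acceptable.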