Here is my example json file:
{"data":"example1","data2":"example2","register":[{"name":"John","last_name":"Travolta","age":68},{"name":"Nicolas","last_name":"Cage","age":58}], "data3":"example3","data4":"example4"}
And I have a data schema similar to this (totally illustrative):
root
|-- register: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- age: long (nullable = true)
| | |-- last_name: string (nullable = true)
| | |-- name: string (nullable = true)
|-- data: string (nullable = true)
|-- data2: string (nullable = true)
|-- data3: string (nullable = true)
|-- data4: string (nullable = true)
What I want is to iterate over the register array, check whether an element matches a given person, e.g. John Travolta (name "John" and last_name "Travolta"), and create a new struct, for example new_register, containing all the fields of the element at the same index as the match.
I tried using some of Spark's built-in functions, such as filter, when and contains, but none of them gave me the desired result.
I also tried implementing a UDF, but I couldn't find a way to apply the function to the field I want.
How do I resolve the above problem?
First explode the array field, then access the struct fields with dot notation and filter for the required values. Here is the code:
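from pyspark.sql.functions import col, explode
# Inspect the input schema and data
df.printSchema()
df.show(10, False)
# explode() yields one row per element of "register"; keep only the element(s) matching John Travolta
df1 = df.withColumn("new_struct", explode("register")).filter(
    (col("new_struct.last_name") == 'Travolta') & (col("new_struct.name") == 'John'))
df1.show(10, False)
df1.printSchema()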
Output:
root
|-- data: string (nullable = true)
|-- data2: string (nullable = true)
|-- data3: string (nullable = true)
|-- data4: string (nullable = true)
|-- register: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- age: long (nullable = true)
| | |-- last_name: string (nullable = true)
| | |-- name: string (nullable = true)
+--------+--------+--------+--------+-------------------------------------------+
|data |data2 |data3 |data4 |register |
+--------+--------+--------+--------+-------------------------------------------+
|example1|example2|example3|example4|[{68, Travolta, John}, {58, Cage, Nicolas}]|
+--------+--------+--------+--------+-------------------------------------------+
+--------+--------+--------+--------+-------------------------------------------+--------------------+
|data |data2 |data3 |data4 |register |new_struct |
+--------+--------+--------+--------+-------------------------------------------+--------------------+
|example1|example2|example3|example4|[{68, Travolta, John}, {58, Cage, Nicolas}]|{68, Travolta, John}|
+--------+--------+--------+--------+-------------------------------------------+--------------------+
root
|-- data: string (nullable = true)
|-- data2: string (nullable = true)
|-- data3: string (nullable = true)
|-- data4: string (nullable = true)
|-- register: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- age: long (nullable = true)
| | |-- last_name: string (nullable = true)
| | |-- name: string (nullable = true)
|-- new_struct: struct (nullable = true)
| |-- age: long (nullable = true)
| |-- last_name: string (nullable = true)
| |-- name: string (nullable = true)