
Pyspark SQL dataframe map with multiple data types

I have PySpark code in AWS Glue where I want to create a DataFrame with a map column that combines integer and string values.

sample data:

{
  "Candidates": [
    { "jobLevel": 6, "name": "Steven" },
    { "jobLevel": 5, "name": "Abby" }
  ]
}

Hence, I tried the code below to create the map data type, but the integer jobLevel always gets converted to a string. Any suggestions for doing this while retaining the job level's data type?

code used:

df = spark.sql("""
    select Supervisor_name,
           map('job_level', INT(job_level_name),
               'name', employeeLogin) as Candidates
    from dataset_1
""")

It is not possible for map values to have different types: Spark coerces them all to a common type. Use a struct for this situation.

df = spark.sql("""
    select Supervisor_name, 
           struct(INT(job_level_name) as job_level, 
                  employeeLogin as name
                 ) as Candidates 
    from dataset_1
""")

I am new to PySpark :-), but let's try parallelizing the data and then defining the schema we want:

    js = {"Candidates": [
        {"jobLevel": 6, "name": "Steven"},
        {"jobLevel": 5, "name": "Abby"},
    ]}



    import json
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # parallelize the candidates as JSON strings, then read them
    # back with an explicit schema so jobLevel stays an integer
    rdd = sc.parallelize([json.dumps(c) for c in js["Candidates"]])
    schema = StructType([StructField('name', StringType(), True),
                         StructField('jobLevel', IntegerType(), True)])
    df1 = spark.read.json(rdd, schema)
    df1.show(truncate=False)
    df1.printSchema()

I get:

+------+--------+
|name  |jobLevel|
+------+--------+
|Steven|6       |
|Abby  |5       |
+------+--------+

root
 |-- name: string (nullable = true)
 |-- jobLevel: integer (nullable = true)
