
PySpark - Create a Dataframe from a dictionary with list of values for each key

I've this type of dictionary:

{'xy': [['value1', 'value2'], ['value3', 'value4']],
 'yx': [['value5', 'value6'], ['value7', 'value8']]}

I would like to create a PySpark DataFrame with 3 columns and 2 rows, one row per key of the dict. For example, the first row:

First column: xy
Second column: ["value1", "value2"]
Third column: ["value3", "value4"]

 

What's the best way to do this? I'm only able to create 2 columns: one with the key and one with the whole list of values, but that's not my desired result.

This is your data dictionary:

data = {
    'xy': [['value1', 'value2'], ['value3', 'value4']],
    'yx': [['value5', 'value6'], ['value7', 'value8']]
}

You can build the rows with a list comprehension, prepending each key to its list of values:

df = spark.createDataFrame(
    [[k] + v for k, v in data.items()],
    schema=['col1', 'col2', 'col3']
)

df.show(10, False)
+----+----------------+----------------+
|col1|col2            |col3            |
+----+----------------+----------------+
|xy  |[value1, value2]|[value3, value4]|
|yx  |[value5, value6]|[value7, value8]|
+----+----------------+----------------+
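To see what the comprehension hands to `spark.createDataFrame` before Spark is involved, the row construction can be checked in plain Python (no Spark session needed). Since each key maps to a list of exactly two inner lists, `[k] + v` yields a three-element row:

```python
# The same dictionary from the question.
data = {
    'xy': [['value1', 'value2'], ['value3', 'value4']],
    'yx': [['value5', 'value6'], ['value7', 'value8']]
}

# Each key contributes one row: [key, first_list, second_list].
rows = [[k] + v for k, v in data.items()]
print(rows)
# [['xy', ['value1', 'value2'], ['value3', 'value4']],
#  ['yx', ['value5', 'value6'], ['value7', 'value8']]]
```

If you want the array columns to have an explicit type instead of letting Spark infer it, you can pass a `StructType` with `ArrayType(StringType())` fields as the `schema` argument instead of the column-name list.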
