How to split a list of dictionaries in one column into two columns in a PySpark DataFrame?

Question

I want to split the filteredaddress column of the Spark DataFrame below into two new columns, Flag and Address:

customer_id | pincode | filteredaddress                                                               | Flag | Address
1000045801  | 121005  | [{'flag':'0', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}] |      |
1000045801  | 121005  | [{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}] |      |
1000045801  | 121005  | [{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}] |      |

Can anyone please tell me how I can do this?

Answer 1

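Assuming filteredaddress is a map column, you can get the values using the keys. (If it is an array of maps, as the sample rows suggest, index the element first, e.g. filteredaddress[0]['flag'].)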

df2 = df.selectExpr(
    'customer_id', 'pincode',
    "filteredaddress['flag'] as flag", "filteredaddress['address'] as address"
)

Other ways to access map values are:

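import pyspark.sql.functions as F

# Bracket access on a Column; alias() gives the new columns clean names
df.select(
    'customer_id', 'pincode',
    F.col('filteredaddress')['flag'].alias('flag'),
    F.col('filteredaddress')['address'].alias('address')
)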

# or, more simply

df.select(
    'customer_id', 'pincode',
    'filteredaddress.flag',
    'filteredaddress.address'
)
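
A side note, in case filteredaddress is not a real map column but a plain string that merely looks like a Python list of dictionaries (the question does not show the schema, so this is an assumption). The key access above would not apply then, and the string has to be parsed first. Below is a minimal sketch of the per-row parsing in plain Python; the helper name split_filteredaddress is made up for illustration and is not part of PySpark.

```python
import ast


def split_filteredaddress(raw):
    """Return (flag, address) parsed from a string such as
    "[{'flag': '0', 'address': '...'}]".

    Hypothetical helper: assumes the cell holds a Python-literal list
    with one dictionary. Returns (None, None) if it cannot be parsed.
    """
    if raw is None:
        return (None, None)
    try:
        # literal_eval only evaluates Python literals, not arbitrary code
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return (None, None)
    # Accept a bare dictionary as well as a list of dictionaries
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list) or not parsed or not isinstance(parsed[0], dict):
        return (None, None)
    first = parsed[0]  # only the first dictionary is used
    return (first.get('flag'), first.get('address'))


sample = "[{'flag':'0', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]"
flag, address = split_filteredaddress(sample)
```

In Spark, a function like this could be wrapped with pyspark.sql.functions.udf and a struct return type holding the two string fields, and the resulting struct then selected as flag and address. Built-in column functions are usually faster than Python UDFs, so this is only worth it when the column really is a string.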

solution1
1 2021-02-18 16:05:56
