I want to split the filteredaddress column of the Spark dataframe above into two new columns, Flag and Address:
customer_id|pincode|filteredaddress| Flag| Address
1000045801 |121005 |[{'flag':'0', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801 |121005 |[{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801 |121005 |[{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
Can anyone please tell me how I can do this?
You can get the values from the filteredaddress map column using the keys:
df2 = df.selectExpr(
'customer_id', 'pincode',
"filteredaddress['flag'] as flag", "filteredaddress['address'] as address"
)
Other ways to access map values are:
import pyspark.sql.functions as F
df.select(
'customer_id', 'pincode',
F.col('filteredaddress')['flag'],
F.col('filteredaddress')['address']
)
# or, more simply
df.select(
'customer_id', 'pincode',
'filteredaddress.flag',
'filteredaddress.address'
)