
How to map values by column names in PySpark

What I want is to map column names into keys. For example:

#+-------+----------+
#|key1   |key2      |
#+-------+----------+
#|value1 |value2    |
#|value3 |value4    |
#+-------+----------+

will be transformed into:

#+-------+----------+
#|   keys|values    |
#+-------+----------+
#|key1   |value1    |
#|key2   |value2    |
#|key1   |value3    |
#|key2   |value4    |
#+-------+----------+

In HiveQL I can write something like:

select distinct key, value
    from xxx
    lateral view explode(map(
            'key1', key1,
            'key2', key2)) tab as key, value

But how do I write this in PySpark? I could register a temporary table, but I don't think that's the best solution.

Use the create_map function to build a map column and then explode it.

create_map needs a flat list of column expressions that are grouped as key-value pairs. Such a list can be built with a comprehension over the DataFrame's columns:

from itertools import chain
from pyspark.sql.functions import col, lit, create_map, explode

# assumes an existing SparkSession named `spark`
data = [("value1", "value2"), ("value3", "value4")]
df = spark.createDataFrame(data, ["key1", "key2"])

# Interleave each column name (as a literal) with its value column:
# [lit("key1"), col("key1"), lit("key2"), col("key2")]
key_values = create_map(*chain(*((lit(name), col(name)) for name in df.columns)))

df.select(explode(key_values)).show()

+----+------+
| key| value|
+----+------+
|key1|value1|
|key2|value2|
|key1|value3|
|key2|value4|
+----+------+

Something like this?

select 'key1' as keys,
       key1 as values
from xxx
union all 
select 'key2' as keys,
       key2 as values
from xxx

Put that in spark.sql().
