
How to use a column value as key to a dictionary in PySpark?

I have a small PySpark DataFrame df:

index    col1
0        1    
1        3
2        4

And a dictionary:

LOOKUP = {0: 2, 1: 5, 2: 5, 3: 4, 4: 6}

I now want to add an extra column col2 to df, equal to the LOOKUP values of col1.

My output should look like this:

index    col1 col2
0        1    5    
1        3    4
2        4    6

I tried using:

df = df.withColumn(col("col2"), LOOKUP[col("col1")])

But this gave me errors, as did using expr.

How can I achieve this in PySpark?

You can use a map column that you create from the lookup dictionary:

from itertools import chain
from pyspark.sql import functions as F

lookup = {0: 2, 1: 5, 2: 5, 3: 4, 4: 6}
lookup_map = F.create_map(*[F.lit(x) for x in chain(*lookup.items())])

df1 = df.withColumn("col2", lookup_map[F.col("col1")])

df1.show()
#+-----+----+----+
#|index|col1|col2|
#+-----+----+----+
#|    0|   1|   5|
#|    1|   3|   4|
#|    2|   4|   6|
#+-----+----+----+

Another way would be to create a lookup_df from the dict and then join it with your dataframe.

You can use a CASE WHEN statement built with Python f-strings from the LOOKUP dictionary:

from pyspark.sql import functions as F
column = 'col1' #column to replace
e = f"""CASE {' '.join([f"WHEN {column}='{k}' THEN '{v}'" for k,v in LOOKUP.items()])} 
        ELSE NULL END"""
out = df.withColumn("col2",F.expr(e))

out.show()

+-----+----+----+
|index|col1|col2|
+-----+----+----+
|    0|   1|   5|
|    1|   3|   4|
|    2|   4|   6|
+-----+----+----+
