Filter expected value from list in df column
I have a data frame with the following column:
```
raw_col
['a','b','c']
['b']
['a','b']
['c']
```
I want to return a column with a single value based on a conditional statement. I wrote the following function:
```python
def filter_func(elements):
    if "a" in elements:
        return "a"
    else:
        return "Other"
```
When running the function on the column:

```python
df.withColumn("col", filter_func("raw_col"))
```

I get the following error: `col should be Column`.
What's wrong here? What should I do?
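The error happens because `filter_func("raw_col")` is evaluated eagerly by plain Python before Spark ever sees it: the function receives the literal string `"raw_col"` (not the column's values) and returns an ordinary `str`, while `withColumn` requires a `Column` expression. A minimal Spark-free check illustrates this:

```python
def filter_func(elements):
    if "a" in elements:
        return "a"
    else:
        return "Other"

# The column *name* is passed in, not the column's row values.
# "a" happens to be a character of the string "raw_col", so the
# function returns the plain string "a" -- not a pyspark Column.
result = filter_func("raw_col")
print(type(result).__name__)  # str, which is why withColumn rejects it
```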
You can use the `array_contains` function:
```python
import pyspark.sql.functions as f

df = df.withColumn(
    "col",
    f.when(f.array_contains("raw_col", f.lit("a")), f.lit("a")).otherwise(f.lit("Other"))
)
```
But if you have complex logic and really need to use `filter_func`, you have to create a UDF:
```python
@f.udf()
def filter_func(elements):
    if "a" in elements:
        return "a"
    else:
        return "Other"

df = df.withColumn("col", filter_func("raw_col"))
```
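Either approach gives the same result on the sample data from the question. A plain-Python sketch of the per-row logic (purely illustrative, no Spark session needed):

```python
def filter_func(elements):
    # Same conditional as the question's function.
    return "a" if "a" in elements else "Other"

# The four sample rows of raw_col from the question.
rows = [["a", "b", "c"], ["b"], ["a", "b"], ["c"]]
print([filter_func(r) for r in rows])  # ['a', 'Other', 'a', 'Other']
```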