
Pyspark - How to apply a function only to a subset of columns in a DataFrame?

I want to apply functions to some columns of a Spark DataFrame, using a different method for each column: fn and fn1. For a single column I do it like this:

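from pyspark.sql.functions import udf
from pyspark.sql.types import DecimalType

def fn(x):
    return x * 2

udf_1 = udf(fn, DecimalType())

def fn1(x):
    return x * 3

udf_2 = udf(fn1, DecimalType())

def process_df1(df, col_name):
    # Overwrite col_name with fn applied to its values
    return df.withColumn(col_name, udf_1(col_name))

def process_df2(df, col_name):
    # Overwrite col_name with fn1 applied to its values
    return df.withColumn(col_name, udf_2(col_name))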

This works fine for a single column. But now I have a list of dicts that holds information about each column:

cols_info = [{'col_name': 'metric_1', 'process': 'True', 'method':'simple'}, {'col_name': 'metric_2', 'process': 'False', 'method':'hash'}] 

How should I parse the cols_info list and apply the logic above only to the columns that have process: True, using the method each one requires?

The first thing that comes to mind is to filter out the columns with process: False:

list(filter(lambda col_info: col_info['process'] == 'True', cols_info))

But I am still missing a more general approach here.

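The selectExpr function would be useful here: register each function under a SQL name, then build one SQL expression string per column.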
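Since selectExpr takes SQL expression strings, the per-column work reduces to building one string per column. Here is a minimal pure-Python sketch of that step (no Spark needed). build_exprs is a hypothetical helper name, not part of the original answer, and it assumes the 'encrypt' and 'method' keys used in the answer's cols_info:

```python
# Hypothetical helper (not from the original answer): build selectExpr strings.
def build_exprs(all_columns, cols_info):
    # Map each column that must be transformed to the name of its SQL function.
    methods = {c['col_name']: c['method'] for c in cols_info if c['encrypt']}
    # Keep the DataFrame's column order; wrap only the selected columns.
    return [
        "{m}({c}) as {c}".format(m=methods[col], c=col) if col in methods else col
        for col in all_columns
    ]

cols_info = [
    {'col_name': 'col1', 'encrypt': False},
    {'col_name': 'col2', 'encrypt': True, 'method': 'fn1'},
    {'col_name': 'col3', 'encrypt': True, 'method': 'fn2'},
]
exprs = build_exprs(['col1', 'col2', 'col3', 'col4'], cols_info)
print(exprs)
# ['col1', 'fn1(col2) as col2', 'fn2(col3) as col3', 'col4']
```

Columns that do not appear in cols_info, or that are flagged False, pass through unchanged. The full Spark example follows.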

from pyspark.sql.types import LongType

# Test data (sqlContext is assumed to be an existing SQLContext)
tst = sqlContext.createDataFrame(
    [(1,2,3,4),(1,3,4,1),(1,4,5,5),(1,6,7,8),(2,1,9,2),(2,2,9,9)],
    schema=['col1','col2','col3','col4'])

def fn(x):
    return x * 2

def fn1(x):
    return x * 3

# Register the functions under the SQL names used in cols_info.
# Note the naming: SQL "fn1" is Python fn, SQL "fn2" is Python fn1.
# Without an explicit return type, a registered Python function returns strings.
sqlContext.udf.register("fn1", fn, LongType())
sqlContext.udf.register("fn2", fn1, LongType())

cols_info = [{'col_name':'col1','encrypt':False},{'col_name':'col2','encrypt':True,'method':'fn1'},{'col_name':'col3','encrypt':True,'method':'fn2'}]
# Columns that need a function applied
modified_columns = [x['col_name'] for x in cols_info if x['encrypt']]
# Columns to keep unchanged (the set difference does not preserve column order)
columns_retain = list(set(tst.columns) - set(modified_columns))
# Build expressions such as "fn1(col2) as col2"
expr = columns_retain + [x['method'] + '(' + x['col_name'] + ') as ' + x['col_name'] for x in cols_info if x['encrypt']]
tst_res = tst.selectExpr(*expr)

The result will be (the order of the unchanged columns may vary):

+----+----+----+----+
|col4|col1|col2|col3|
+----+----+----+----+
|   4|   1|   4|   9|
|   1|   1|   6|  12|
|   5|   1|   8|  15|
|   8|   1|  12|  21|
|   2|   2|   2|  27|
|   9|   2|   4|  27|
+----+----+----+----+
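
If you would rather stay with withColumn and the cols_info format from the question, one possible sketch is to map each method name to a function and fold over the flagged columns. select_columns and apply_methods are hypothetical helper names. The code only relies on df[name] and df.withColumn, so it does not import pyspark itself:

```python
from functools import reduce

def select_columns(cols_info):
    """Return (col_name, method) pairs for entries flagged for processing."""
    # The question stores the flag as the string 'True'; a real bool works too.
    return [(c['col_name'], c['method']) for c in cols_info
            if str(c['process']) == 'True']

def apply_methods(df, cols_info, methods):
    """Apply methods[method] to each flagged column via df.withColumn."""
    def step(acc, pair):
        name, method = pair
        # Overwrite the column with the result of its method.
        return acc.withColumn(name, methods[method](acc[name]))
    return reduce(step, select_columns(cols_info), df)

cols_info = [
    {'col_name': 'metric_1', 'process': 'True', 'method': 'simple'},
    {'col_name': 'metric_2', 'process': 'False', 'method': 'hash'},
]
print(select_columns(cols_info))
# [('metric_1', 'simple')]
```

With the UDFs from the question, the call would look like apply_methods(df, cols_info, {'simple': udf_1, 'hash': udf_2}). Which UDF belongs to which method name is an assumption here, since the question does not say.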

