
Pass list to udf in dataframe with Column

I am building a dataframe from a Hive table, and I need to transform a column based on multiple other columns in the dataframe. For that I built a UDF and passed kwargs; however, I suspect the order of the kwargs gets changed, and the order is important here. So I decided to use a list instead, but I am still exploring how to pass multiple columns as a list in a dataframe transformation.

function:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def func(*cols):
    # Accept a variable number of column values and concatenate them
    val = ''
    for i in cols:
        val = val + i
    return val

# new_col is the target column name, defined elsewhere
df = df.withColumn(new_col, func(df["col1"], df["col2"], df["col3"]))
df.show()
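If the worry is argument ordering, one alternative (not from the original post; a minimal sketch, assuming all three columns are strings) is to wrap the columns in an array column, so the UDF receives a single Python list in a fixed order:

from pyspark.sql.functions import array, col, udf
from pyspark.sql.types import StringType

# Hypothetical helper: receives the wrapped columns as one ordered Python list
@udf(returnType=StringType())
def concat_list(values):
    return ''.join(v for v in values if v is not None)

col_list = ['col1', 'col2', 'col3']
df = df.withColumn('new_col', concat_list(array(*[col(c) for c in col_list])))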

The dynamic-column approach below might solve your problem.

from pyspark.sql.functions import concat
# Creating an example DataFrame
values = [('A1',11,'A3','A4'),('B1',22,'B3','B4'),('C1',33,'C3','C4')]
df = spark.createDataFrame(values,['col1','col2','col3','col4'])
df.show()

'''
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|  A1|  11|  A3|  A4|
|  B1|  22|  B3|  B4|
|  C1|  33|  C3|  C4|
+----+----+----+----+
'''
 
col_list = ['col1','col2']
df = df.withColumn('concatenated_cols2',concat(*col_list))
col_list = ['col1','col2','col3']
df = df.withColumn('concatenated_cols3',concat(*col_list))
col_list = ['col1','col2','col3','col4']
df = df.withColumn('concatenated_cols4',concat(*col_list))
df.show()

'''
+----+----+----+----+------------------+------------------+------------------+
|col1|col2|col3|col4|concatenated_cols2|concatenated_cols3|concatenated_cols4|
+----+----+----+----+------------------+------------------+------------------+
|  A1|  11|  A3|  A4|              A111|            A111A3|          A111A3A4|
|  B1|  22|  B3|  B4|              B122|            B122B3|          B122B3B4|
|  C1|  33|  C3|  C4|              C133|            C133C3|          C133C3C4|
+----+----+----+----+------------------+------------------+------------------+
'''
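One caveat worth noting: concat returns null as soon as any input column is null. If you need null-tolerant concatenation (relevant to the follow-up below), concat_ws skips nulls instead. A small sketch with an empty separator:

from pyspark.sql.functions import concat_ws

# concat_ws('', ...) joins the columns and silently skips null values
df = df.withColumn('concatenated_safe', concat_ws('', *col_list))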

Thanks Smart_Coder, and sorry for the delay in getting back to you. Let me give you the full requirement. I will take the dataframe you used above as the example, and I will take 3 columns as input (it should be dynamic, but I will use these for now). col1, col2 and col3 are the input columns to the function. Column values should move from right to left in case of null or empty values.

Extension of the requirement: I then need to check the character count of each value and keep only a specific number of characters in that column; the remaining characters should go into the next column. If the remainder is still more than the specific number of characters, the rest overflows into the column after that. However, we need only 3 columns/elements as output.

col1 col2 col3
ASDF QWER NMVB
     QWER NMVB
ASD       NMVB

Suppose I need only 3 characters max in each field. The output will be:

col1 col2 col3
ASD  F    QWE
QWE  R    NMV
ASD  NMV
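Here is one possible reading of that requirement, as a hedged sketch (the names shift_and_split, MAX_LEN and NUM_COLS are mine, and the third sample row above leaves the final column ambiguous, so treat the exact overflow rule as an assumption): shift non-empty values left, then fill each output column with at most MAX_LEN characters, letting any overflow claim the next column and push later values further right.

from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType

MAX_LEN = 3   # assumed max characters per output column
NUM_COLS = 3  # assumed number of output columns

@udf(returnType=ArrayType(StringType()))
def shift_and_split(*values):
    # Shift left: drop null/empty values
    queue = [v for v in values if v]
    out = []
    # Fill each slot with at most MAX_LEN characters; the overflow
    # becomes the next value in line, pushing the rest to the right
    while queue and len(out) < NUM_COLS:
        v = queue.pop(0)
        out.append(v[:MAX_LEN])
        if len(v) > MAX_LEN:
            queue.insert(0, v[MAX_LEN:])
    # Pad so the result always has NUM_COLS elements
    return out + [''] * (NUM_COLS - len(out))

input_cols = ['col1', 'col2', 'col3']
df = df.withColumn('parts', shift_and_split(*[col(c) for c in input_cols]))
for i, c in enumerate(input_cols):
    df = df.withColumn(c, col('parts')[i])
df = df.drop('parts')
df.show()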
