
How to pass an argument to a function that doesn't take string (Pyspark)

I have the join function in Spark SQL. This function needs a join condition, and if the columns being joined on do not have the same name, they need to be passed as a join expression.

Example:

x.join(y, x.column1 == y.column2)

This means that we are joining dataframes x and y on column1 in x and column2 in y.
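For context, here is a minimal, self-contained sketch of that kind of join; the DataFrame contents and the column names column1 and column2 are illustrative assumptions, not data from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Two small example DataFrames whose join keys have different names.
x = spark.createDataFrame([(1, "a"), (2, "b")], ["column1", "value_x"])
y = spark.createDataFrame([(1, "c"), (3, "d")], ["column2", "value_y"])

# Join on column1 == column2 using a join expression rather than a column name.
joined = x.join(y, x.column1 == y.column2)
joined.show()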

I would like to write a function that takes the column names for both dataframes as arguments and joins on those columns. The problem is that the join expression cannot be a string. I have looked at questions like this one, where a map is used to map a variable name, but that does not fit my needs. I need to remove the quotation marks that make the column name a string and pass the bare column references to the join function.

I have checked, and there is no other way to do this in Pyspark if the columns being joined on do not have the same name (besides generating a copy of one of the dataframes with new column names, since dataframes are immutable and column names cannot be changed in place).

Is there any other way to pass the column names into the join expression?

Reposting my comment as an answer for future reference. You can get any attribute of a class or module using the getattr function.

x.join(y, getattr(x, 'column1') == getattr(y, 'column2'))
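Building on that, here is a minimal sketch of a reusable helper; the function name join_on_columns and its parameter names are illustrative assumptions, not part of the original answer.

def join_on_columns(left, right, left_col, right_col, how="inner"):
    """Join two DataFrames on columns given by name (as strings)."""
    # getattr turns each string column name back into a Column object,
    # so the comparison produces a proper join expression.
    return left.join(right, getattr(left, left_col) == getattr(right, right_col), how)

# Usage with the DataFrames from the question:
# result = join_on_columns(x, y, "column1", "column2")

Bracket indexing (left[left_col]) is another way to get a Column from a string name, and it also works for column names that are not valid Python identifiers.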

