I have a PySpark DataFrame consisting of three columns, with the structure shown below.
In[1]: df.take(1)
Out[1]:
[Row(angle_est=-0.006815859163590619, rwsep_est=0.00019571401752467945, cost_est=34.33651951754235)]
What I want to do is retrieve each value of the first column (angle_est) and pass it as the parameter xMisallignment to a function that sets a particular property of a class object. The function is:
def setMisAllignment(self, xMisallignment):
    if np.abs(xMisallignment) > 0.8:
        warnings.warn('You might set misallignment angle too large.')
    self.MisAllignment = xMisallignment
I tried selecting the first column, converting it to an RDD, and applying the function above through map(), but it does not seem to work: MisAllignment did not change.
df.select(df.angle_est).rdd.map(lambda row: model0.setMisAllignment(row))
In[2]: model0.MisAllignment
Out[2]: 0.00111511718224
Does anyone have an idea how to make this function work? Thanks in advance!
You can register your function as a Spark UDF, along these lines:
spark.udf.register("misallign", setMisAllignment)
You can find many examples of creating and registering UDFs in this test suite: https://github.com/apache/spark/blob/master/sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java
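One caveat: map() is lazy, so nothing runs until an action is called. Even then, the function (like a UDF) executes on the executors against a serialized copy of model0, so the object on the driver would not be expected to change. The lambda in the question also passes the whole Row rather than the float inside it. If the goal is to update the driver-side object, a minimal sketch, assuming the column is small enough to collect, might look like the following. The Model class, the apply_angles helper, the namedtuple stand-in for Row, and the second sample value are all hypothetical, and built-in abs replaces np.abs so the sketch has no dependencies.

```python
import warnings
from collections import namedtuple


class Model:
    """Hypothetical stand-in for the class behind model0 in the question."""

    def __init__(self):
        self.MisAllignment = 0.0

    def setMisAllignment(self, xMisallignment):
        # Built-in abs used instead of np.abs; equivalent for plain floats.
        if abs(xMisallignment) > 0.8:
            warnings.warn('You might set misallignment angle too large.')
        self.MisAllignment = xMisallignment


def apply_angles(model, rows):
    """Call the setter on the driver for each row; return the values set."""
    seen = []
    for row in rows:
        angle = row.angle_est  # unpack the float; do not pass the Row itself
        model.setMisAllignment(angle)
        seen.append(model.MisAllignment)
    return seen


# With Spark, the rows would come from the driver-side collect:
#     rows = df.select("angle_est").collect()
# Stand-in rows (second value is made up) so the sketch runs without a cluster.
Row = namedtuple("Row", ["angle_est"])
rows = [Row(-0.006815859163590619), Row(0.25)]

model0 = Model()
history = apply_angles(model0, rows)
print(model0.MisAllignment)
```

Since each call overwrites the same attribute, only the last value remains on model0 after the loop; if every angle needs its own model state, the per-row work would have to happen inside the loop.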
Hope it answers your question