Subtract values of columns from two different data frames in PySpark to find RMSE

Question

I am not able to figure it out. I am trying to calculate the RMSE between test and prediction data.

test

col1    col2
 a        2 
 b        3

prediction

col1   col2
 a       4 
 b       5

I am trying to compute test(col2) - prediction(col2), that is:

2-4 =-2
3-5 =-2
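For reference, RMSE squares these per-row differences, averages them and takes the square root. Here it is worked on the sample values, with $y_i$ taken from test and $\hat{y}_i$ from prediction:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
             = \sqrt{\frac{(2-4)^2 + (3-5)^2}{2}}
             = \sqrt{\frac{4 + 4}{2}} = 2
```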

I tried

test.select("col2").subtract(prediction.select("col2"))

But I am not getting the required result. I need these differences to compute the RMSE. Is there a built-in function in Spark to find the RMSE?

Thank you.

Answer 1

It is a join followed by an arithmetic subtraction:

test.join(prediction, on="col1").withColumn("sub", test.col2-prediction.col2)

Answer 2

Replace the table names in the following expression with your own:

tab1.join(tab2, on="col1").withColumn("Sub", tab2["T1"] - tab1["T"]).select("Sub").show()

2 answers

solution1
4 ACCEPTED 2018-02-27 14:00:03

solution2
-1 2018-08-07 07:25:22
