Subtract values of columns from two different data frames in PySpark to find RMSE

I can't figure out how to calculate the RMSE between my test and prediction data.

test

col1    col2
 a        2 
 b        3

prediction

col1   col2
 a       4 
 b       5

I want to compute test(col2) - prediction(col2) row by row, matching on col1. That is:

2-4 =-2
3-5 =-2

I tried

test.select("col2").subtract(prediction.select("col2"))

But I am not getting the required result. I need these differences to compute the RMSE. Is there a built-in function in Spark to find the RMSE?

Thank you.

It's a join followed by an arithmetic subtraction:

test.join(prediction, on="col1").withColumn("sub", test.col2-prediction.col2)

Substitute your own table names in the following expression:

tab1.join(tab2).withColumn("Sub", tab2["T1"] - tab1["T"]).select("Sub").show()
