Extracting value of columns in spark dataframe
I have a requirement where I need to filter rows from a Spark DataFrame in which the value of a certain column (say "price") matches a value held in a Scala Map. The key of the Scala Map is the value of another column (say "id"). My DataFrame contains two columns: id and price. I need to filter out all the rows where the price does not match the price recorded in the Scala Map.
My code resembles this:
object obj1 {
  // This method returns the price for an item given its id
  def getPrice(id: String): String = {
    // look up the id in a map and return the price
  }
}
object Main {
  val validIds = Seq[String]("1", "2", "3", "4")
  val filteredDf = baseDataframe.where(baseDataframe("id").isin(validIds.map(lit(_)): _*) &&
    baseDataframe("price") === obj1.getPrice(baseDataframe("id").toString()))
  // But this line passes the literal string representation of the column
  // to obj1.getPrice(), rather than the value of the id column for each row
}
I am not able to pass the value of the id column to the function obj1.getPrice(). Any suggestions on how to achieve this?
Thanks,
You can write a UDF to do this:
val checkPrice = (id: String, price: String) => validIds.contains(id) && obj1.getPrice(id) == price
val checkPriceUdf = udf(checkPrice)
baseDataFrame.where(checkPriceUdf($"id", $"price"))
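A minimal end-to-end sketch of the UDF approach. The `priceById` map, sample data, and column names here are assumptions for illustration; the point is that the UDF receives the actual row values of id and price, not the column objects:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfFilterExample {
  // Hypothetical lookup table: id -> expected price
  val priceById: Map[String, String] = Map("1" -> "10", "2" -> "20", "3" -> "30")
  val validIds: Set[String] = priceById.keySet

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-filter").getOrCreate()
    import spark.implicits._

    val baseDataframe = Seq(("1", "10"), ("2", "99"), ("5", "50")).toDF("id", "price")

    // The UDF is invoked per row with the column *values*, so the map lookup works
    val checkPriceUdf = udf { (id: String, price: String) =>
      validIds.contains(id) && priceById.get(id).contains(price)
    }

    // Keeps only rows whose id is valid and whose price matches the map
    baseDataframe.where(checkPriceUdf($"id", $"price")).show()

    spark.stop()
  }
}
```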
Or, another solution is to convert the Map of id -> price into a DataFrame, and then do an inner join with baseDataFrame on the id and price columns.
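A sketch of the join approach, again using an assumed `priceById` map and sample data. The inner join keeps only the rows whose (id, price) pair also appears in the lookup DataFrame, and it avoids UDF serialization overhead:

```scala
import org.apache.spark.sql.SparkSession

object JoinFilterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-filter").getOrCreate()
    import spark.implicits._

    // Hypothetical lookup map: id -> expected price
    val priceById: Map[String, String] = Map("1" -> "10", "2" -> "20", "3" -> "30")

    val baseDataframe = Seq(("1", "10"), ("2", "99"), ("5", "50")).toDF("id", "price")

    // Turn the map into a two-column DataFrame of valid (id, price) pairs
    val pricesDf = priceById.toSeq.toDF("id", "price")

    // Inner join on both columns keeps only rows that match the map exactly
    val filteredDf = baseDataframe.join(pricesDf, Seq("id", "price"), "inner")
    filteredDf.show()

    spark.stop()
  }
}
```

For a small map, broadcasting the lookup DataFrame (`broadcast(pricesDf)`) would keep the join map-side and avoid a shuffle.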