
Extracting value of columns in spark dataframe

I have a requirement where I need to filter out rows from a Spark dataframe in which the value of a certain column (say "price") must match a value present in a Scala map. The key of the Scala map is the value of another column (say "id"). My dataframe contains two columns: id and price. I need to filter out all the rows where the price does not match the price given in the Scala map.

My code resembles this:

object obj1 {
  // This method returns the price for an item given its id
  def getPrice(id: String): String = {
    // look up the price in a map and return it
    ???
  }
}

object Main {
  val validIds = Seq[String]("1", "2", "3", "4")

  val filteredDf = baseDataframe.where(baseDataframe("id").isin(validIds.map(lit(_)): _*) &&
    baseDataframe("price") === obj1.getPrice(baseDataframe("id").toString()))
  // But this line sends the literal string representation of the column "id"
  // to obj1.getPrice(), rather than the value of the id column for each row
}

I am not able to pass the value of the id column to the function obj1.getPrice(). Any suggestions on how to achieve this?

Thanks,

You can write a udf to do this:

import org.apache.spark.sql.functions.udf

def checkPrice(id: String, price: String): Boolean =
  validIds.exists(_ == id) && obj1.getPrice(id) == price
val checkPriceUdf = udf(checkPrice _)

baseDataFrame.where(checkPriceUdf($"id", $"price"))
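
Note that the UDF runs on the executors, so validIds and the map backing obj1.getPrice are captured in its closure and must be serializable; the $"id" column syntax also needs import spark.implicits._ in scope.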

Another solution is to convert the Map of id -> price to a data frame, and then do an inner join with baseDataFrame on the id and price columns.
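
A minimal sketch of that join-based approach, assuming an active SparkSession named spark and a priceMap: Map[String, String] of id -> price (both names are illustrative, not from the original answer):

import spark.implicits._

// Turn the id -> price map into a two-column data frame
val priceDf = priceMap.toSeq.toDF("id", "price")

// An inner join on both columns keeps only rows whose (id, price) pair is in the map
val filteredDf = baseDataFrame.join(priceDf, Seq("id", "price"))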
